Friday, February 6, 2009

Average Case Hardness

In 1971, Stephen Cook showed that the boolean satisfiability problem is NP-Complete under polynomial time reductions. Soon afterward, Karp showed that many natural combinatorial problems (Vertex Cover, Clique, Hamiltonian Circuit etc..) were also NP-Complete. However, these results are often misinterpreted. Because most computer scientists believe $P \neq NP$ it is easy to assume that it is always hard to solve $3-SAT$. This is not true. In fact most randomly generated $3-SAT$ instances can be solved efficiently by algorithms like DPLL. Assuming that $P \neq NP$ then all we may conclude is that there is no efficient algorithm to solve these problems in the worst case. It is conceivable that $NP-Complete$ problems could still admit polynomial time algorithms for average instances.

Notice that problems which are not hard on average are poor candidates for cryptography and one-way functions. Protocols such as Diffie-Hellman would absolutely fail if it was easy to compute the discrete logarithm (given a multiplicative generator $g$ in a finite field mod p, and an integer $x$ compute $a$ such that $g^a \equiv x$ mod p) for average instances. Russel Impagliazzo (A Personal View of Average-Case Complexity) describes several possible worlds and their applications to cryptography.

World 1: Algorithmica $P = NP$ or $NP \subseteq BPP$. We all know that such a result would revolutionize computer science and machine learning. However, cryptography would be impossible in this world.

World 2: Heuristica $P \neq NP$, but, over any efficiently computable distribution, $NP-Complete$ problems are tractable on average. Intuitively, there are hard instances of $NP$ problems, but the problem of finding hard instances is itself intractable. Once again, crytography would most likely not be feasible in Heuristica because eavesdroppers could solve problems in a time comparable to the time it took to generate the problem.

World 3: Pessiland $P\neq NP$ AND there are problems which are hard in the average-case for efficiently computable distribution. However, there are no one way functions. This means that given $f(x)$ it is usually possible to find some $x'$ such that $f(x') = f(x)$ in time comparable to the ammount of time needed to compute $f(x)$. In Pessiland, it would be easy to generate hard instances of $NP$ problems, but there would be no way of generating hard (presolved) instances of $NP$ problems. Public key crytography would still not be possible. Thus, Average Case Hardness is a necessary (but not sufficient) condition for cryptography.

Worlds 4 and 5: Minicrypt and Cryptomania. In Cryptomania, almost any cryptographic task is possible. This is the world we typically assume we live in, because we assume protocols like RSA and Diffie-Hellman are secure. However, the only evidence that the Discrete Logarithm problem is hard is that fact that we don't know of an efficient algorithm to compute it. In reality, there is no theoretical reason why the problem should be hard. In fact, Peter Shor recently showed that Discrete Logarithms can be solved efficiently on a Quantum Computer.

Fortunately for cryptographers, the Discrete Logarithm problem is a beautiful example of self reducibility.

Lemma: Suppose that we had an algorithm $A$ which computes the the discrete logarithm efficiently (in time $poly(n)$, $n = \log p$) for at least $\frac{p}{poly(n)}$ values of $x$, then there is an efficient ZPP algorithm $A'$ to compute the discrete logarithm efficiently.

Proof (Sketch): We can pick $k$ such that $A$ solves at least $\frac{p}{n^k}$ instances in time $O(n^k)$, let $m = n^{2k}$.
Input: Generator g, Integer x, Prime p
Output: $a$ such that $g^a \equiv x$ mod p or FAIL

For $i = 1...m$
---$a_i \leftarrow Unif(0,p-2)$
--- $x_i = g^{a_i}$ mod p
---// Number Theory Fact: this is equivalent to picking $x_i\leftarrow Unif(1,...,p-2)$

---If $A(g,x\times x_i,p)$ returns $a'$ in $n^k$ steps then
------// Now we know the answer due to number theory:
------// $x \times x_i \equiv g^{a_i} g^{a}$ mod p
------// $ g^{a_i} g^{a} = g^{a_i + a} \equiv g^{a'} $ mod p
------// $a = a' - a_i$
------return $a = a' - a_i$
Output Failed

It is not too hard to verify that the algorithm takes polynomial time and with high probability computes the correct answer. QED

I am unaware of any NP-Complete problems which are self reducible in a similar sense as the Discrete Logarithm problem. If such a self reduction was found I suspect that this would be a very interesting result in complexity theory.

A more common approach to the problem of defining average case hardness is that of Leonid Levin. Levin (Average Case Complete Problems) defines the notion of a random NP problem and a complete random NP problem. His definitions are tedious and hard to follow at times, but they are worth the time and energy to understand.

Definition: A random problem is a pair $(\mu, R)$ where $R \subset \mathbb{N}^2$ is an instance witness and $\mu: \mathbb{N} \rightarrow [0,1]$ is a probability distribution function.

A problem instance is an integer $x$ is the problem instance. $x \in L$ if and only if there is $y$ such that $(x,y) \in R$. The a probability distribution function $\mu(x)$ gives the probability of all problem instances which do not exceed $x$.

Notation: $\mu'(x) = \mu(x)-\mu(x-1)$ is the probability density function.

Also, by convention
  • $|x| = \lceil \log x \rceil$
  • $R(x,y)$ is true if and only if $(x,y) \in R$.

Definition: A random problem is in NP if both $R$ and $\mu$ are both computable in polynomial time. We call such a problem a random NP Problem.

In other words, given a tuple $(x,y)$ (a problem instance and a verifier) we can decide whether or not $(x,y) \in R$ in polynomial time in $|x|$. We can also sample from the distribution $\mu$ in polynomial time.

Definition: A random problem $(\mu, R)$ is polynomial on average if there is a Turing Machine $M$, which runs in time $t(x)^k$ such that:
  1. $M(x) \leftrightarrow \exists y R(x,y)$
  2. $\Sigma_{x=1}^\infty \mu'(x) \frac{t(x)}{|x|}$ converges.

The first condition guarantees that $M$ actually decides the problem. The second condition guarantees that the Turing Machine must run quickly on average instances sampled from our distribution $\mu$.

Definition: We say that the probability distribution function $\mu_1$ dominates $\mu$ if $\exists k \forall x \frac{\mu'(x)}{\mu_1(x)} \leq |x|^k$. In such a case we write $\mu \prec \mu_1$.

Intuitively, this definition guarantees that all of the 'likely' instances of $\mu$ are also the 'likely' inputs of $\mu_1$

We are finally ready to define the notion of reduction between random NP problems.

Definition:A polynomial time computable function $f$ reduces a random NP problem $(\mu_1, R_1)$ to another random NP problem $(f(\mu_2), R_2)$ if the following conditions hold:
  1. $f(\mu_2)(x) = \Sigma_{f(y) \leq x} \mu'(y)$
  2. $\mu_1 \prec \mu_2$
  3. $\exists y_1 R(x,y_1) \leftrightarrow \exists y_2 R(f(x),y_2)$
Condition 1 and 2 say that likely instances of problem 1 are mapped to likely instances of problem 2. Condition 3 guarantees that yes instances of problem 1 are mapped to yes instances problem (also no instances are mapped to no instances).

Levin shows that these reductions are closed under composition. If $A(x)$ is an algorithm that is fast on average for $(f(\mu_2), R_2)$ then $A(f(x))$ runs at most polynomially slower for $(\mu_1, R_1)$.

Definition: A random NP problem is complete if every random NP problem is reducible to it.

Levin proved that Tiling is an NP-complete random problem. Other natural combinatorial problems such as Graph Coloring and Matrix Decomposition have also been shown to be NP-complete random problems.

1 comment:

  1. A few comments:

    . It's true that "random 3CNF formula" are easy to solve by DPLL-style algorithms only for certain "clause densities". In fact, picking random 3CNFs with clause density around 4.2 is actually a great way to generate difficult-seeming SAT formulas. This is a fascinating area; here is just one possible quick survey, by CMU's own Abie Flaxman:

    . It is a very famous 1993 result of Feigenbaum and Fortnow that *no* NP-complete problem has a non-adaptive randomized self-reduction (like Discrete Log has) unless the Polynomial Hierarchy collapse. Getting rid of the "non-adaptive" constraint here has been a challenging problem with some positive results; see, e.g., this paper of Bogdanov and Trevisan:

    . As Jeremiah says, Levin's theory is a bit hard to follow. After his initial work, Impagliazzo and Levin published an *equivalent* theory which is easier to understand:


Note: Only a member of this blog may post a comment.