15-855: Intensive Intro to Computational Complexity

Sunday, May 3, 2009

Wrapup

Thanks again to all the hardy souls who hung in there for the semester!

Please remember to fill in the course evaluation at my.cmu.edu, under Academics.

Final grades are more or less tabulated; please email us if you want to learn them early.

best,
Ryan

Friday, May 1, 2009

Parameterized Complexity and ETH

We look into Parameterized Complexity theory (introduced by Downey and Fellows via many papers) and its connections to classical complexity theory specifically via the Exponential Time Hypothesis (of Impaggliazzo and Paturi 2001) in this post. Parameterized complexity provides a framework for a more refined analysis of hard problems. Heuristics, parallel algorithms, approximation schemes, randomized algorithms are some of the approaches used to counter such problems. But these approaches suffer from many defects ranging from no hard bounds on the quality of the solution (heuristics), to not being applicable to really large instances (parallel algorithms) to impractical solutions (like some PTAS's etc.).

Introduction:

Classical complexity classifies problems using the concept of some resource (time or space). This leads to a good theory but it also ignores any structural information in the input which makes problems appear harder then they really are. There is a wide variation in the worst-case complexities of known exact algorithms for the NP-complete problems. For e.g., there are pretty good 3SAT solvers present right now which scale to a large number of variables.

Parameterized complexity tries to explain these differences by assuming that there is some part of the problem (the `parameter') will be small and allow us to develop efficient poly-time algorithms. The classic example for this is to consider a database query - it has two parts the database and the query of which query is usually much smaller than the database. The query size $k$ would be the natural paramter for a parameterized complexity analysis by admitting algorithms whose non-polynomial behavior is restricted by the parameter: if $k$ is small and $n$ is large, $O(2^k . n)$ is better than $O(n^k)$.

The main contribution of the theory is establishing the intractability of certain problems by classifying problems into complexity classes by reductions which are inherently 2-dimensional depending on the problem size as well the parameter. A problem can have different parameterizations also, each leading to different results.

Complexity Classes - FPT and the W[t] hierarchy:

A parameterized language $L$ is a subset of $\Sigma^* x \Sigma^*$ where $\Sigma$ is a finite alphabet. Let $(x, k) \in L$, then we call $x$ the main part and $k$ the parameter.

a) Tractability - Fixed-parameter tractable (FPT): $L \in FPT$ if it can be decided in time at most $f(k) . |x|^{O(1)}$ for an arbitary function $f$. For a fixed $k$ it is in P, moreover for every $k$ it is in the same polynomial class via the same machine. As an example, consider the $p-SAT$ problem where given the formula $\phi$ and the parameter $k = $ number of variables in $\phi$ decide whether $\phi$ is satisfiable. This is clearly in FPT as the obvious brute force approach for formula of size $n$ with $k$ variables will take time $O(2^k . n)$. There are many real world problems which are in FPT like parameterized Vertex Cover which uses the kernelization technique to get a $O(k . n + 1.286^k)$ algorithm (due to Chen, Kanj and Jia, 2001). Note that the classical Vertex Cover is NP complete - yet this result shows that it
is not as `hard' as say Independent Set.

b) Parametric Intractability: Analogously to classical complexity theory, Downey and Fellows developed a completeness program for intractable parameterized problems. In order to compare the hardness of parameterized problems, we need a 2-dimensional reduction (called `parameterized-reduction'). Roughly, a language $L$ is parameterically reducible to $L'$ if there is an FPT algorithm that transforms $(x, k)$ to $(x', k')$ so that $(x, k) \in L$ iff $(x', k') \in L'$ and $k' = g(k)$ where $g$ is an unrestricted function. Note that Karp reductions are rarely parameterized reductions (e.g. Clique to Independent Set is one of the exceptions).

1) W[1]: The lowest class of parameterized intractability can be defined as the set of languages that are reducible to the Short-Turing Machine Acceptance problem (also called the $k$-step halting problem) where we want to determine for an input consisting of a nondeterministic Turing machine M and a string x, whether M has a computation path accepting x in at most k steps. In some sense, this is the parameterized analogue to the Turing machine acceptance problem: the basic generic NP-complete problem in classical complexity theory. Canonical problems include Independent Set (does $G$ have an independent of size $k$), Weighted 3SAT (does $\phi$ have a satisfying assignment of weight $k$) etc. We will also give an alternative definition of W[1] afterwards which is actually used for the basic results.

2) The W[t] hierarchy: Interestingly, while general $k-$SAT and 3SAT are equivalent for NP-hardness in classical complexity, there is no known parameterized reduction computing general satisfiability from 3-CNF formulae. This leads to a realization that the logical depth of a formula affects its parameterized complexity. Intuitively it is related to the number of alternations between unbounded fan-in AND and OR gates. Hence, we can base the hierarchy on the complexity of circuits required to check a solution.

The W[t] is based on Circuit-SAT problem (parameterized by the weight of input like for Weighted 3SAT) for family of circuits $F_{h, t}$ having:
1. Small gates: with constant fan-in
2. Large gates: with unbounded fan-in

and of depth $h$ (max. number of gates in the path) and weft (max. number of large gates in the path) at most $t$. Clearly, we have the containments:

$FPT \subseteq W[1] \subseteq W[2] \ldots $

It is not known whether the containments are strict or not. Note that W[1] can now be defined as the class that can be reduced to an parameterized Circuit SAT on the family of constant depth weft 1 circuits. Downey and Fellows (1995) proved that parameterized Independent Set is W[1]-complete. The hardness proof is very intricate and we omit it here. Note that using this definition it is easy to see why Independent Set is in W[1] (sketch):
1. Each vertex in $G$ corresponds to one input gate in the circuit
2. For every edge $(v, u) \in G$ build a small OR gate: $(1-v) \wedge (1-u))$
3. Output from small gates are given as input to a single large AND gate.

Interestingly unlike NP-hardness results, W[1]-hardness for the $k-$Halting Problem uses Independent Set completeness as intermediate results. There have been many papers demonstrating naturally occuring W[t]-complete problems like Dominating Set for W[2] etc.)- for an extensive list of already classified problems see the Downey and Fellows monograph.

Exponential Time Hypothesis (ETH)

ETH was first studied by Impagliazzo, Paturi and Zane 2001 and it states that 3SAT $\notin$ DTIME$(2^{o(n)})$, where $n$ is the number of variables. They also proved that the hypothesis is robust to analogous hypotheses for other NP-complete problems like Independent Set etc. In fact, they also showed that the ETH is independent of size measure: 3SAT is solvable in time $2^{o(n)}$ iff it is solvable in time $2^{o(m)}$ for input size $m$. Note that this suggests that weighted 3SAT should also be intractable for `any parameter'.

Connections with Classical complexity:

The above paragraph suggests that ETH and parameterized complexity might be related. In fact they are and Chen and Grohe recently proved that ETH and paramterized complexity theory are isomorphic by an explicit reduction preserving isomorphism. We don't show that here, instead we prove a simpler result that yet provides a strong link to ETH (proof is a version from Downey, Castro et. al 2003 and uses parameterized miniaturization and is different from the original proof by Abhramson, Downey and Fellows):

Theorem: If FPT = W[1], then ETH fails i.e. 3SAT $\in$ DTIME$(2^{o(n)})$.

Proof: We use the equivalent definition of ETH based on simple Circuit SAT. The idea is to capture the ETH perfectly in a parameterized complexity class. The starting point is the Mini-Circuit SAT problem which is a parameterized miniaturization of simple Circuit SAT:
Input: Positive integers $k$ and $n$, and a Boolean circuit of total size at most $k\log n$.
Decision: Does there exist any $x$ for which $C(x) = 1$.

Note that the parameter here is $k$. Also, trying all possible inputs gives a brute force $O(n^k)$ algorithm. Next we give a cruicial lemma due to Cai and Juedes 2001. It essentially fully characterizes the ETH with a complexity class in parameterized theory.

Lemma 1: Mini-Circuit SAT is in FPT iff ETH fails.
Proof: One direction follows from the brute force algorithm and noting that $2^{o(k \log n)}$ is a FPT function.

Now suppose we are given a boolean circuit $C$ of size $N$ and that Mini-Cirsuit SAT is solvable in FPT time $f(k)n^c$. Set $k = f^{-1}(N)$ and $n = 2^{N/k}$. In general $k = f^{-1}(N)$ will be some slowly growing function of $N$; so $N/k = o(N)$ and also $cN/k = o(N)$. Hence using the FPT algorithm for Mini-Circuit SAT we have a running time for Circuit SAT as: $f(f^{-1}(N)) (2^{N/k})^c = N 2^{cN/k} = 2^{cN/k + \log N} = 2^{o(N)}$.
Thus ETH fails. Proved.

Now lets define the complexity class MINI[1] to be the set of languages that are FPT reducible to Mini-Circuit SAT. It turns out that many $k\log n$ miniatures of familiar NP-complete problems are MINI[1] complete (Downey, Fellows et al. 2002). It is easy to see this because essentially all the usual NP-complete reductions of Circuit SAT to these problems work as FPT reductions because they were also linear size reductions.

We concentrate on the Mini-Independent Set problem. Surprisingly, it can be reduced to the usual parameterized Independent Set problem.

Lemma 2: Independent Set parameterized by the size of independent set is MINI[1]-hard.
Proof: We give a Turing reduction. Let graph $G = (V, E)$ be the miniature, for which we wish to determine whether $G$ has an independent set of size $r$ with $|V| \leq k\log n$. We can think of verices of $G$ as organized in $k$ blocks $V_1, V_2, \ldots, V_k$ each of size $\log n$. So for each possible way of writing $r$ as a sum of $k$ terms $r = r_1 + r_2 + r_3 + \ldots + r_k$ with each $r_i \leq \log n$, we have a turing reduction branch which represnts a commitment to choose $r_i$ vertices from the corresponding block $V_i$ to be in the independent set. The total number of branches are $(\log n)^k$ and again it is a FPT-function.

For each branch, we produce a graph $G'$ that has an independent set of size $k$ iff the miniature $G$ has an independent set of size $r$ distributed as indicated by the commitment made on that branch. The graph $G'$ consists of $k$ cliques with some cross edges. The ith clique consists of vertices in correspondence with the subsets of $V_i$ of size $r_i$. An edge connects a vertex $x$ in the ith clique and a vertex $y$ in the jth clique iff there is a vertex $u$ in the subset $S_x \subseteq V_i$ represented by $x$ and a vertex $v$ in the subset $S_y \subseteq V_j$ represented by $y$, such that $(u, v) \in E$.

From Lemma 2 we can conclude that $FPT \subseteq MINI[1] \subseteq W[1]$. We just need to observe now that if FPT = W[1] it implies $M[1] \subseteq FPT$. By Lemma 1, ETH fails.
Proved.

Discussion:

The above result hints that FPT vs W[1] problem is like the P vs NP problem of classical complexity theory. In fact it was also proved by Downey and Fellows that FPT $\neq$ W[1] implies P $\neq$ NP. But it is not known that if FPT $\neq$ W[1] implies anything about the rest of W[t] hierarchy. Importantly, practical intracbility of problems in NP, which are unlikely to be complete for NP, can be shown using W[t] hardness results. Parameterized complexity has deep connections to other complexity results as well. Just recently (a week back) Galesi and Lauria showed a connection with Proof Complexity based on randomized algorithms for W[t]-hard problems. The field is very active and there are many papers and surveys being published in it.

Lower bounds for locally decodable codes

In this blog, we will look at upper bounds on the rates of locally decodable codes and their relation to a problem called Private Information Retrieval. Some of the proofs presented here are unique in that they relate the complexity of classical algorithms to quantum algorithms and prove lower bounds on quantum algorithms!

Locally decodable codes are codes that can probabilistically recover a bits from corrupted codewords by querying a small number of bits, Hadamard code being a straightforward example. While an ideal code would have have rate, be resilient to large number of errors and would be locally decodable, it as been shown that it is not possible to do extremely well on all the criteria. Specifically, locally decodable codes imply that the code rate is $o(1)$.

The first paper in this direction ([Katz, Trevisan STOC'00]) showed that any locally decodable code with constant number of query bits has codewords of superlinear length. The main ideas behind these bounds is that smooth codes (codes which are queried uniformly for local decoding) are not much worse than non-smooth ones and that such smooth codes need super-linear encoding lengths.

Def 1: Suppose that a code maps $C:\{0,1\}^n\rightarrow \Sigma^m$. For fixed $\delta, \epsilon$ and integer $q$, $C$ is $(q,\delta,\epsilon)$-locally decodable code if there exists a probabilistic algorithm that recover an arbitrary bit $x_i$ of the input $x$ with probability at least $1/2+\epsilon$ from a corrupted codeword $y$ which is with in a distance $d(y,C(x))<\delta m$ of $C(x)$ after querying no more than $q$ indices of $y$.

Intuitively, smooth code words are those for which there exist probabilistic decoding
algorithms that query the codeword (roughly) uniformly, i.e., are not heavily biased
towards querying few indices of codeword. More formally:

Def 2: A codeword $C:\{0,1\}^n\rightarrow \Sigma^m$ is $(q,c,\epsilon)$-smooth
for fixed $c,\epsilon$ and integer $q$ if there exists a local decoding algorithm $A$
that can recover an aribitrary bit of input $x$ from codeword $C(x)$ with probability at least $1/2 + \epsilon$ using no more than $q$ indices of $y$ with the added condition that $\prob[A(x,i) \textrm{reads index} j] \leq c/m$. This definition does not require $A$ to recover bits from corrupted codewords.

Lemma 1: If $C$ is a $(q,\delta,\epsilon)$-locally decodable code, then it is also a $(q,q/\delta,\epsilon)$-smooth.
Proof sketch: Take the decoder for $C$ and identify all the locations that are queried with probability greater than $q/\delta m$ while trying to decode $m$. Construct a new decoder that just assumes the value $0$ for queries to all such locations. Since there no more than $\delta m$ of these, the new decodder can still recover from errors. This new decoder has the smoothness properties we want.

Lets start with case $q=1$, and see why it is not possible to construct locally decodable codes that encode inputs of length greater than a constant. Suppose that the algorithm is trying to decode the $i$-th bit $x_i$ by reading just a single index of the (corrupt) codeword $y$. Then, $\sum_{j\in[m]} Pr_x[A(C(x),i)=x_i | A(\cdot,i) ~\textrm{reads}~ j] Pr[A(\cdot,i) ~\textrm{reads}~ j]$
$ = Pr_x[A(C(x),i) = x_i] \geq 1/2 + \epsilon$,

beacuse of which there must exist a good index $j_1$ such that $Pr_x[A(C(x),i) = x_i | A(\cdot,i) ~\textrm{reads}~ j_1] \geq 1/2 + \epsilon$. Suppose that the code got randomly corrupt at index $j_1$. Then $y_{j_1}$ has no correlation with $x_i$ and the algorithm $A$ can not infer any information from index $j_1$. But since $A$ can recover from $\delta m$ errors, there must exist another index $j_2$ which also \textit{good} in the sense that it can be queried to get information about $x_i$. By extending the above reasoning, we can see that we can also corrupt $y_{j_2}$ randomly along with $y_{j_1}$ and can expect to find another index $j_3$ with reasonable correlation to $x_i$. We can extend the same argument $\delta m$ times to reason that there are at least $\delta m$ indices $j\in[m]$ such that $A$ can compute $x_i$ from $C(x)_{j}$ with probability at least $1/2 + \epsilon$. This is true of all inputs $x$, which by the pigeon hole principle implies that there is at least one index $j'\in [m]$ such that at least $\delta n$ of $x_i$s can extracted with $1/2 + \epsilon$ probability by querying $y_{j'}$. Now this means that the code should put an enormous amount of information about the input in to one index of the output. Consider the following quantitive
lemma about limit of information recovery from a function:

Lemma 2: If $C:\{0,1\}^n\rightarrow R$, and there exists a $A$ such that $Pr[A(C(x),i)=x_i] \geq 1/2 + \epsilon$,
then $\log{|R|} \geq (1-{\mathsf H}(1/2 + \epsilon))n$, where ${\mathsf H}$ is the binary entropy function.
Idea: if a decoder has an any hope of recovering bits with reasonable probability,
the compression should not e too high.

Setting $|R| = |\Sigma|$, we have $n\leq \frac{\log{|\Sigma|}}{\delta(1-{\mathsf H}(1/2 + \epsilon))}$ (Therorem 3).

Now, consider the case $q>1$. A generalization of this same argument can be used
to show superlinear bounds on the length of the codewords as follows:

Suppose that $A$ is a $(q,c,\epsilon)$-smooth decoder for code $C$.
Similar to the $q=1$ case, call a set $S$ ($S\subseteq[m], |S|\leq q$) $\epsilon$-good for $i$ if $Pr[A(C(x),i) = x_i | A\textrm{reads} s] \geq 1/2 + \epsilon$. Define hypergraph $H_i$ with vertices labelled $[m]$ and egde set $E_i$ defined by the $\epsilon/2$-good sets for $i$. A matching of a hypergraph is an edge set with no common vertex and a vertex cover is a vertex set such that every edge contains at least one vertex from the vertex set.

Lemma 4: $H_i$ has a matching $M_i$ of size at least $\epsilon m/cq$.
Proof:
$1/2 + \epsilon$
$\leq Pr_x[A(C(x),i) = x_i | A(\cdot,i) ~\textrm{reads}~ E_i] Pr [A(\cdot,i) ~\textrm{reads}~ E_i]$
$+ Pr_x[A(C(x),i) = x_i | A(\cdot,i) ~\textrm{reads}~ E^c_i] Pr {A(\cdot,i) ~\textrm{reads}~ E^c_i]$
$< Pr[A(\cdot,i) ~\textrm{reads}~ E_i] + (1/2 + \epsilon/2)(1-Pr[A(\cdot,i) \textrm{reads} E_i])$,
which implies that $Pr[A(\cdot,i) \textrm{reads from} E_i] > \epsilon$. If $P_e$ denotes the probability that $A(\cdot,i)$ reads $e\in E_i$, then we have $\epsilon < \sum_{e\in E_i} P_e$. Also for every $j\in [m]$, $\sum_{e\in E_i | j\in e} \leq c/m$
by smoothness condition. Now, if $V$ is vertex cover of $H_i$, $e\cap V \neq \emptyset$ for all $e\in E_i$. Putting this together with the earlier fact, we have $\epsilon < \sum_{s\in E_i | e\cap V\neq \emptyset} P_e \leq \sum_{j\in V}\sum_{e\in E_i | j\in e} P_e \leq |V|c/m$ which implies $|V| > \epsilon m/c$, and therefore, $H_i$ has a matching $M_i$ of size at least $\epsilon m/cq$.

Say that a set $S$ hits matching $M_i$ if there is some set $s\subseteq S$ such that $s\in M_i$. The following lemma gives a bound on the number of vertices to be selected from $H_i$ so that selected set hits a constant fraction of $M_i$s.

Lemma 5: If $H$ is a hypergraph on $m$ vertices containing hyperedges of at most than $q$ vertices. Suppose $H$ has a matching of size $\gamma m$($\gamma < 1/q$). There exists $t=\Theta(\gamma^{-1/q}m^{(q-1)/q})$ so that for a randomly chosen (from $H$) set of $t$ elements, such that the probability of this set hitting an arbitrary matching is $3/4$.

Theorem 6: If $C:\{0,1\}^n\rightarrow \Sigma^m $ is a $(q,c,\epsilon)$-locally decodable codes, then:
$m\geq (\epsilon\delta/q^2)^{1/(q-1)}{\left(\frac{3n(1-{\mathsf H}(1/2 + \epsilon))}{4\log{|\Sigma|}}\right)}^{(\frac{q}{q-1})$.
Proof: Lemma 1 shows that $C$ is $(q,q/\delta, \epsilon)$-smooth. Lemma 4 shows that for every $i$, there exists a set $M_i$ consisting of disjoint set of size at most $q$ such, each $m\in M_i$ is $\epsilon/2$-good for $i$ and $|M_i| \geq \epsilon\delta m/q^2$. Lemma 5 says that there exists a set of $t = \Theta((\epsilon\delta/q^2)^{-1/q}m^{(q-1)/q}))$ indices from $[n]$ so that the values at these locations have enough information to help decode at least $3/4$ of the inputs bits with advantage at least $\epsilon /2$. Applying Lemma 2 which gives us a lower bound on $t$ proves the theorem.

While this is some start, this still leaves a large gap between existing locally decodable codes (which are exponential long) and the lower bound. Goldreich, Karloff, Schulman and Trevisan improve the lower bound for the specific case of linear codes and decoding algorithms that query only 2 bits. They show that in such a case, $m = 2^{\Omega(n)}$. Using new techniques (reduction to quantum queries), Kerenidis and Wolf (arXiv: quant-ph/0208062v2) show that any 2-query LDC (not necessarily linear) is exponentially long. Their work is as follows.

Quantum queries
---------------

In an earlier post, Matt talked about the fact that QM systems exist in a linear superposition of several states. The choice of basis vectors used to describe can of course be chosen according to convenience (the choice is usually the eigenstates of the measurement). A superposition of states does not mean a statistical mix of different states, rather it means that the state of the particle itself is a complex vector. Such a state is called a pure state (can be expressed as $|\phi\rangle = \sum_{i} \alpha_i |i\rangle$). The density matrix of a pure state (or qubit) is the outer product $M = |\phi\rangle\langle\phi|$, where $\langle\phi| = {|\phi\rangle}^{\dagger} = {{|\phi\rangle}^T}^{*}$ is the complex conjugate transpose of $|\phi\rangle$. Note that an arbitrary matrix $M$ need not in general be the density matrix of a pure state. Matrices of the form $\rho = \sum_{i}p_i|\phi_i\rangle\langle\phi_i|$ are called mixed states. As against a pure system, such mixed states are statistical ensembles of different pure states. Obviously, such a mixed state does not represent any pure state. However, the set of pure states of which $\rho$ is an ensemble is not uniquely determined by $\rho$ alone. $\rho$ represents both $\{(p_i,|v_i\rangle\}_{i}$ and $\{(p'_i,|v'_i\rangle}\}_{i}$ as long there is a unitary matrix $U$ such that $|v_i\rangle=\sum_{j} \sqrt{p_j/p'_i}U_{ij}|v_j\rangle$. Another context where mixed systems are useful for us is to describe a subsystem of an entangled state. For example, in the entangled state $(|00\rangle + |11\rangle)/\sqrt{2}$, the second qubit alone is not a pure system. It can be viewed as the mixed state $(|0\rangle\langle 0| + |1\rangle\langle 1|)/2$.

Measurement of a qubit can be thought of as projecting the qubit in to a subspace. Based on the set of subspaces we are trying to project our qubit in to, the qubit has different probability of collapsing in to these subspaces. We can generalise this notion to arbitrary positive operators (not just orthonormal projectors) and mixed states -- such a measurement system is called positive operator valued measurement (POVM). A POVM is a set of positive operators $E_i = M_i^{\dagger} M_i$ such that $\sum_{i=1}^{k} E_i = I$. When such a measurement is made on state $\rho$, the outcome is any one of the states $M_i\rho M_i^{\dagger}/Tr(M_i\rho M_i^{\dagger})$, the probabilities of the respective outcomes being $p_i = Tr(M_i\rho M_i^{\dagger})$.

In what follows we usually deal with $k$ of the form $2^m$ and systems of $m$ qubits.
If $B=\{\ket{\psi_i}\}$ is an orthonormal basis for the system, measuring in $B$-basis means using the POVM: $E_i = \ket{\psi_i}\bra{\psi_i}$. Then, the outcome of measuring a pure state $\ket{\phi}$ is simply $|\braket{\phi}{\psi}|^2$ as we expect.

Now, we are ready to define a quantum query:
A query to $j$-th bit of a length $m$ string $y$ is the unitary operation (quantum mechanics mandates that all state transformations are unitary):
$|c\rangle|j\rangle \mapsto (-1)^{c\cdot y_j}|c\rangle |j\rangle$.
Of course, the fact that we can apply this query transformation to superposition states to (indirectly) read off several values of the function is what makes quantum queries powerful (for that matter, this ability to manipulate superposition states is what gives extra power to quantum algorithms). The following lemma illustrates this power right away:

Lemma 7: Let $f:\{0,1\}^2\mapsto \{0,1\}$ and suppose that we have $a=a_1 a_2 \in \{0,1\}^2$ whose bits are to be queried in order to compute $f(a)$. There exists a quantum algorithm that uses just one query, and outputs $f(a)$ with probability exactly $11/14$, and outputs $1-f(a)$ otherwise.
Proof: Note that a classical algorithm has to query both bits before it can compute $f(a)$ with any accuracy. The quantum algorithm is as follows: query $(|01\rangle
+ |11\rangle + |12\rangle)/\sqrt{3}$. The output of the query would be:
$|\phi_a\rangle = (|01\rangle + (-1)^{a_1}|11\rangle + (-1)^{-a_2}|12\rangle)/\sqrt{3}$, the mutual phase shifts now holding information about $a_1, a_2$. To extract this information, we measure this state (i.e., cause it to collapse to basis) with the basis $\{|\psi_b\rangle\} (b\in \{0,1\}^2)$, where $|\psi_b\rangle = (|01\rangle + (-1)^{b_1}|01\rangle + (-1)^{b_2}|10\rangle + (-1)^{b_1+b_2}|11\rangle)/2$.

$|\phi_a\rangle$ gives outcome $a$ with probability $|\langle\phi_a|\psi_a\rangle|^2 = 3/4$, the other three outcomes being equally probable ($1/12$). Suppose that the measurement outcome is $m$. The following procedure does exactly what we want:

$f$ is a constant. Output the constant with probability $11/14$.

$|f(1)^{-1}|=1$. If $f(m) = 1$, then output $1$. If $f(m)=0$, then output $0$ with probability $6/7$ and $1$ with probability $1/7$. Now, if $f(a) = 1$, then probability of the algorithm outputting $1$ is $(3/4).1 + (1/4)(1/7) = 11/14$. If $f(a) = 0$, then probability of algorithm outputting $0$ is $(11/12)(6/7) = 11/14$.

$|f(1)^{-1}|=2$. Output $f(m)$ with probability $13/14$ and $1-f(m)$ with probability $1/14$.

$|f(1)^{-1}|=3$. Similar to case $|f(1)^{-1}|=1$.

Definition: A $(q,\delta, \epsilon)$-LQDC (locally quantum-decoable code) is the same as a LDC except that we replace the probabilistic decoder with a quantum decoder and queries are quantum-queries that probe superpositions. Now we can use the earlier lemma to show that:

Theorem 8: A $(2,\delta,\epsilon)$-LDC is a $(1,\delta,4\epsilon/7)$-LQDC.
Proof: The idea is to replace the two classical queries with one quantum query (allowing for a bit of inaccuracy) based algorithm. The quantum decoder will choose two target bits the same way the classical decoder works, and instead of reading both bits and applying a function $f$, the LQDC does a quantum query on this two bits and outputs the value of $f$ on this bits (with $11/14$ accuracy). This suffices to get a $4\epsilon/7$ predictor.

Now we want to show lower bounds on 1-query LQDCs. The starting point as in the
case of 1-query LDCs is a theorem that gives a relation between the space
used to store information and fidelity of information reconstruction:

Theeorem 9: If $f$ maps $n$ bit strings to $m$-qubit states (mixed) states with
recovery probability at least $p$, then $m\geq (1 - {\mathsf H}(p))n$. This leads to:

Theeorem 10: If $C:\{0,1\}^n \rightarrow \{0,1\}^m$ is a $(1,\delta,\epsilon)$-LQDC, then $m\geq 2^{cn} - 1$, for $c=1-H(1/2 + \delta\epsilon/4)$.

Wehner and Wolf (arXiv: quant-ph/0403140v2) consider the case where the LDC encode
over larger alphabets and when the decoder uses only a part of the information from each query. Briet and Wolf (arXiv: quant-ph/0806.2101v1) further study the relation between LQDC and LDC and conclude that their powers are roughly the same for small constant number of queries.

Despite all this work, the best lower bounds on the overheads for >3 query LDC is only polynomial while known >3-query LDCs are eponentially long. This gap leaves a big open question.

LDCs and Private Information Retrieval (PIR)
--------------------------------------
The relation between LDCs and PIRs is close and worth mentioning since PIR systems yield some of the best LDCs. PIR is the problem of distribution a database among different servers so that a client can query a particular bit (by sending randomized queries to different servers) so that individual servers have almost no clue about the particular query bit that the client is accessing (information theoretic security, not computational). Here is the definition of a 2-server PIR system.

Def [GKST CCC'02]: A one-round, database size $n$, $(1-\delta)$ secure, 2-server PIR with recovery probability $p$, query size $q$, answer size $a$ is the set of algorithms $(Q,(S_1,S_2),D)$, where:
- $Q$ on input $i$ uses a randomness ($r$) to generate two $q$ bit long queries $(q_1,q_2)=Q(i,r)$. $q_1$ and $q_2$ are sent off to the two servers which use algorithms $S_1$ and $S_2$ to respond with length $a$-answers to these queries. The decoder algorithm $D$ uses $i,r$ and the answers of the servers to decode $x_i$. Given that the random string for input to $Q$ is selected from uniform distribution, the decoder $D$ should succeed with probability at least $p$. The secrecy condition is that the distribution of queries send to servers should be different for different target bits by more than a distance of $\delta$.

The same authors show that if there is a $1$-round, $(1-\delta)$-secure $2$-server PIR of database size $n$, query size $t$, answer size $a$ with recovery probability at least $1/2 + \epsilon$, then there exists a $(2,3,\epsilon-\delta)$-smooth code $C:\{0,1\}^n\rightarrow (\{0,1\}^a)^m$, where $m \leq 6.2^t$. This reduction can be used to show that a query sizes $\Theta(n^c) (c>0)$ are needed for PIR with constant answer sizes.

Kolmogorov Complexity and derandomization

Motivation
------------

The sets $R_{\mu}$ of random strings with high Kolmogorov complexity and bounded resources are good examples of sets with a lot of information content that is difficult to access. Many of these sets have been studied [BM97,Ko91,KC00] as possible examples of intractable sets that are not complete for any of the standard complexity classes. Based on [Al05], we now show that these sets can, in fact, be exploited by efficient reductions.

The completeness results are obtained via derandomization techniques, mostly relativizing hardness vs randomness tradeoffs in the contrapositive: If there exists a problem in complexity class $C$ that is hard when given oracle access to $A$, then there exists a pseudorandom generator secure against $A$ that is computable within $C$. We argue that no pseudorandom generator computable in $C$ can be secure against $R_{\mu}$, and thus every problem in $C$ is easy given oracle access to $R_{\mu}$. In other words, $C$ reduces to $R_{\mu}$.

Definitions
-----------

The computational model that we use is the multi-tape Turing machine with random-access to its input tape (the ideas work in any general model). Wlg, we pick a single universal machine $U$.
Kolmogorov complexity $C^A(x)$ is usually defined as the length of the shortest description $d\in \{0,1\}^*$ such that $U^A(d) = x$, where $U$ is a universal Turing machine with oracle access to $A$. The definition we use is slightly different: instead of requiring $U^A(d)$ to produce $x$, it should recognize the correct value of $x_i$ for every $i$.
We now define several resource-bounded Kolmogorov complexity measures:

Definition 1 [Kt,KT,KS]
$Kt^A(x) = \min\{ |d| + \log t : U^A(d,i,b)$ accepts in $t$ steps iff $x_i=b \}$
$KT^A(x) = \min\{ |d| + t : U^A(d,i,b)$ accepts in $t$ steps iff $x_i=b \}$

The only difference is that in $KT$ the time bound is given exponentially more weight. Considering space-bounded notions, too, will yield complete problems for space-bounded
classes:

$KS^A(x) = \min\{ |d| + s : U^A(d,i,b)$ accepts in $s$ space iff $x_i=b \}$

Both $Kt$ and $KS$ can be approximated in terms of $KT^A$, for appropriate oracle choices:

Theorem 2:
There exist a complete set $A$ for $E$, a complete set $B$ for $DSPACE(n)$, and a constant $c$ s.t.
$Kt(x)/c\leq KT^A(x)$
$KS(x)/c\leq KT^B(x)$

Proof sketch: Let $A \in E$ and let $x$ be given, s.t. $KT^A(x) = m$. Thus, there is a description $d_x$ of length $\leq m$, s.t. $U^A(d_x,i,b)$ accepts iff $x_i=b$ in time at most $m$. During computation, $U$ asks queries of length at most $m$. Since $A \in E$, each query can be answered in time $2^{O(m)}$.
Let $M$ denote the algorithm simulating the computation of $U^A(d_x,i,b)$ for every $i$ by directly computing the answers of $A$, then the description $d'_x= \langle M,d_x \rangle$ is sufficient for $U$ to compute $U(d'_x,i,b)$ in time $2^{O(m)}$. As $|d'_x|=m+O(1)$, we can conclude that $Kt(x) \leq m + O(1) + \log(2^{O(m)}) = O(m)$.
The proof for $KS$ is similar.

We focus on sets containing strings of high complexity, for various measures.

Definition 3:
For any Kolmogorov complexity measure $\mu$, define
$R_{\mu} = \{x : \mu(x) \geq |x|/2\}$.

The bound of $|x|/2$ is arbitrary. Essentially, all we need is that the set $R_{\mu}$ has polynomial density (it contains at least $2^n/n^k$ strings of each length $n$, for some $k$).
The following propositions are straightforward.

Proposition 4:
$R_{Kt} \in E$, $R_{KS} \in DSPACE(n)$, and $R_{KT} \in coNP$.

Proposition 5:
The sets $R_{Kt}$, $R_{KS}$ all have polynomial density.

Nonuniform Hardness Results
-------------------------------

We show that strings of high Kolmogorov complexity are very useful as oracles. We will argue that an appropriate set $R_{\mu}$ of Kolmogorov random strings can be used to distinguish the output of a pseudorandom generator $G_f$ from truly random strings. This in turn will enable us to efficiently reduce $f$ to $R_{\mu}$.

Recall that $A$ is PSPACE-robust if $PSPACE^A = P^A$ (machines are allowed to ask oracle
queries of only polynomial size). The complete sets for many large complexity classes (PSPACE, EXP, EXPSPACE) have this property, as well as the complete sets (under linear-time
reductions) for classes like $DSPACE(n)$ and $E$.

We will build a pseudorandom generator based on the Nisan-Wigderson paradigm [NW94]. [BFNW93] construct, for any $\epsilon > 0$, a variant $G^{BFNW}_f : \{0,1\}^{n^\epsilon}\rightarrow \{0,1\}^n$ such that for any $x$ of size $n^\epsilon$, the function is computable in space $O(n^\epsilon)$ given access to the Boolean function $f$ on inputs of size at most $n^\epsilon$. If $f$ is PSPACE-robust, there is a constant $c$ independent of $\epsilon$, such that each bit is computable in time $n^{\epsilon \cdot c}$ with oracle access to $f$. The following hardness versus randomness tradeoff holds.

Theorem 6 [BFNW93]
Let $f$ be a Boolean function, $\epsilon > 0$, and $G^{BFNW}_f$ be the pseudorandom generator described above. Let $T$ be a set and $p(n)$ a polynomial. If
$|P_{r\in U_n}[r \in T]-P_{x\in U_{n^\epsilon}}[G^{BFNW}_f(x)\in T]|\geq 1/p(n)$
for all large $n$, then there exists a polynomial size oracle circuit family $\{ C_n \}$ with oracle $T$ that computes $f$ and queries $T$ non-adaptively.

We use the notation $A\leq ^{P/poly}_{tt} B$ to denote that there exists a truth-table (i.e., nonadaptive) reduction from $A$ to $B$ that is computable by a family of polynomial-size circuits.
(A truth-table reduction is a reduction from one set of natural numbers to another. Truth-table reductions are more powerful than Turing reductions in a mathematical sense, since they provide finer equivalent class, but they are a weaker tool.)

Theorem 7:
Let $A$ be any PSPACE-robust set. Let $L$ be a set of polynomial density s.t. for every $x \in L$, $KT^A(x) > |x|^\gamma$ for some constant $\gamma > 0$. Then $A$ is reducible to $L$ via $\leq^{P/poly}_{tt}$ reductions.

Proof Sketch: Let $f$ be the characteristic function of $A$. Consider $G^{BFNW}_f$,
choose $\epsilon$ as follows. We know that every bit of $G^{BFNW}_f$ is computable in time $n^{\epsilon \cdot c}$ for some constant $c$ independent of $\epsilon$, given access to $A$. Set $\epsilon= \gamma/2c$ (wlg $c > 1$).

Any string in the range of $G^{BFNW}_f$ has small $KT^A$ complexity: Let $y = G^{BFNW}_f (x)$, for some $x$. On $x$, every bit of $G^{BFNW}_f$ is computable in time $n^{\gamma/2}$ with access to oracle $A$. Hence, $KT^A(y) \leq |x|+O(n^{\gamma/2} \log n)+O(1) \leq n^{\gamma}$. Thus, $L$ distinguishes the output of $G^{BFNW}_f$ from random, and by Theorem 6, $f$ is $\leq^{P/poly}_{tt}$ reducible to $L$.

By Theorem 2 and Proposition 5, we can apply Theorem 7 to
$\langle $the set $A$ from Theorem 2, $R_{Kt}\rangle $, and $\langle $the set $B$ from Theorem 2, $R_{KS}\rangle $.
($A$ and $B$ are clearly PSPACE-robust, since they are complete for EXP and PSPACE, respectively.) Combining this with Proposition 4, we get:

Corollary 8:
$R_{Kt}$ is complete for EXP under $\leq ^{P/poly}_{tt}$ reductions. $R_{KS}$ is complete for PSPACE under $\leq ^{P/poly}_{tt}$ reductions.

Those results also obtain natural examples that witness the difference in power of various reducibilities, as some of the sets which we show are complete under truth-table reductions are provably not complete under polynomial-time many-one reductions.

Thursday, April 30, 2009

Lecture 28

Today we talked about barriers to proving $P \neq NP$ and other complexity-theory results: specifically, the "relativization" barrier introduced by Baker-Gill-Solovay, and the "natural proofs" barrier introduced by Razborov-Rudich.

We also talked about results that evade them. For example, the Hopcroft-Paul-Valiant Theorem, $TIME[f(n)] \neq SPACE[f(n)]$, and the Paul-Pippenger-Szemeredi-Trotter Theorem, $DTIME[n] \neq NTIME[n]$, do not relativize. Nor do they algebrize, according to the Aaronson-Wigderson notion. (At least, my opinion is that the notion of relativizing/algebrizing makes sense for these theorems, and in this sense they do not relativ/algebrize.)

As for natural proofs, we discussed how proofs by diagonalization do not seem to naturalize, and we mentioned explicit circuit lower bound which are by diagonalization; for example, Kannan's result that $\Sigma_4$ not in $SIZE(n^{1000})$.

Finally, we mentioned some theorems that neither relativize nor (seem to) naturalize: for example, Vinodchandran's Theorem that $PP$ not in $SIZE(n^{1000})$, and Santhanam's improvement of this to an explicit promise problem in $MA$ which isn't in $SIZE(n^{1000})$.

Game theory complexity and computing Nash Equilibrium

In 1944 von Neumann and Morgenstern introduced the subject of Game Theory in their book Theory of Games and Economic Behavior. von Neumann and Morgenstern showed that there always exists a equilibrium solution to any two-person zero-sum game. (They also showed any n-person non-zero-sum game can be trivially turned into a (n+1)-person zero-sum game by adding a player who represents the net global profit or loss.) Nash extended this work to show that there exists a equilibrium for any (zero or non-zero-sum) n-person game. (A Nash Equilibrium is a set of mixed strategies for the players such that each individual player has perfect knowledge of the other players' strategies and has no benefit from changing their strategy unilaterally under this knowledge.)

However, Nash's proof is not constructive in a computational sense, in that it does not imply a general algorithm for determining the equilibrium point for an arbitrary game. There is good reason for this. We can show that there is no general algorithm for determining the equilibrium of a given game on the reals.

We must first give a definition of what we mean by a computable real. In this analysis, I will use the definition given in Computability in Analysis and Physics by Pour-El and Richards. We say a real number x is computable if there exists a computable sequence of rationals that converge effectively to x. (Converges effectively means that for some sequence of rationals ${r}_k$ there is a computable function $e$:$\mathbb{N}\to\mathbb{N}$ such that for all $N$, $k \geq e(N)$ implies $|{r}_k-x|\leq {2}^{-N}$ ) A computable sequence of reals is defined similarly (There is a computable double sequence of rationals $\{{r}_nk\}$ that converges effectively to ${x}_n$ as $n,k\rightarrow \inf$ ) Finally, a computable function is defined by saying that every computable sequence of inputs ${x}_n$ yields a computable sequence of outputs $f({x}_n)$, and that the function is effectively uniformly continuous. (Generally, given a computable function $d(n)$, for all $x,y\in \mathbb{R}$, $|x-y|\leq d(n)\implies |f(x)-f(y)|\leq {2}^{-n}$.)

Now that we have a framework to examine computability in the reals, we can move to the question of the computability of Nash equilibrium. Obviously, if the payoffs are given as uncomputable functions, determining an equilibrium is also uncomputable. However, if every payoff is computable, there does exist a computable equilibrium. The proof relies on the fact that we can generate a function that is maximized at the equilibrium of the game ($h(s)$), and we can computably determine the maximal value of a computable function. From this we can get finer and finer approximations of the maximal input to the function (since $h$ is effectively uniformly continuous), thus generating a sequence that converges to a maximal input.

But, we can also show that not every sequence of computable games is computable. This implies that there is no general procedure to compute Nash equilibrium (The proof of this relies on constructing a sequence of computable games that are not computable by using payoffs generated from two recursively inseparable sets and then showing a computable equilibrium sequence would separate them.)

We can do one better by showing that for a limited set of games, given a computable sequence of strategies for one player, the sequence of best response strategies for the other player is not computable. (Motivation for this is the idea that actually playing a game involves observing the opponent's strategy over time and computing a best response.) The proof involves showing the result for a specific, simple game.

Allowing players to make some amount of error in their responses does lead to computable strategies, however. Consider the logistic quantal response function: ${\sigma}_{ij}(\pi)=(exp(\lambda u_{ij}(\pi))/(\sum_{k\in A_{i}} exp(\lambda u_{ik}(\pi)))$, where $u_{ij}(\pi)$ is the payoff to $i$ from using pure strategy $j$ given a strategy profile (list of all player's strategies) $\pi$, and $A_{i}$ is the set of actions available to $i$. If we allow players to choose strategies with probabilities given by the logistic quantal response function for a given $\lambda$, instead of choosing a best response with certainty, then we can generate a computable "equilibrium". Note however, that no matter how tight we make this function, we have no guarantees that the resulting strategies are anywhere near a true equilibrium.

Looking at the problem of Nash equilibrium from another viewpoint entirely, we can examine the complexity of $\epsilon$-equilibrium. An $\epsilon$-Nash equilibrium is one in which each player has knowledge of other player's strategies and has at most an $epsilon$ benefit from changing their strategies under this knowledge. By taking $\epsilon=1/{2}^k$ for $k$ related to the length of the description of the given game, we can show that $r$-player $\epsilon$-Nash equilibrium is complete for the class PPAD.

PPAD is defined as (taken from Wikipedia):

G is a (possibly exponentially large) directed graph with no isolated vertices, and with every vertex having at most one predecessor and one successor. G is specified by giving a polynomial-time computable function f(v) that returns the predecessor and successor (if they exist) of the vertex v. Given a vertex s in G with no predecessor, find a vertex t≠s with no predecessor or no successor. In other words, we want any source or sink of the directed graph other than s.

PPAD is a subclass of TFNP, which is the set of FNP problems that are total. (A binary relation P(x,y) is in TFNP if and only if there is a deterministic polynomial time algorithm that can determine whether P(x,y) holds given both x and y, and for every x, there exists a y such that P(x,y) holds. (again, taken from Wikipedia))

The proof that $r$-player $\epsilon$-Nash equilibrium is PPAD-complete was show in a sequence of papers that first showed that 4-or-more player games can be reduced to 4-player graphical games (graphical games are ones in which the players can be viewed as vertices on a graph, and each player's payoffs are only related to the actions of that player's direct graphical neighbors), and then showing that 4-player games are PPAD-complete by reducing another PPAD-complete problem to it, and then separate proofs were given for 3 and 2-player games.

This work places Nash equilibrium into a concrete place in the complexity hierarchy as well as expanding the interest and scope of the class PPAD.

References:

Prasad, Kislaya. "The rationality/computability trade-off in finite games." Journal of Economic Behavior & Organization 69(2009): 17-26. Print. Link

Chen, Xi; Deng, Xiaotie "Settling the complexity of 2-player Nash equilibrium" Electronic Colloquium on Computational Complexity, Revision 1 of Report No. 140 Link

Daskalakis, Constantinos; Goldberg, Paul W.; Papadimitriou, Christos H. "The complexity of computing Nash equilibrium" Link

-Kevin Mc Inerney

Wednesday, April 29, 2009

Last lecture

Hi -- don't forget to not each lunch tomorrow; we'll bring it for the last lecture, wherein we'll also cover Natural Proofs.

Tuesday, April 28, 2009

Course evaluations

Please also do the online course evaluation: go to https://my.cmu.edu/site/main/page.academics, log in with your Andrew ID, and click on "University Course Assessment". Thanks!

(And if you didn't get a copy of the paper one for the CS department, please ask Deb for one. Thanks!)

Lecture 27

In this lecture we covered Reingold's Theorem, SL = L; i.e., undirected connectivity can be solved in log space. Here are my abridged notes on actually doing Reingold's algorithm in log-space. It's not a full proof, just a "proof by example" -- but I hope it's nevertheless clearer than a hand-wave.

For the Martin-Randall Theorem, you can consult the original Martin-Randall paper. There is a clearer proof -- but of a much more general theorem -- in a paper by Jerrum, Son, Tetali, and Vigoda. It's still not the clearest thing. If I get time I will try to write up a stripped-down, simplified version of the proof. You can also consult the Arora-Barak book for a proof of a weaker but still sufficient result: $\epsilon(G r H) \geq \Omega(\epsilon(G) \epsilon(H)^2)$. Note that it is okay to have the square on $\epsilon(H)$ here since it's an absolute constant anyway -- but it would not help if the square were on $\epsilon(G)$.

Thursday, April 23, 2009

Lecture 26

Today we covered...

Proof Complexity and the NP vs. coNP problem. The "Resolution" proof system with the resolution rule ($C \vee x$, $C \vee \overline{x}$ $\vdash$ $C \vee C'$) and the weakening rule ($C \vdash C \vee x$). Completeness of Resolution with proof sketch. Various contradictions (tautologies, in fact): Pigeonhole Principle, Tseitin Tautologies, Random 3-CNF. Treelike Resolution. Resolution width. Short Proofs Are Narrow Theorem, proved for Treelike Resolution. Ben-Sasson & Wigderson Theorem: Unsatisfiable $k$-CNFs with "expansion" require wide Resolution proofs.

By the way, the question came up as to separating Treelike Resolution from General Resolution. A very strong result was proved for this problem by Ben-Sasson, Impagliazzo, and Wigderson: There is a natural family of contradictions with $n$ variables and $O(n)$ clauses, which have Resolution refutations of length $O(n)$ but requires Treelike Resolution refutations of length $2^{\Omega(n / \log n)}$.

The contradictions here are based on "pebbling" expander graphs; more specifically, the results in an old paper of Paul, Tarjan, and Celoni.

Tuesday, April 21, 2009

Blogging

Hi all. If you have not yet written a blog post -- you're one! Your deadline is the last day of class. Please email us to confirm a topic.

Lecture 25

Today Eric gave an introduction to Property Testing. Among other things, we covered the BLR Linearity Test with the Bellare-Coppersmith-Håstad-Kiwi-Sudan proof using Fourier analysis.

Sunday, April 19, 2009

Homework 5 Correction

In Problem #4b, the everywhere you see $B_\delta(f)$ this should be $B_\delta(C)$.

Thanks to Or for pointing this out.

Updated: Another small bug-fix to Problem #4b, in the definition of $S'$. Corrected version is on the course web page.

Wednesday, April 15, 2009

Easy Witnesses

While doing the proof of IKW theorem in class, we came across the easy witness method introduced by Kabanets. The original usage of this method in Kabanets'00 paper was to show that any randomized algorithm in $\mathrm{RP}$ can be simulated by a zero-error probabilistic algorithm in expected $2^{n^\varepsilon}$-time for some $\varepsilon>0$, such that no expected sub-exponential time algorithm can refute the simulation infinitely often (io). In other words, there is no algorithm of the form $R(1^n)\in \{0,1\}^n$ in this time class such that simulation makes a mistake on input $R(1^n)$ for almost all $n$.

Before going to prove this result, we look at something simpler first, which also introduces the easy witness method:

Theorem 1: At least one of the following holds:

For every $\varepsilon>0$, any language in $\mathrm{RP}$ can be simulated in $\mathrm{DTIME}(2^{n^\varepsilon})$ such that no language in $\mathrm{ZPP}$ can refute the simulation io.
$\mathrm{BPP}=\mathrm{ZPP}$.

Proof: Consider the set $S$ of easy witness strings. A string $\alpha \in S$ is an easy witness iff it has a concise description (ie. it is the truth table of some circuit with low complexity). For an algorithm $A$ and input $x$ in $\mathrm{RP}$ we can test $A$ against all such strings and accept iff there is one that makes $A$ accept. The crucial insight is that if this algorithm makes frequent mistakes, then the set of random strings that makes $A$ accept have high circuit complexity. Thus if fix the input $x$, $A$ is a good hardness test for strings $r$, accepting only if $r$ has no small description. Notice that a significant fraction of strings have this property. Therefore using this as a source for Impagliazzo - Wigderson generator, we can derandomize BPP.

More formally, for given randomized algorithm $A(x,r)$ on input $x,|x|=n$ and random bits $r,|r|=m=n^a$ for some constant $a$, let $S^{\delta}_m$ be the set of truth tables of all $\lceil \log m\rceil$-variable Boolean functions with low circuit complexity (at most $m^\delta$). Hence $S^{\delta}_m$ contains strings of length $m$, and it can be enumerated in $\mathrm{DTIME}(2^{n^{2\delta}})$. Therefore checking for all $\alpha \in {S^{\delta}_m}$ if $R(x,\alpha)$ accepts takes time at most $(2^{n^\varepsilon})$ for $\delta = \varepsilon/(2a)$. Let this algorithm be $B^{\varepsilon}_{A}(x)$.

If this algorithm $B^{\varepsilon}_{A}$ works for every $A \in \mathrm{RP}$ and $\varepsilon>0$, then (1) holds and we are done. Otherwise, there exists $A$,$\varepsilon>0$ and an expected polynomial time refuter $R(1^n)\in\{0,1\}^n$ such that, for almost all $n$, $B^{\varepsilon}_{A}(R(1^{n}))$ does not accept whereas $A(R(1^{n}),r)$ accepts for some $r$. Therefore $A(R(1^{n}),r)$ can be viewed as a Boolean circuit $C^{\mathrm{hard}}(r)$ which accepts almost all $m$-bit strings and all accepted strings are truth tables of a $\log m$-variable Boolean function with the smallest circuit size at least $m^{\varepsilon/(2a)}$. Since $R(1^{n})$ halts in polynomial time with high probability, we have an algorithm in $\mathrm{ZPP}$ which can construct such circuits $C^{\mathrm{hard}}$.

Given such a circuit $C^{\mathrm{hard}}$, we can guess a string uniformly at random $\beta\in\{0,1\}^m$ which $C^{\mathrm{hard}}$ accepts. Reminding ourselves of IW result:

Impagliazzo-Wigderson Generator: Given $r\in\{0,1\}^{n^c}$, truth table of a circuit on $c \log n$-Boolean variables with circuit complexity at least $n^{\varepsilon c}$, there exists $c,d\in\mathbb{N}$ and a polynomial time computable generator $G_r(s):\{0,1\}^{d \log n} \rightarrow \{0,1\}^n$ with hardness $H(G_r)> n$.

If we use this $\beta$ to construct the generator $G_{\beta}$ of IW mapping $\{0,1\}^{d \log k}$ to $\{0,1\}^k$, we get a generator $G_{\beta}$ with hardness greater than $k$ for sufficiently large $k$. Constructing these takes expected polynomial time, and last step takes deterministic polynomial time, therefore for every $L \in \mathrm{BPP}$, we have $L \in \mathrm{ZPP}$. QED

Instead of IW generator, we can use Babai-Fortnow-Nisan-Wigderson generator, to arrive at the following result.

Theorem 2: At least one of the following holds:

Any language in $\mathrm{RP}$ can be simulated in $\mathrm{DTIME}(2^{(\log n)^{\log\log n}})$ such that no language in $\mathrm{ZPP}$ can refute the simulation io.
$\mathrm{BPP}\subseteq \mathrm{ZPSUBEXP}$ where $\mathrm{ZPSUBEXP}$ denotes the class of languages with an expected sub exponential time algorithm.

Instead of fiddling with the generator, we can increase the power of refuter. In Theorem 1, when first condition fails, we can allow the refuter to take expected sub exponential time. After changing the circuit size and parameters appropriately, we have the following result:

Theorem 3: At least one of the following holds:

For every $\varepsilon>0$, any language in $\mathrm{RP}$ can be simulated in $\mathrm{DTIME}(2^{n^{\varepsilon}})$ such that no language in $\mathrm{ZPSUBEXP}$ can refute the simulation io.
$\mathrm{BPP}\subseteq \mathrm{ZPSUBEXP}$.

Noting that $\mathrm{DTIME}(2^{n^{\varepsilon}})$ and $\mathrm{ZPSUBEXP}$ are in $\mathrm{ZPTIME}(2^{n^{\varepsilon}})$, and $\mathrm{RP}\subseteq \mathrm{BPP}$, we can replace the above with an unconditional result:

Corollary 1: For every $\varepsilon>0$, any language in $\mathrm{RP}$ can be simulated in $\mathrm{ZPTIME}(2^{n^{\varepsilon}})$ such that no language in $\mathrm{ZPSUBEXP}$ can refute the simulation io.

Notice that above results imply that we can get analogous results to the ones obtained in IW paper:

In IW, assuming $\mathrm{BPP}\neq \mathrm{EXP}$, deterministic subexponential time bound was shown for $\mathrm{BPP}$. Using the machinery developed above, we can answer a weaker question: What happens when $\mathrm{ZPP}\neq \mathrm{EXP}$?

Theorem 4: If $\mathrm{ZPP}\neq \mathrm{EXP}$, then any language in $\mathrm{RP}$ can be simulated in $\mathrm{DTIME}(2^{n^{\varepsilon}})$ for all $\varepsilon>0$ such that no language in $\mathrm{ZPP}$ can refute the simulation io.

Proof: Assume otherwise. Then by Theorem 1, $\mathrm{BPP}=\mathrm{ZPP}$. But for some $\varepsilon>0$, $\mathrm{BPP}$ is a proper subset of $\mathrm{DTIME}(2^{n^\varepsilon})$ for $\mathrm{BPP}$ refuters (by Time Hierarchy Theorem). Using IW theorem, this implies
$\mathrm{BPP} = \mathrm{EXP} = \mathrm{ZPP}$. QED

The gap theorem of IW (either no derandomization of BPP is possible, or BPP can be simulated in deterministic $2^{n^\varepsilon}$ time for every $\varepsilon>0$) can be shown for ZPP as follows:

Theorem 5: Either $\mathrm{ZPP}=\mathrm{EXP}$ or for every $\varepsilon>0$, any language in $\mathrm{RP}$ can be simulated in $\mathrm{DTIME}(2^{n^{\varepsilon}})$ for all $\varepsilon>0$ such that no language in $\mathrm{ZPP}$ can refute the simulation io.

Proof: Assume $\mathrm{ZPP}=\mathrm{EXP} = \mathrm{RP}$. By Time Hierarchy Theorem, for every $\varepsilon>0$, $\mathrm{EXP}$ is not in $\mathrm{DTIME}(2^{n^\varepsilon})$ with $\mathrm{ZPP}$ refuters. Thus at most one of the conditions holds. By Theorem 4, at least one of the conditions holds. The result follows. QED

Notice that "easy witness'' method applies even when the refuters are allowed to be non-deterministic Turing machines, and by essentially using the same proof structure, we can obtain the following result:

Theorem 6: At least one of the following holds:

For every $\varepsilon>0$, any language in $\mathrm{NP}$ can be simulated in $\mathrm{DTIME}(2^{n^\varepsilon})$ such that no language in $\mathrm{NP}$ can refute the simulation io.
$\mathrm{BPP}\subseteq \mathrm{NP}$.

Tuesday, April 14, 2009

THURSDAY'S CLASS CANCELED

I was informed by Deb that all classes on Thursday are canceled. So, see you on Tuesday, April 21, when Eric will tell us about Property Testing.

(This kills a lecture, which was going to be on quantum, but Matt already told us about it in his blog post, so no great loss.)

As an interesting fact, I learned that Carnival is scheduled to precisely be the "Thanksgiving weekend" of the Spring Term. That way, the Fall and Spring schedules can match exactly.

Lecture 24

In today's class we covered many of the gory details of expander graphs. I didn't quite get a chance to finish the last details, so there are the notes.

Friday, April 10, 2009

Bounds on BQP

Quantum computation presents the attractive possibility of computing some functions exponentially faster than would be possible under a classical model. The most compelling example is perhaps Shor's quantum factoring algorithm, which runs in polylog time. But how powerful is quantum computation? Getting some of the basic answers involves (surprisingly, to me anyway) getting into some counting complexity.

To begin, we'll need to establish a basic set of definitions. The fundamental unit of quantum computation is the qubit. Like classical bits, qubits have two states; however, qubits can exist in linear superpositions of these states, $|\psi\rangle = \alpha |0\rangle + \beta |1\rangle$, where $\alpha$ and $\beta$ are complex amplitudes subject to $|\alpha|^2 + |\beta|^2 = 1$. We cannot observe these amplitudes directly, however; to determine the state of this qubit, we need to take a measurement, the result of which will always be a pure basis state. In the previous example, measuring $|\psi \rangle$ will result in $|0\rangle$ with a probability of $|\alpha|^2$ and $|1\rangle$ with a probability of $|\beta|^2$.

If you have multiple qubits, they can become entangled with each other, meaning (roughly speaking) that their measurement probabilities are no longer independent. So, for example, if we have two entangled qubits, the states $|00\rangle$ and $|11\rangle$ could each have probability $1/2$, while the other two states have probability $0$. If you somehow knew that their states were entangled in this way, then measuring only one qubit would tell you the value of the other -- although there would be no way to predict the value of either qubit before taking the measurement. If we have a quantum register of $k$ qubits, then each of its $2^k$ possible states effectively have distinct complex amplitudes. The vector of all of these $2^k$ complex amplitudes is our state vector, which completely encodes the statistical behavior of measurements of the system.

So now we have quantum registers, to store our information. In order to perform computations, we need quantum gates. These gates are simply unitary linear operators acting on the state vector. In order to talk about quantum time classes in a meaningful way, however, we need to restrict the set of allowable fundamental gates. (Since the composition of two unitary operators is also a unitary operator, you could design a single gate that would carry out any desired computation!) One common circuit basis involves only two fundamental gates. The first is the Hadamard gate, which takes

$|0\rangle \rightarrow \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$
and
$|1\rangle \rightarrow \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$

The second is the Toffoli gate, which takes three bits as input, passes through the first two bits unmodified, and XORs the third bit with the product of the first two.

Note that the fact that all changes in the state happen due to unitary operators implies that the number of outputs qubits is always equal to the number of input qubits. When dealing with quantum computation, we can assume without loss of generality that the quantum register has only a single accepting state.

Definition: BQP is the quantum analog of BPP. (Quantum computation is inherently probabilistic.) It is the class of languages decidable by polynomial-size quantum circuits that (to enforce uniformity) can be generated in polytime by a classical TM, accepting with probability at least $2/3$ for $x \in L$ and with probability less than $1/3$ for $x \notin L$. (Amplification theorems with similar results to those for BPP hold here, so any values $1/2 + \epsilon$ and $1/2 - \epsilon$ will work.)

How powerful is BQP? On the lower end, we know $BPP \subseteq BQP$: we can efficiently simulate classical circuits by quantum circuits. The Toffoli gate is universal for classical computation; if you hardwire the third input to $1$ it is simply a NAND gate. Coin flips can be simulated by Hadamard gates, since they transform pure states into equal superpositions.

What about upper bounds? Clearly $BQP \subseteq EXP$, since we can simulate any BQP machine by applying a polynomial number of linear operators to the exponential-size density matrix. It's also true, but less obvious, that $BQP \subseteq PSPACE$. We can show this by integrating over computational "paths". Recall that in a quantum computation, we apply a new small unitary operator to our state at each time step. For each accepting state $|y\rangle$ at time $t$, we can compute that state's amplitude by applying the operator $U_{t-1}$ to each predecessor state $x_0,\ldots,x_{2^n}$ and summing the results. By recursive application of this same step, we can compute the amplitude of any state anywhere in the computation at any time in polynomial space (although we do require exponential time).

But we can do better, namely $BQP \subseteq PP$.

Definition: GapP is the set of functions $f(x)$ such that, for some polynomial time NTM $M$, $f(x)$ is the difference between the number of accepting and rejecting paths in $M$. Note that this gives us an alternative definition for PP: a language $L$ is in PP if and only if there is some GapP function $f(x)$ such that when $x \in L$, $f(x) \gt 0$, and when $x \notin L$, $f(x) \leq 0$.

Fact: GapP is closed under negation, (exponential) sum, and (polynomial) product.

Fact: If there exist two GapP functions describing the real and complex parts of the entries of a polynomial sequence of complex-valued matrices, then there also exist two GapP functions describing their product.

More precisely (statement due to Watrous): let $p(x)$ and $q(x)$ be polynomial bounded functions, and, for each $n$, $A_{n,1}, A_{n,2}, \ldots, A_{n,p(n)}$ be a sequence of complex-valued matrices of size less than $2^{q(n)}$. Then, if we have functions

$f(1^n, 1^k, x, y) = Re(A_{n,k}[x,y])$

$g(1^n, 1^k, x, y) = Im(A_{n,k}[x,y])$

in GapP, then there also exist functions $F$ and $G$ in GapP such that

$F(1^n, x, y) = Re((\prod_{i=1}^{p(n)} A_{n, i})[x,y])$

$G(1^n, x, y) = Im((\prod_{i=1}^{p(n)} A_{n, i})[x,y])$

Observe that, applying an appropriate topological ordering to our quantum circuit, we can serialize our quantum gates into a polynomial number of steps. Additionally, observe that we can take the tensor product of a gate's matrix with the identity matrix for the unaffected bits in order to come up with a complete transition matrix for each time step. We can now use this fact to show that there are functions $f$ and $g$ in GapP describing all of our quantum gates.

In order to do this, we need to take advantage of an alternate representation of our quantum register: its density matrix, $\rho$. Glossing over some details, this effectively linearizes our state and operations. With $2^k$ entries in the state vector, the corresponding density matrix is a $2^k \times 2^k$ matrix with trace 1. The $i$'th entry on the diagonal is simply the probability of finding the register in state $i$. If we then let our state be this matrix, flattened into a vector, $vec(\rho)$, the entries of the matrices describing our quantum gates fall in the set $\{-1, -1/2, 0, 1/2, 1\}$. By the fact that these matrices can be generated in polynomial time, we know that the corresponding $f$ and $g$ for each matrix (multiplied by 2, to deal with the factors of 1/2) must be in GapP. Then, by the fact noted above, there must exist $F$ and $G$ describing the entries of the transition matrix for the entire circuit (multiplied by $2^{p(x)}$).

We can assume that the circuit has only a single accepting state, so finding the acceptance probability is then simply a matter of looking up the single matrix entry that describes the transition from the starting state $|x\rangle$ to the accepting state, which consists of a single invocation of a GapP function.

To complete the proof, note that the function $a(x) = 2^{p(x)} (Pr[Q \textrm{ accepts } x] - 1/2)$ is positive when $Q$ accepts $x$, and negative otherwise. Therefore $Q$ can be simulated by a PP machine, and $BQP \subseteq PP$.

In fact, not only does PP contain BQP, but $PP^{BQP} = PP$.

It is unknown whether this inclusion is tight, but all signs seem to indicate that it is. We can define the class QMA analogously to MA, except that Arthur is a BQP machine, and Merlin sends him quantum advice. This class, which obviously contains BQP, is contained in PP (originally noted by Kitaev and Watrous). Furthermore, a theorem of Vyalyi states that if $QMA = PP$, then $PH \subseteq PP$.

Thursday, April 9, 2009

Midterm correction

Oops. I made a mistake in the grading of question 1(e): the problem of determining whether $#L = #P$ is open. To compensate for this error, I am adding +3 to everyone's grade on the midterm. If you have any other questions about the graded midterm (or Homework 4), please see me during my office hours next Tuesday. Thanks!

Details for the IKW Theorem

The proof of the IKW Theorem I gave in class was a bit sketchy; let me discuss two points about it, raised by Or and Pranjal.

1. The first point, raised by Or, was about appealing to the Impagliazzo-Wigderson Theorem (or implicitly, the Nisan-Wigderson Theorem). Normally when one thinks about that theorem one thinks of the statement, "If there is a language $L \in E$ such that $L \not \in \bigcup_c \SIZE(n^c)$, then $BPP \subseteq \bigcap_{\epsilon} DTIME(2^{n^\epsilon})$.

However, this is not what the IKW Theorem appeals to; you should think of that as a corollary of the IW/NW Theorems. The Theorems themselves are input-length-specific; e.g., for Nisan-Wigderson, the statement we're using is of the form "Let $C$ be a circuit on $n$ bits of size $n^c$, and let $f : \{0,1\}^{n^{1/10c}} \to \{0,1\}$ be a function such that every size-$n^{2c}$ circuit has correlation at most $1/n^{2c}$. Then [some explicit PRG outputting $n$ bits based on $f$'s truth table and designs] $1/10$-fools $C$."

2. The second point, raised by Pranjal, was about the uncareful (indeed, possibly incorrect) way I explained the derandomization of $MA$ that comes out of assuming $NEXP \neq EXP$. The wrong way to say it is to start with a fixed language in $MA$ and then deduce that Arthur's $n^c$-time randomized part can be replaced by some other $NTIME(2^{n^a})$-time algorithm (with advice). That is, of course, pointless, if $a \geq c$, since Arthur could just enumerate all possible random strings. Rather, the key is to show that all of $MA$ can be done by a nondeterministic (advice-taking) Arthur who uses fixed-exponential nondeterministic time.

In other words, the correct key lemma is:

$NEXP \neq \EXP \Rightarrow \exists c, MA \subseteq io-[NTIME(2^{n^c})/n]$.

This is nontrivial because the same $c$ works for all languages in $MA$, regardless of their polytime complexity. And the proof of this uses the appeal to the length-specific IW Theorem described in Point 1 above.

Lecture 23

Today we finished the proof of the IKW Theorem. We also showed fhe Kabanets-Impagliazzo Theorem: derandomizing Arithmetic Circuit Identity Testing implies $NEXP$ not in $P/poly$ or Permanent not in $AlgP/poly$ or both.

We then moved on to the definition of expander graphs, the definition of strongly explicit graph families, and saw the Lubotsky-Phillips-Sarnak and Margulis-Gabber-Galil expander families.

Tuesday, April 7, 2009

Why there isn't a simpler proof of IKW

A question raised in class today was, Why can't we prove the IKW Lemma, $NEXP \subseteq P/poly \Rightarrow NEXP = EXP$, via techniques similar to Meyer's Lemma, $EXP \subseteq P/poly \Rightarrow EXP = PSPACE$?

(Recall, btw, that IKW actually get $NEXP = MA$ and Meyer actually gets $EXP = \Sigma_2$.)

In discussing it with Or and Pranjal after class, it seems the answer is the following. In the Meyer Lemma, for a fixed $EXP$ machine, we considered the map taking the index of tableau window into the tableau window contents. That map is in $EXP$, hence in $P/poly$. So there exists a circuit $C$ of some size $n^k$ computing the map. Given x, we enumerate all circuits of that size, and accept iff one of them implicitly describes an accepting tableau; this is doable in $PSPACE$.

Why doesn't this work for $NEXP$? For $NEXP$ computations, there may be multiple accepting tableaus. Not a problem so far. Certainly the $PSPACE$ algorithm still makes sense: enumerate all circuits of some fixed poly-size, and if one of them implicitly describes an accepting tableau, accept. The question is, if $x$ is in the language, will there be a fixed poly-size circuit describing an accepting tableau?

One has to be careful. There is no longer a clear meaning for "the map that takes a tableau window index to the tableau window contents", since there are multiple valid tableaus. Perhaps that's okay; perhaps we can allow for "multivalued functions". But still, there isn't a clear meaning for "this map is computable in $NEXP$ and hence in $P/poly$". The reason is, the usual meaning of an NTM computing a function is that some branches reject, and the ones that accept output valid answers. Such a model indeed computes a multi-valued function. However since it's not a decision problem (a language), it's not clear how we can get anything out of the assumption $NEXP \subseteq P/\poly$.

One way around this, as Or suggested, would be to make everything single-valued by looking at the lexicographically least accepting tableaus. Unfortunately, as we saw on Homework 2, this lexicographically least satisfying assignments characterize the lass $P^{NP}$, not $NP$. All is not lost, though; a paper of Bourke uses such ideas to show $EXP^{NP[t(n)]} \subseteq P/poly \Rightarrow EXP^{NP[t(n)]} = MA$ (where the notation indicates that the $EXP$ machine makes $t(n)$ queries to the oracle). This is related to Buhrman and Homer's old extension of Karp-Lipton: $EXP^{NP} \subseteq P/poly \Rightarrow EXP^{NP} = \Sigma_2$.

Lecture 22

In this lecture we covered "derandomization implies circuit lower bounds" topics. In particular, we saw:

Kannan's Theorem, $\Sigma_2$ doesn't have fixed-polynomial size circuits.

The Babai-Fortnow-Lund Theorem, $EXP \subseteq P/poly \Rightarrow EXP = MA$, and its consequences: $MA$-$EXP$ does not have polynomial size circuits, derandomizing $MA$ to $NP$ implies showing $NEXP$ does not have polynomial size circuits.

Most of the Impagliazzo-Kabanets-Wigderson Theorem, $NEXP \subseteq P/poly \Rightarrow NEXP = EXP$ (and indeed $NEXP = MA$), along with Kabanets's "easy witness method". The remainder will be proved next time.

Homework 5

Homework 5 is posted, due in 2 weeks as usual. It's long! Start early.

Thursday, April 2, 2009

Lecture 21

Today we saw the completion of Sudan-Trevisan-Vadhan's proof of the Impagliazzo-Wigderson Theorem: if there is a positive $\epsilon > 0$ such that $E$ contains a language not in $SIZE(2^{\epsilon n})$, then $P = BPP$. This included Sudan's algorithm for list-decoding Reed-Solomon codes.

Tuesday, March 31, 2009

Lecture 20

Today we covered...

Reed-Muller code review. Local decoding with "random lines". Code concatenation. Beginning Sudan-Trevisan-Vadhan's hardness amplification.

Monday, March 30, 2009

Midterm correction

On problem #4 on the midterm, assume $w \geq n$.

Thanks to Or for pointing out the necessity of this.

Thursday, March 26, 2009

Lecture 19

Today we covered...

Completion of proof of Impagliazzo's Hard-Core Set Theorem, Yao's XOR Lemma. Intro to error-correcting codes, list-decodable codes, locally decodable codes. Reed-Solomon codes, Hadamard codes, Reed-Muller codes. Connection between local list-decoding and hardness amplification.

Tuesday, March 24, 2009

Lecture 18

Today we covered: Hardness vs. randomness corollaries of Nisan-Wigderson; Impagliazzo-Wigderson Theorem ($f \in \mathsf{E}$ requiring circuits of size $2^{\epsilon n}$ implies $\mathsf{P} = \mathsf{BPP}$); Hardness Amplification , $1 - 1/\mathrm{poly}(n)$ hardness to $1/2 + 1/\mathrm{poly}(n)$ hardness, with the XOR Lemma; Impagliazzo's proof via the Hard-Core Set Theorem (with Nisan's proof via the Minimax Theorem).

Midterm

The take-home midterm is posted on the course home page. Please grab a copy now.

As a reminder, it is due next Tuesday, no collaboration is allowed, and no references to "outside" sources are allowed: just your course notes, the blog, the course homeworks, and the course homework solutions.

Friday, March 20, 2009

Lecture 17

In this lecture we covered the Nisan-Wigderson Generator, which connects hardness to randomness. In the course of proving it we also saw designs and the Yao Next-Bit Predictor Theorem, which uses the "hybrid method".

Tuesday, March 17, 2009

Lecture 16

In this lecture we covered Nisan's pseudorandom generator for log space.

One thing I had to rush over at the end was the actual definition of the generator, along with its seed length!

Here is an official inductive definition of the generator: Let $h_1, ... h_{\lg n} : \{0,1\}^k \to \{0,1\}^k$ be functions. We define $G_t : \{0,1\}^k \to \{0,1\}^{2^t k}$ inductively by $G_0(x) = x$, $G_{t}(x) = (G_{t-1}(x), G_{t-1}(h_t(x)))$.

Finally, Nisan's generator $G$ maps $\langle x, h_1, ... h_{\lg n} \rangle$ into $G_{\lg n}(x)$. Here the $h_i$'s are the "descriptions" of the hash functions.

Recall that we are using the simple "$a \circ x + b$" pairwise independent hash family; the "description" of such a function is just the $2k$-bit string $\langle a, b\rangle$. Conveniently, choosing this description uniformly at random coincides with choosing $h \sim \mathcal{H}_k$ uniformly at random.

So finally, the seed length is $\ell = k + (\lg n)(2k)$. Since $k = O(\log S)$, this is indeed $O(S \log n)$ seed-length, as claimed. The number of output bits is in fact $nk = O(n \log S)$ (which is, as we noted, slightly more than we need).

Monday, March 16, 2009

Fixed-polynomial circuit lower bounds for MA/1

In this post, we look at a recent circuit lower bound result which shows that MA/1 is not contained in the class SIZE($n^k$). Some history and motivation first: A well studied fundamental problem in lowerbouds is in showing super-polynomial circuit lowerbounds for NP. This is natural to look at, since such a result would immediately separate P and NP, because P $\in$ P/poly has poly-sized circuit families deciding languages in it.

However, though people have been looking at this problem for quite some time, even getting super-linear lower bounds on the circuit size has been evasive. While the goal has been getting super polynomial lower bounds, the current best lower bound is just $4.5n - o(n)$.

Given this state, a natural thing to do is to look at showing results for larger classes, or in other words, attack NP from above. In this line of research, Kannan showed a promising result of this nature (we have seen this in one of the homeworks): $\Sigma_2 \cap \Pi_2$ is not contained in SIZE($n^k$). Getting closer to NP, Kobler and Watanabe have shown using the halving algorithm of BCGKT (seen in class), that $ZPP^{NP}$ has a circuit lower bound of $n^k$, for any fixed $k$. This is an improvement over the previous result of Kannan, as $ZPP^{NP}$ is "closer" to NP than $\Sigma_2 \cap \Pi_2$ is (i.e. $NP \subseteq ZPP^{NP} \subseteq \Sigma_2 \cap \Pi_2$).

In a recent result, Rahul Santhanam gets even closer to NP by showing an almost polynomial circuit lower bound for MA. He shows that for any fixed $k>0$, there are languages in the class MA/1 (MA with $1$ bit of advice) which are not decided by non-uniform circuit families of size $n^k$. Why is this result important ? Intuitively, MA is like the probabilistic version of NP, and therefore such lower bounds automatically gain much value, as it is widely believed that ''randomness'' to NP does not give much more power than the basic NP class (that is, MA $=$ NP is believed to hold). This therefore nearly implies polynomial circuit lower bounds for MA, which sits just above NP in the hierarchy. We say nearly because the class for which he shows a lower bound assumes $1$ additional bit of advice (MA/1).

Now let's look at the high level picture of proving such lower bounds. Imagine we want to show a lower bound for a class C, where NP $\subseteq$ C. If it is the case that NP is not contained in P/poly, then we are done because C is only larger than NP. On the other hand, if NP $\subseteq$ P/poly, then we \emph{use} this fact to show that the polynomial hierarchy collapses to C. But there are complete problems in PH which are not decided by SIZE($n^k$) --- recall this from HW1; because of the collapse, these problems are in $C$, and this completes the proof. If we recall, we did the exact same thing for showing such lower bounds for $\Sigma_2 \cap \Pi_2$: we showed that if NP $\subseteq$ P/poly, then PH collapses to $\Sigma_2 \cap \Pi_2$. Since we know that PH has problems which are not decided by SIZE($n^k$), we have that $\Sigma_2 \cap \Pi_2 $ is not contained in SIZE($n^k$).

In fact, the argument need not always be based on the two cases of whether or not NP is in P/poly. There are cases where it helps to consider a class which \emph{contains} C, and then argue depending on it's containment in P/poly. For instance, we have the result that if PSPACE $\subseteq$ P/poly, then PSPACE $=$ MA (this problem figures in HW4). Intuitively, the proof goes as follows: if PSPACE $\subseteq$ P/poly, then the prover of the IP protocol for QBF (a complete problem for PSPACE $\subseteq$ IP) can be represented by a poly-sized circuit. Merlin can therefore simply give Arthur the circuit for the correct input size and Arthur can use the circuit to simulate any query to the prover. Completeness is direct, and intuitively, soundness holds because we know no adaptive prover can cheat Arthur, so no fixed prover can! Therefore, QBF $\in$ MA.

Using the ideas defined above, we are now ready to show the main theorem that MA/1 is not in SIZE($n^k$).
We require the following lemma, whose proof is rather technical (using Shamir's proof for IP = PSPACE) and we omit.

There exists a PSPACE-complete problem $L$ and a probabilistic polynomial time turing machine $M$ such that for any input $x$,
1. $M$ asks only queries of length $|x|$.
2. If $M$ is given the language $L$ as oracle and if $x \in L$, then $M$ accepts $x$ with probability $1$.
3. If $x \notin L$, then no matter what oracle $M$ gets, it accepts with probability only at most $1/2$.

Such a complete problem is known to exist, and we shall make use of it. Just as an aside, this definition of $L$ seems similar to the notion of checkers in HW4.

If this language $L \in $ P/poly, then PSPACE $\in $ P/poly, meaning PSPACE $=$ MA, and this would give us the desired result since PSPACE is not contained in SIZE($n^k$). Therefore, we assume that $L \notin $ P/poly, and create the following \emph{padded} language $L'$. For what follows, let $s(n)$ denote the size of the smallest circuit which can decide $L$ on inputs of size $n$. Consider the following language $L'$:

$L' = \{ x1^{y} | x \in L, |x| = n, y \geq n, y$ is a power of $2, s(n) \in ((y + n)^{k+1}, (2y+n)^{k+1}] \}$.

We show that $L' \in $ MA/1 but $L' \notin $ SIZE($n^k$). We first show the latter result: In some sense, the magic is already done as the padding has sort of diagonalized the input.
Because $L \notin $ SIZE($n^k$) (we are in the case where $L \notin$ P/poly), there exists some input size $n_0$ such that $s(n_0) > (n_0+1)^k$, where $s(n)$ is the size of the smallest circuit deciding $L$ on inputs of size $n$. Define $y_0$ to be the power of $2$ such that $(n_0 + y_0)^k < s(n_0) \leq (n_0 + 2y_0)^k$. Notice that this is the same $y$ as defined in $L'$.

Now, suppose $L'$ has a circuit family of size $m^k$. We now create a "small" circuit for deciding $L$ on input size $n_0$: the circuit simply hardwires $y_0$ additional $1$'s to the input $x$ and subsequently mimics the circuit of $L'$ on inputs of size $n_0 + y$. However, the size of this circuit is at most $(n_0+y)^k$, which contradicts the definition of $s(n)$. This shows that $L' \notin $ SIZE($n^k$).

All that remains is to show that $L' \in $ MA/1. The key here is in knowing if the input size $m$ of a given $x1^y$ is \emph{valid}, in that it satisfies the circuit size condition. It would be difficult to have the MA protocol figure it out if the circuit size is large (which it probably is), but here is where we use the $1$ bit of advice. Given an input size of $m$, the advice is $1$ if and only if $m$ can be decomposed into $n$ and $y = 2^k$ such that $y \geq n$, and the smallest circuit size for deciding $L$ on inputs of size $n$ is between $(y+n)^{k+1}$ and $(2y + n)^{k+1}$. Notice that if such a valid $(n,y)$ decomposition exists, it is unique: if we try to increase $y$ to $2y$ and decrease $n$ accordingly, it would make the resulting decomposition infeasible as we have $y \geq n$.

Thus, given $x1^y$ for which we need to decide membership in $L'$, the MA protocol does the following: Merlin sends Arthur the circuit of size $s \in ((n+y)^{k+1}, (n+2y)^{k+1}]$ which solves $L$ on inputs of length $n$. Arthur checks if the advice is $1$, and rejects if it is not. If it is, then the $m$ is of a valid length, and Arthur looks at the circuit and identifies $n$ (which is the input size of the received circuit) and makes other necessary checks, like $y \geq n$, $y$ is a power of $2$, and $(y+n)^{k+1} < s \leq (2y+n)^{k+1}$. It then simulates $M$ (recall $M$, the probabilistic turing machine which decided $L$ with some oracle given), and accepts or rejects based on $M$.

It is easy to see that if $x1^y \in L'$, then Merlin could send the correct circuit, and Arthur would always accept. On the other hand, if $x1^y \notin L'$, then Merlin ``commits'' to an oracle by sending a circuit and by the properties of $L$ and $M$, Arthur accepts with only a small probability. This completes the proof.