Sunday, May 3, 2009

Wrapup

Thanks again to all the hardy souls who hung in there for the semester!

Please remember to fill in the course evaluation at my.cmu.edu, under Academics.

Final grades are more or less tabulated; please email us if you want to learn them early.


best,
Ryan

Friday, May 1, 2009

Parameterized Complexity and ETH

We look into Parameterized Complexity theory (introduced by Downey and Fellows in a series of papers) and its connections to classical complexity theory, specifically via the Exponential Time Hypothesis (of Impagliazzo and Paturi, 2001), in this post. Parameterized complexity provides a framework for a more refined analysis of hard problems. Heuristics, parallel algorithms, approximation schemes and randomized algorithms are some of the approaches used to counter such problems, but each suffers from defects, ranging from no hard bounds on the quality of the solution (heuristics), to not being applicable to really large instances (parallel algorithms), to impractical running times (some PTASs).

Introduction:

Classical complexity classifies problems by the amount of some resource (time or space) needed to solve them. This leads to a good theory, but it ignores any structural information in the input, which makes problems appear harder than they really are. There is also wide variation in the worst-case complexities of known exact algorithms for NP-complete problems. For example, there are pretty good 3SAT solvers available right now which scale to a large number of variables.

Parameterized complexity tries to explain these differences by assuming that some part of the problem (the `parameter') will be small, allowing us to develop algorithms that are efficient when it is. The classic example is a database query: it has two parts, the database and the query, of which the query is usually much smaller than the database. The query size k would be the natural parameter for a parameterized complexity analysis, which admits algorithms whose non-polynomial behavior is confined to the parameter: if k is small and n is large, O(2^k·n) is much better than O(n^k).

The main contribution of the theory is establishing the intractability of certain problems by classifying them into complexity classes via reductions which are inherently 2-dimensional, depending on the problem size as well as the parameter. A problem can also have different parameterizations, each leading to different results.


Complexity Classes - FPT and the W[t] hierarchy:

A parameterized language L is a subset of Σ* × Σ*, where Σ is a finite alphabet. For (x,k) ∈ L, we call x the main part and k the parameter.

a) Tractability - Fixed-parameter tractable (FPT): L ∈ FPT if membership can be decided in time at most f(k)·|x|^{O(1)} for an arbitrary function f. For each fixed k the problem is in P, and moreover it is in the same polynomial class via the same machine for every k. As an example, consider the p-SAT problem: given a formula φ and the parameter k = number of variables in φ, decide whether φ is satisfiable. This is clearly in FPT, as the obvious brute-force approach for a formula of size n with k variables takes time O(2^k·n). There are many real-world problems in FPT, such as parameterized Vertex Cover, for which the kernelization technique gives an O(k·n + 1.286^k) algorithm (due to Chen, Kanj and Jia, 2001). Note that classical Vertex Cover is NP-complete; this result shows that it is nonetheless not as `hard' as, say, Independent Set.
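To make the flavor of FPT algorithms concrete, here is a minimal Python sketch (my own illustration, not the Chen-Kanj-Jia algorithm) of the classic O(2^k·|E|) bounded search tree for k-Vertex Cover; kernelization and cleverer branching are what bring the exponential factor down to 1.286^k.

```python
def vertex_cover(edges, k):
    """Return True iff the graph given by `edges` has a vertex cover of size <= k.
    Bounded search tree: pick any uncovered edge (u, v); some endpoint must be in
    the cover, so branch on the two choices. The tree has depth <= k, hence at
    most 2^k leaves, and the total time is O(2^k * |E|)."""
    if not edges:
        return True          # nothing left to cover
    if k == 0:
        return False         # edges remain but no budget
    u, v = edges[0]
    # Branch 1: put u in the cover; remove all edges touching u.
    if vertex_cover([e for e in edges if u not in e], k - 1):
        return True
    # Branch 2: put v in the cover; remove all edges touching v.
    return vertex_cover([e for e in edges if v not in e], k - 1)

# Example: a 5-cycle needs 3 vertices to cover all of its edges.
cycle = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
assert not vertex_cover(cycle, 2)
assert vertex_cover(cycle, 3)
```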

b) Parametric Intractability: Analogously to classical complexity theory, Downey and Fellows developed a completeness program for intractable parameterized problems. In order to compare the hardness of parameterized problems, we need a 2-dimensional reduction (called a `parameterized reduction'). Roughly, a language L is parameterically reducible to L′ if there is an FPT algorithm that transforms (x,k) into (x′,k′) so that (x,k) ∈ L iff (x′,k′) ∈ L′ and k′ = g(k), where g is an unrestricted function. Note that Karp reductions are rarely parameterized reductions (Clique to Independent Set is one of the exceptions).


1) W[1]: The lowest class of parameterized intractability can be defined as the set of languages that are reducible to the Short Turing Machine Acceptance problem (also called the k-step halting problem), where we want to determine, for an input consisting of a nondeterministic Turing machine M and a string x, whether M has a computation path accepting x in at most k steps. In some sense, this is the parameterized analogue of the Turing machine acceptance problem, the basic generic NP-complete problem in classical complexity theory. Canonical problems include Independent Set (does G have an independent set of size k?) and Weighted 3SAT (does φ have a satisfying assignment of weight k?). We will also give an alternative definition of W[1] afterwards, which is what is actually used for the basic results.


2) The W[t] hierarchy: Interestingly, while general k-SAT and 3SAT are equivalent for NP-hardness in classical complexity, there is no known parameterized reduction computing general satisfiability from 3-CNF formulae. This leads to a realization that the logical depth of a formula affects its parameterized complexity. Intuitively it is related to the number of alternations between unbounded fan-in AND and OR gates. Hence, we can base the hierarchy on the complexity of circuits required to check a solution.

W[t] is based on the Circuit-SAT problem (parameterized by the weight of the input, as for Weighted 3SAT) for the family of circuits F_{h,t} having:
1. Small gates: with constant fan-in
2. Large gates: with unbounded fan-in

and of depth (maximum number of gates on a path) at most h and weft (maximum number of large gates on a path) at most t. Clearly, we have the containments:

FPT ⊆ W[1] ⊆ W[2] ⊆ ⋯

It is not known whether these containments are strict. Note that W[1] can now be defined as the class of problems reducible to parameterized Circuit SAT on the family of constant-depth, weft-1 circuits. Downey and Fellows (1995) proved that parameterized Independent Set is W[1]-complete. The hardness proof is very intricate and we omit it here. Using this definition it is easy to see why Independent Set is in W[1] (sketch; see the code below):
1. Each vertex of G corresponds to one input gate of the circuit.
2. For every edge (u,v) ∈ E(G), build a small OR gate computing (¬u ∨ ¬v).
3. The outputs of the small gates are fed into a single large AND gate.
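As an illustration of that sketch, here is a hypothetical Python snippet that builds the weft-1 check as a function (one small OR per edge feeding a single large AND) and solves the weight-k version by brute force over weight-k inputs:

```python
from itertools import combinations

def independent_set_circuit(n_vertices, edges):
    """Evaluator for the weft-1 circuit sketched above: one input per vertex,
    a small OR gate (NOT u OR NOT v) per edge, and one large AND gate."""
    def evaluate(assignment):           # assignment: list of 0/1, one per vertex
        small_gates = [(1 - assignment[u]) | (1 - assignment[v]) for u, v in edges]
        return all(small_gates)         # the single unbounded fan-in AND
    return evaluate

def has_weight_k_satisfying_assignment(circuit, n, k):
    """Weighted circuit satisfiability by brute force over the C(n,k) weight-k inputs."""
    for chosen in combinations(range(n), k):
        x = [1 if i in chosen else 0 for i in range(n)]
        if circuit(x):
            return True
    return False

# Example: the path 0-1-2 has an independent set of size 2 ({0,2}) but not 3.
C = independent_set_circuit(3, [(0, 1), (1, 2)])
assert has_weight_k_satisfying_assignment(C, 3, 2)
assert not has_weight_k_satisfying_assignment(C, 3, 3)
```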

Interestingly, unlike NP-hardness results, the W[1]-hardness proof for the k-step Halting Problem uses Independent Set completeness as an intermediate result. There have been many papers demonstrating naturally occurring W[t]-complete problems (like Dominating Set for W[2]); for an extensive list of already classified problems see the Downey and Fellows monograph.

Exponential Time Hypothesis (ETH)

ETH was first studied by Impagliazzo, Paturi and Zane (2001) and states that 3SAT ∉ DTIME(2^{o(n)}), where n is the number of variables. They also proved that the hypothesis is robust, being equivalent to analogous hypotheses for other NP-complete problems like Independent Set. In fact, they also showed that ETH is independent of the size measure: 3SAT is solvable in time 2^{o(n)} iff it is solvable in time 2^{o(m)}, where m is the input size. Note that this suggests that weighted 3SAT should also be intractable for `any parameter'.

Connections with Classical complexity:

The above paragraph suggests that ETH and parameterized complexity might be related. In fact they are, and Chen and Grohe recently proved that ETH and parameterized complexity theory are isomorphic via an explicit reduction-preserving isomorphism. We don't show that here; instead we prove a simpler result that still provides a strong link to ETH (the proof is a version from Downey, Castro et al. 2003, uses parameterized miniaturization, and is different from the original proof by Abrahamson, Downey and Fellows):

Theorem: If FPT = W[1], then ETH fails, i.e., 3SAT ∈ DTIME(2^{o(n)}).

Proof: We use the equivalent definition of ETH based on simple Circuit SAT. The idea is to capture ETH exactly with a parameterized complexity class. The starting point is the Mini-Circuit SAT problem, a parameterized miniaturization of simple Circuit SAT:
Input: Positive integers k and n, and a Boolean circuit C of total size at most k·log n.
Decision: Does there exist an x for which C(x) = 1?

Note that the parameter here is k. Also, trying all possible inputs gives a brute-force O(n^k) algorithm. Next we give a crucial lemma due to Cai and Juedes (2001). It essentially characterizes ETH exactly by a complexity class in the parameterized theory.

Lemma 1: Mini-Circuit SAT is in FPT iff ETH fails.
Proof: For one direction, if ETH fails then Circuit SAT on circuits of size N can be solved in time 2^{o(N)}; applied to a miniature of size k·log n this gives time 2^{o(k log n)}, which is an FPT function.

Now suppose we are given a Boolean circuit C of size N and that Mini-Circuit SAT is solvable in FPT time f(k)·n^c. Set k = f^{-1}(N) and n = 2^{N/k}. In general k = f^{-1}(N) will be some slowly growing function of N, so N/k = o(N) and also cN/k = o(N). Hence, using the FPT algorithm for Mini-Circuit SAT, we get a running time for Circuit SAT of f(f^{-1}(N))·(2^{N/k})^c = N·2^{cN/k} = 2^{cN/k + log N} = 2^{o(N)}.
Thus ETH fails. Proved.

Now let's define the complexity class MINI[1] to be the set of languages that are FPT-reducible to Mini-Circuit SAT. It turns out that many k·log n miniatures of familiar NP-complete problems are MINI[1]-complete (Downey, Fellows et al. 2002). This is easy to see because essentially all the usual NP-completeness reductions from Circuit SAT to these problems work as FPT reductions, since they were also linear-size reductions.

We concentrate on the Mini-Independent Set problem. Surprisingly, it can be reduced to the usual parameterized Independent Set problem.

Lemma 2: Independent Set parameterized by the size of the independent set is MINI[1]-hard.
Proof: We give a Turing reduction. Let the graph G=(V,E) be the miniature, for which we wish to determine whether G has an independent set of size r, with |V| ≤ k·log n. We can think of the vertices of G as organized into k blocks V_1, V_2, …, V_k, each of size log n. So for each possible way of writing r as a sum of k terms, r = r_1 + r_2 + ⋯ + r_k with each r_i ≤ log n, we have a Turing reduction branch which represents a commitment to choose r_i vertices from the corresponding block V_i to be in the independent set. The total number of branches is at most (log n)^k, which is again an FPT function.

For each branch, we produce a graph G′ that has an independent set of size k iff the miniature G has an independent set of size r distributed as indicated by the commitment made on that branch. The graph G′ consists of k cliques with some cross edges. The i-th clique consists of vertices in correspondence with the subsets of V_i of size r_i. An edge connects a vertex x in the i-th clique and a vertex y in the j-th clique iff there is a vertex u in the subset S_x ⊆ V_i represented by x and a vertex v in the subset S_y ⊆ V_j represented by y such that (u,v) ∈ E.
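Here is a hypothetical Python sketch of a single branch of this Turing reduction, constructing G′ from the miniature G, its blocks V_i, and the commitment (r_1, …, r_k); the names and data layout are my own.

```python
from itertools import combinations

def branch_graph(G_edges, blocks, commitment):
    """Build G' for one branch of the reduction.
    blocks[i] is the vertex set V_i of the miniature; commitment[i] = r_i.
    Vertices of G': pairs (i, S) where S is an r_i-subset of blocks[i].
    Edges of G': all pairs inside each clique i, plus a cross edge between
    (i, S) and (j, T) whenever some u in S and v in T are adjacent in G."""
    E = set(map(frozenset, G_edges))
    cliques = [[(i, frozenset(S)) for S in combinations(blocks[i], commitment[i])]
               for i in range(len(blocks))]
    vertices = [v for clique in cliques for v in clique]
    edges = set()
    for a in range(len(vertices)):
        for b in range(a + 1, len(vertices)):
            (i, S), (j, T) = vertices[a], vertices[b]
            if i == j:                                    # clique edge
                edges.add((vertices[a], vertices[b]))
            elif any(frozenset((u, v)) in E for u in S for v in T):
                edges.add((vertices[a], vertices[b]))     # conflict edge
    return vertices, edges

# Example: miniature path 0-1-2-3, blocks {0,1} and {2,3}, commitment (1,1).
V, E2 = branch_graph([(0, 1), (1, 2), (2, 3)], [[0, 1], [2, 3]], [1, 1])
print(len(V), len(E2))   # 4 3: two clique edges plus the one cross edge from (1,2)
```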

From Lemma 2 we can conclude that FPT ⊆ MINI[1] ⊆ W[1]. Now observe that if FPT = W[1], then MINI[1] ⊆ FPT, so by Lemma 1, ETH fails.
Proved.

Discussion:

The above result hints that the FPT vs. W[1] problem is like the P vs. NP problem of classical complexity theory. In fact it was also proved by Downey and Fellows that FPT ≠ W[1] implies P ≠ NP. But it is not known whether FPT ≠ W[1] implies anything about the rest of the W[t] hierarchy. Importantly, practical intractability of problems in NP which are unlikely to be NP-complete can be shown using W[t]-hardness results. Parameterized complexity has deep connections to other complexity results as well. Just recently (a week back) Galesi and Lauria showed a connection with proof complexity based on randomized algorithms for W[t]-hard problems. The field is very active and many papers and surveys are being published in it.

Lower bounds for locally decodable codes

In this post, we will look at upper bounds on the rates of locally decodable codes and their relation to a problem called Private Information Retrieval. Some of the proofs presented here are unique in that they relate the complexity of classical algorithms to quantum algorithms and prove lower bounds via quantum algorithms!

Locally decodable codes are codes from which one can probabilistically recover individual bits of the message from a corrupted codeword by querying only a small number of positions, the Hadamard code being a straightforward example. While an ideal code would have high rate, be resilient to a large number of errors, and be locally decodable, it has been shown that it is not possible to do extremely well on all these criteria simultaneously. Specifically, local decodability with a constant number of queries forces the rate to be o(1).
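For concreteness, here is a small Python sketch of the standard 2-query local decoder for the Hadamard code (assuming the usual encoding C(x)_a = ⟨a,x⟩ mod 2); each of the two queries is individually uniform, which is exactly the smoothness property discussed below.

```python
import random

def a_bits(a, n):
    return [(a >> j) & 1 for j in range(n)]

def hadamard_encode(x):
    """Hadamard code: one bit <a, x> mod 2 for every a in {0,1}^n (length 2^n)."""
    n = len(x)
    return [sum(ai * xi for ai, xi in zip(a_bits(a, n), x)) % 2
            for a in range(2 ** n)]

def decode_bit(y, n, i):
    """2-query local decoder for x_i: pick a random a, query positions a and
    a XOR e_i, and output the XOR of the two answers. If y is delta-close to
    some codeword C(x), both queries are uncorrupted with prob >= 1 - 2*delta."""
    a = random.randrange(2 ** n)
    return y[a] ^ y[a ^ (1 << i)]

# Example: with no corruption the decoder is always correct.
x = [1, 0, 1]
y = hadamard_encode(x)
assert all(decode_bit(y, 3, i) == x[i] for i in range(3))
```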

The first paper in this direction ([Katz, Trevisan STOC'00]) showed that any locally decodable code with a constant number of queries has codewords of superlinear length. The main ideas behind these bounds are that smooth codes (codes whose positions are queried roughly uniformly during local decoding) are not much worse than non-smooth ones, and that smooth codes need super-linear encoding length.

Def 1: Suppose that a code maps C: {0,1}^n → Σ^m. For fixed δ, ε and integer q, C is a (q,δ,ε)-locally decodable code if there exists a probabilistic algorithm that recovers an arbitrary bit x_i of the input x with probability at least 1/2 + ε from any corrupted codeword y within distance d(y, C(x)) < δm of C(x), querying no more than q indices of y.

Intuitively, smooth codes are those for which there exist probabilistic decoding algorithms that query the codeword (roughly) uniformly, i.e., are not heavily biased towards querying a few indices of the codeword. More formally:

Def 2: A code C: {0,1}^n → Σ^m is (q,c,ε)-smooth, for fixed c, ε and integer q, if there exists a local decoding algorithm A that can recover an arbitrary bit x_i of the input x from the (uncorrupted) codeword y = C(x) with probability at least 1/2 + ε, using no more than q queries, with the added condition that Pr[A(x,i) reads index j] ≤ c/m for every j. This definition does not require A to recover bits from corrupted codewords.

Lemma 1: If C is a (q,δ,ε)-locally decodable code, then it is also (q, q/δ, ε)-smooth.
Proof sketch: Take the decoder for C and identify all the locations that are queried with probability greater than q/(δm) while decoding x_i. Construct a new decoder that just assumes the value 0 for queries to all such locations. Since there are no more than δm of these, and the original decoder can tolerate δm corruptions, the new decoder can still recover x_i. This new decoder has the smoothness property we want.

Let's start with the case q = 1 and see why it is not possible to construct locally decodable codes that encode inputs of length greater than a constant. Suppose that the algorithm is trying to decode the i-th bit x_i by reading just a single index of the (corrupted) codeword y. Then

∑_{j∈[m]} Pr_x[A(C(x),i) = x_i | A(·,i) reads j] · Pr[A(·,i) reads j] = Pr_x[A(C(x),i) = x_i] ≥ 1/2 + ε,

because of which there must exist a good index j_1 such that Pr_x[A(C(x),i) = x_i | A(·,i) reads j_1] ≥ 1/2 + ε. Suppose that the code got randomly corrupted at index j_1. Then y_{j_1} has no correlation with x_i and the algorithm A cannot infer any information from index j_1. But since A can recover from δm errors, there must exist another index j_2 which is also good in the sense that it can be queried to get information about x_i. Extending this reasoning, we can corrupt y_{j_2} randomly along with y_{j_1} and still expect to find another index j_3 with reasonable correlation to x_i. Repeating the argument δm times shows that there are at least δm indices j ∈ [m] such that A can compute x_i from C(x)_j with probability at least 1/2 + ε. This is true for all inputs x, which by the pigeonhole principle implies that there is at least one index j′ ∈ [m] from which at least δn of the bits x_i can be extracted with probability 1/2 + ε by querying y_{j′}. This means the code must pack an enormous amount of information about the input into one index of the output. Consider the following quantitative lemma about the limits of information recovery from a function:

Lemma 2: If C: {0,1}^n → R and there exists an algorithm A such that Pr[A(C(x),i) = x_i] ≥ 1/2 + ε for every i, then log|R| ≥ (1 − H(1/2+ε))·n, where H is the binary entropy function.
Idea: if a decoder has any hope of recovering bits with reasonable probability, the compression cannot be too high.

Setting R = Σ, we get n ≤ log|Σ| / (δ(1 − H(1/2+ε))) (Theorem 3).
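A quick numeric sanity check of this bound, with illustrative parameter values (a sketch, not part of the proof):

```python
from math import log2

def H(p):
    """Binary entropy function."""
    return -p * log2(p) - (1 - p) * log2(1 - p)

def max_n(sigma_size, delta, eps):
    """Upper bound on the message length n of any (1, delta, eps)-LDC over an
    alphabet of size sigma_size: n <= log|Sigma| / (delta * (1 - H(1/2 + eps)))."""
    return log2(sigma_size) / (delta * (1 - H(0.5 + eps)))

# E.g. a binary (1, 0.1, 0.2)-LDC can only encode a constant number of bits:
print(max_n(2, delta=0.1, eps=0.2))   # about 84 -- a constant, independent of m
```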

Now, consider the case q>1. A generalization of this same argument can be used
to show superlinear bounds on the length of the codewords as follows:

Suppose that A is a (q,c,ε)-smooth decoder for the code C.
Similarly to the q = 1 case, call a set S (S ⊆ [m], |S| ≤ q) ε-good for i if Pr[A(C(x),i) = x_i | A reads S] ≥ 1/2 + ε. Define the hypergraph H_i with vertices labelled by [m] and edge set E_i consisting of the ε/2-good sets for i. A matching of a hypergraph is a set of pairwise disjoint edges, and a vertex cover is a vertex set that contains at least one vertex of every edge.

Lemma 4: H_i has a matching M_i of size at least εm/(cq).
Proof:
1/2 + ε ≤ Pr_x[A(C(x),i) = x_i | A(·,i) reads from E_i] · Pr[A(·,i) reads from E_i]
 + Pr_x[A(C(x),i) = x_i | A(·,i) reads from E_i^c] · Pr[A(·,i) reads from E_i^c]
 < Pr[A(·,i) reads from E_i] + (1/2 + ε/2)(1 − Pr[A(·,i) reads from E_i]),
which implies that Pr[A(·,i) reads from E_i] > ε. If P_e denotes the probability that A(·,i) reads e ∈ E_i, then we have ε < ∑_{e∈E_i} P_e. Also, for every j ∈ [m], ∑_{e∈E_i : j∈e} P_e ≤ c/m
by the smoothness condition. Now, if V is a vertex cover of H_i, then every e ∈ E_i contains some j ∈ V. Putting this together with the earlier facts, we have ε < ∑_{e∈E_i} P_e ≤ ∑_{j∈V} ∑_{e∈E_i : j∈e} P_e ≤ |V|·c/m, which implies |V| > εm/c. Since each edge has at most q vertices, a maximal matching of size s yields a vertex cover of size at most qs; therefore H_i has a matching M_i of size at least εm/(cq).

Say that a set S of indices hits a matching M_i if there is some edge s ∈ M_i with s ⊆ S. The following lemma bounds the number of vertices that must be selected from H_i so that the selected set hits a constant fraction of the M_i's.

Lemma 5: Let H be a hypergraph on m vertices whose hyperedges have at most q vertices each, and suppose H has a matching of size γm (γ ≤ 1/q). Then there exists t = Θ(γ^{-1/q}·m^{(q-1)/q}) such that a uniformly random set of t vertices hits such a matching with probability at least 3/4.

Theorem 6: If C: {0,1}^n → Σ^m is a (q,δ,ε)-locally decodable code, then:
m ≥ (εδ/q²)^{1/(q-1)} · ( 3n(1 − H(1/2+ε)) / (4 log|Σ|) )^{q/(q-1)}.
Proof: Lemma 1 shows that C is (q, q/δ, ε)-smooth. Lemma 4 shows that for every i there exists a matching M_i, consisting of disjoint sets of size at most q, such that each s ∈ M_i is ε/2-good for i and |M_i| ≥ εδm/q². Lemma 5 says that there exists a set of t = Θ((εδ/q²)^{-1/q}·m^{(q-1)/q}) indices of the codeword such that the values at these locations carry enough information to decode at least 3/4 of the input bits with advantage at least ε/2. Applying Lemma 2, which gives a lower bound on t in terms of n, proves the theorem.

While this is a start, it still leaves a large gap between existing locally decodable codes (which are exponentially long) and the lower bound. Goldreich, Karloff, Schulman and Trevisan improved the lower bound for the specific case of linear codes with decoders that query only 2 bits: they show that in this case m = 2^{Ω(n)}. Using new techniques (a reduction to quantum queries), Kerenidis and de Wolf (arXiv: quant-ph/0208062v2) show that any 2-query LDC (not necessarily linear) is exponentially long. Their approach is as follows.


Quantum queries
---------------

In an earlier post, Matt talked about the fact that QM systems exist in a linear superposition of several states. The basis used to describe a state can of course be chosen according to convenience (the choice is usually the eigenstates of the measurement). A superposition of states does not mean a statistical mix of different states; rather, the state of the particle itself is a complex vector. Such a state is called a pure state (it can be expressed as |φ⟩ = ∑_i α_i |i⟩). The density matrix of a pure state (or qubit) is the outer product M = |φ⟩⟨φ|, where ⟨φ| = (|φ⟩)^†, the conjugate transpose of |φ⟩. Note that an arbitrary matrix M need not in general be the density matrix of a pure state. Matrices of the form ρ = ∑_i p_i |φ_i⟩⟨φ_i| are called mixed states. As opposed to a pure state, such a mixed state is a statistical ensemble of different pure states, and does not itself correspond to any pure state. However, the set of pure states of which ρ is an ensemble is not uniquely determined by ρ alone: ρ represents both {(p_i, |v_i⟩)}_i and {(p′_i, |v′_i⟩)}_i as long as there is a unitary matrix U such that √p_i |v_i⟩ = ∑_j U_{ij} √p′_j |v′_j⟩. Another context where mixed states are useful for us is in describing a subsystem of an entangled state. For example, in the entangled state (|00⟩ + |11⟩)/√2, the second qubit alone is not in a pure state; it can be viewed as the mixed state (|0⟩⟨0| + |1⟩⟨1|)/2.

Measurement of a qubit can be thought of as projecting the qubit into a subspace. Based on the set of subspaces we are trying to project our qubit into, the qubit has different probabilities of collapsing into these subspaces. We can generalise this notion to arbitrary positive operators (not just orthonormal projectors) and mixed states -- such a measurement system is called a positive operator valued measurement (POVM). A POVM is a set of positive operators E_i = M_i^† M_i such that ∑_{i=1}^k E_i = I. When such a measurement is made on a state ρ, the outcome is one of the states M_i ρ M_i^† / Tr(M_i ρ M_i^†), the probability of the i-th outcome being p_i = Tr(M_i ρ M_i^†).

In what follows we usually deal with k of the form 2^m and systems of m qubits.
If B = {|ψ_i⟩} is an orthonormal basis for the system, measuring in the B-basis means using the POVM with E_i = |ψ_i⟩⟨ψ_i|. Then the probability that measuring a pure state |φ⟩ gives outcome i is simply |⟨φ|ψ_i⟩|², as we expect.

Now we are ready to define a quantum query.
A query to the j-th bit of a length-m string y is the unitary operation (quantum mechanics mandates that all state transformations are unitary):
|c⟩|j⟩ ↦ (−1)^{c·y_j} |c⟩|j⟩.
Of course, the fact that we can apply this query transformation to superposition states to (indirectly) read off several values of the function is what makes quantum queries powerful (for that matter, this ability to manipulate superposition states is what gives extra power to quantum algorithms). The following lemma illustrates this power right away:

Lemma 7: Let f: {0,1}² → {0,1} and suppose we have a = a_1 a_2 ∈ {0,1}² whose bits are to be queried in order to compute f(a). There exists a quantum algorithm that uses just one query, outputs f(a) with probability exactly 11/14, and outputs 1 − f(a) otherwise.
Proof: Note that a classical algorithm has to query both bits before it can compute f(a) with any accuracy. The quantum algorithm is as follows: make the query on the state (|0⟩|1⟩ + |1⟩|1⟩ + |1⟩|2⟩)/√3. The output of the query is
|φ_a⟩ = (|0⟩|1⟩ + (−1)^{a_1}|1⟩|1⟩ + (−1)^{a_2}|1⟩|2⟩)/√3,
the relative phases now holding the information about a_1, a_2. To extract this information, we measure this state (i.e., cause it to collapse) in the basis {|ψ_b⟩ : b ∈ {0,1}²}, where |ψ_b⟩ = (|0⟩|1⟩ + (−1)^{b_1}|1⟩|1⟩ + (−1)^{b_2}|1⟩|2⟩ + (−1)^{b_1+b_2}|0⟩|2⟩)/2.

Measuring |φ_a⟩ gives outcome a with probability |⟨φ_a|ψ_a⟩|² = 3/4, the other three outcomes being equally probable (1/12 each). Suppose that the measurement outcome is m. The following procedure does exactly what we want (see the numerical check after this list):

  • f is constant. Output that constant value with probability 11/14 (and the other value with probability 3/14).

  • |f^{-1}(1)| = 1. If f(m) = 1, then output 1. If f(m) = 0, then output 0 with probability 6/7 and 1 with probability 1/7. Now, if f(a) = 1, the probability of the algorithm outputting 1 is (3/4)·1 + (1/4)(1/7) = 11/14. If f(a) = 0, the probability of the algorithm outputting 0 is (11/12)(6/7) = 11/14.

  • |f^{-1}(1)| = 2. Output f(m) with probability 13/14 and 1 − f(m) with probability 1/14.

  • |f^{-1}(1)| = 3. Similar to the case |f^{-1}(1)| = 1.
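The probabilities used above are easy to verify numerically. The following numpy sketch uses the state and basis vectors as reconstructed above (the fourth basis term is my completion of the orthonormal basis) and checks that outcome a has probability 3/4 and every other outcome 1/12:

```python
import numpy as np
from itertools import product

def phi(a1, a2):
    # ordered basis: |0,1>, |0,2>, |1,1>, |1,2>
    return np.array([1, 0, (-1) ** a1, (-1) ** a2]) / np.sqrt(3)

def psi(b1, b2):
    return np.array([1, (-1) ** (b1 + b2), (-1) ** b1, (-1) ** b2]) / 2

for a in product([0, 1], repeat=2):
    for b in product([0, 1], repeat=2):
        p = abs(np.dot(psi(*b), phi(*a))) ** 2
        expected = 3 / 4 if a == b else 1 / 12
        assert np.isclose(p, expected)
print("outcome a has probability 3/4; each other outcome 1/12")
```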



Definition: A (q,δ,ε)-LQDC (locally quantum-decodable code) is the same as an LDC except that we replace the probabilistic decoder with a quantum decoder, and the queries are quantum queries that may probe superpositions of positions. Now we can use the earlier lemma to show that:

Theorem 8: A (2,δ,ε)-LDC is a (1,δ,4ε/7)-LQDC.
Proof: The idea is to replace the two classical queries with a single quantum query (allowing for a bit of inaccuracy). The quantum decoder chooses two target positions the same way the classical decoder does, but instead of reading both positions and applying a function f to them, it makes one quantum query to these two positions and outputs the value of f on them (with accuracy 11/14, by Lemma 7). This suffices to get a 4ε/7 advantage.

Now we want to show lower bounds on 1-query LQDCs. The starting point as in the
case of 1-query LDCs is a theorem that gives a relation between the space
used to store information and fidelity of information reconstruction:

Theorem 9: If f maps n-bit strings to m-qubit (possibly mixed) states such that each bit x_i can be recovered from f(x) with probability at least p, then m ≥ (1 − H(p))·n. This leads to:

Theorem 10: If C: {0,1}^n → {0,1}^m is a (1,δ,ε)-LQDC, then m ≥ 2^{cn−1}, for c = 1 − H(1/2 + δε/4).


Wehner and de Wolf (arXiv: quant-ph/0403140v2) consider the case where the LDC encodes over larger alphabets and where the decoder uses only part of the information from each query. Briët and de Wolf (arXiv: quant-ph/0806.2101v1) further study the relation between LQDCs and LDCs and conclude that their powers are roughly the same for a small constant number of queries.

Despite all this work, the best lower bound known for LDCs with 3 or more queries is only polynomial, while the known constructions with 3 or more queries are exponentially long. This gap leaves a big open question.

LDCs and Private Information Retrieval (PIR)
--------------------------------------
The relation between LDCs and PIR is close and worth mentioning, since PIR systems yield some of the best known LDCs. PIR is the problem of distributing a database among different servers so that a client can retrieve a particular bit (by sending randomized queries to the different servers) while individual servers learn essentially nothing about which bit the client is accessing (information-theoretic security, not computational). Here is the definition of a 2-server PIR system.

Def [GKST CCC'02]: A one-round, database-size-n, (1−δ)-secure, 2-server PIR with recovery probability p, query size q and answer size a is a set of algorithms (Q, (S_1, S_2), D), where:
- Q, on input i, uses randomness r to generate two q-bit queries (q_1, q_2) = Q(i, r). q_1 and q_2 are sent to the two servers, which use algorithms S_1 and S_2 to respond with length-a answers. The decoder algorithm D uses i, r and the answers of the servers to decode x_i. Given that the random string input to Q is drawn from the uniform distribution, the decoder D should succeed with probability at least p. The secrecy condition is that, for each server, the distributions of queries it receives for different target bits differ by at most δ in statistical distance.

The same authors show that if there is a 1-round, (1−δ)-secure, 2-server PIR of database size n, query size t, and answer size a with recovery probability at least 1/2 + ε, then there exists a (2, 3, ε−δ)-smooth code C: {0,1}^n → ({0,1}^a)^m, where m ≤ 6·2^t. This reduction can be used to show that query size Θ(n^c) (for some c > 0) is needed for PIR with constant answer size.
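To make the PIR model concrete, here is a toy Python sketch of the folklore information-theoretic 2-server scheme with n-bit queries and 1-bit answers (not the short-query schemes this reduction concerns): the client sends a random subset S to one server and S Δ {i} to the other, and XORs the two parity answers.

```python
import random

def query(i, n):
    """Client: produce the two n-bit queries for target index i."""
    S = [random.randint(0, 1) for _ in range(n)]      # random subset of [n]
    T = S.copy(); T[i] ^= 1                           # S symmetric-difference {i}
    return S, T                                       # each alone is uniform => secrecy

def answer(x, q):
    """Server: parity of the database bits selected by the query."""
    return sum(xj for xj, qj in zip(x, q) if qj) % 2

def reconstruct(a1, a2):
    return a1 ^ a2                                    # the two parities differ exactly in x_i

x = [1, 0, 1, 1, 0, 1]
for i in range(len(x)):
    S, T = query(i, len(x))
    assert reconstruct(answer(x, S), answer(x, T)) == x[i]
```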

Kolmogorov Complexity and derandomization

Motivation
------------

The sets R_μ of random strings with high Kolmogorov complexity and bounded resources are good examples of sets with a lot of information content that is difficult to access. Many of these sets have been studied [BM97,Ko91,KC00] as possible examples of intractable sets that are not complete for any of the standard complexity classes. Based on [Al05], we now show that these sets can, in fact, be exploited by efficient reductions.

The completeness results are obtained via derandomization techniques, mostly relativizing hardness vs randomness tradeoffs in the contrapositive: If there exists a problem in complexity class C that is hard when given oracle access to A, then there exists a pseudorandom generator secure against A that is computable within C. We argue that no pseudorandom generator computable in C can be secure against Rμ, and thus every problem in C is easy given oracle access to Rμ. In other words, C reduces to Rμ.

Definitions
-----------

The computational model that we use is the multi-tape Turing machine with random access to its input tape (the ideas work in any general model). W.l.o.g., we fix a single universal machine U.
Kolmogorov complexity C^A(x) is usually defined as the length of the shortest description d ∈ {0,1}* such that U^A(d) = x, where U is a universal Turing machine with oracle access to A. The definition we use is slightly different: instead of requiring U^A(d) to produce x, it should recognize the correct value of x_i for every i.
We now define several resource-bounded Kolmogorov complexity measures:


Definition 1 [Kt, KT, KS]:
Kt^A(x) = min{ |d| + log t : U^A(d, i, b) accepts in t steps iff x_i = b }
KT^A(x) = min{ |d| + t : U^A(d, i, b) accepts in t steps iff x_i = b }

The only difference is that in KT the time bound is given exponentially more weight. Considering space-bounded notions, too, will yield complete problems for space-bounded
classes:

KS^A(x) = min{ |d| + s : U^A(d, i, b) accepts in s space iff x_i = b }

Both Kt and KS can be approximated in terms of KT^A, for appropriate choices of oracle:

Theorem 2:
There exist a complete set A for E, a complete set B for DSPACE(n), and a constant c such that
Kt(x)/c ≤ KT^A(x)
KS(x)/c ≤ KT^B(x)

Proof sketch: Let A ∈ E and let x be given, with KT^A(x) = m. Thus there is a description d_x of length at most m such that U^A(d_x, i, b) accepts iff x_i = b, in time at most m. During the computation, U asks queries of length at most m. Since A ∈ E, each query can be answered in time 2^{O(m)}.
Let M denote the algorithm simulating the computation of U^A(d_x, i, b) for every i by directly computing the answers of A; then the description d′_x = ⟨M, d_x⟩ suffices for U to compute U(d′_x, i, b) in time 2^{O(m)}. As |d′_x| = m + O(1), we conclude that Kt(x) ≤ m + O(1) + log(2^{O(m)}) = O(m).
The proof for KS is similar.

We focus on sets containing strings of high complexity, for various measures.

Definition 3:
For any Kolmogorov complexity measure μ, define
R_μ = { x : μ(x) ≥ |x|/2 }.

The bound of |x|/2 is arbitrary. Essentially, all we need is that the set R_μ has polynomial density (it contains at least 2^n/n^k strings of each length n, for some k).
The following propositions are straightforward.
The following propositions are straightforward.

Proposition 4:
R_{Kt} ∈ E, R_{KS} ∈ DSPACE(n), and R_{KT} ∈ coNP.

Proposition 5:
The sets R_{Kt} and R_{KS} both have polynomial density.


Nonuniform Hardness Results
-------------------------------

We show that strings of high Kolmogorov complexity are very useful as oracles. We will argue that an appropriate set R_μ of Kolmogorov-random strings can be used to distinguish the output of a pseudorandom generator G_f from truly random strings. This in turn will enable us to efficiently reduce f to R_μ.

Recall that A is PSPACE-robust if PSPACE^A = P^A (machines are allowed to ask oracle queries of only polynomial size). The complete sets for many large complexity classes (PSPACE, EXP, EXPSPACE) have this property, as do the complete sets (under linear-time reductions) for classes like DSPACE(n) and E.

We will build a pseudorandom generator based on the Nisan-Wigderson paradigm [NW94]. [BFNW93] construct, for any ε > 0, a variant G^{BFNW}_f : {0,1}^{n^ε} → {0,1}^n such that, for any x of size n^ε, every bit of G^{BFNW}_f(x) is computable in space O(n^ε) given access to the Boolean function f on inputs of size at most n^ε. If f is PSPACE-robust, there is a constant c independent of ε such that each bit is computable in time n^{εc} with oracle access to f. The following hardness-versus-randomness tradeoff holds.

Theorem 6 [BFNW93]:
Let f be a Boolean function, ε > 0, and G^{BFNW}_f the pseudorandom generator described above. Let T be a set and p(n) a polynomial. If
| Pr_{r ∈ U_n}[r ∈ T] − Pr_{x ∈ U_{n^ε}}[G^{BFNW}_f(x) ∈ T] | ≥ 1/p(n)
for all large n, then there exists a polynomial-size oracle circuit family {C_n} with oracle T that computes f and queries T non-adaptively.

We use the notation A ≤^{P/poly}_{tt} B to denote that there is a truth-table (i.e., nonadaptive) reduction from A to B that is computable by a family of polynomial-size circuits.
(A truth-table reduction must formulate all of its queries before receiving any answers; truth-table reducibility is a weaker tool than Turing reducibility and hence induces finer equivalence classes.)

Theorem 7:
Let A be any PSPACE-robust set. Let L be a set of polynomial density such that, for every x ∈ L, KT^A(x) > |x|^γ for some constant γ > 0. Then A is reducible to L via ≤^{P/poly}_{tt} reductions.

Proof Sketch: Let f be the characteristic function of A. Consider G^{BFNW}_f, and choose ε as follows: we know that every bit of G^{BFNW}_f is computable in time n^{εc}, for some constant c independent of ε, given access to A; set ε = γ/2c (w.l.o.g. c > 1).

Any string in the range of G^{BFNW}_f has small KT^A complexity: let y = G^{BFNW}_f(x) for some x. Given x, every bit of G^{BFNW}_f(x) is computable in time n^{γ/2} with access to oracle A. Hence KT^A(y) ≤ |x| + O(n^{γ/2} log n) + O(1) ≤ n^γ. Thus L distinguishes the output of G^{BFNW}_f from random, and by Theorem 6, f is ≤^{P/poly}_{tt}-reducible to L.

By Theorem 2 and Proposition 5, we can apply Theorem 7 to the set A from Theorem 2 with L = R_{Kt}, and to the set B from Theorem 2 with L = R_{KS}. (A and B are clearly PSPACE-robust, since they are complete for EXP and PSPACE, respectively.) Combining this with Proposition 4, we get:

Corollary 8:
R_{Kt} is complete for EXP under ≤^{P/poly}_{tt} reductions. R_{KS} is complete for PSPACE under ≤^{P/poly}_{tt} reductions.


These results also give natural examples that witness the difference in power of various reducibilities: some of the sets shown here to be complete under truth-table reductions are provably not complete under polynomial-time many-one reductions.



Thursday, April 30, 2009

Lecture 28

Today we talked about barriers to proving P ≠ NP and other complexity-theory results: specifically, the "relativization" barrier introduced by Baker-Gill-Solovay, and the "natural proofs" barrier introduced by Razborov-Rudich.

We also talked about results that evade them. For example, the Hopcroft-Paul-Valiant Theorem, TIME[f(n)] ≠ SPACE[f(n)], and the Paul-Pippenger-Szemeredi-Trotter Theorem, DTIME[n] ≠ NTIME[n], do not relativize. Nor do they algebrize, according to the Aaronson-Wigderson notion. (At least, my opinion is that the notion of relativizing/algebrizing makes sense for these theorems, and in this sense they do not relativize/algebrize.)

As for natural proofs, we discussed how proofs by diagonalization do not seem to naturalize, and we mentioned explicit circuit lower bounds which are proved by diagonalization; for example, Kannan's result that Σ_4 is not in SIZE(n^{1000}).

Finally, we mentioned some theorems that neither relativize nor (seem to) naturalize: for example, Vinodchandran's Theorem that PP is not in SIZE(n^{1000}), and Santhanam's improvement of this to an explicit promise problem in MA which isn't in SIZE(n^{1000}).

Game theory complexity and computing Nash Equilibrium

In 1944 von Neumann and Morgenstern introduced the subject of game theory in their book Theory of Games and Economic Behavior. von Neumann and Morgenstern showed that there always exists an equilibrium solution to any two-person zero-sum game. (They also showed any n-person non-zero-sum game can be trivially turned into an (n+1)-person zero-sum game by adding a player who represents the net global profit or loss.) Nash extended this work to show that there exists an equilibrium for any (zero- or non-zero-sum) n-person game. (A Nash equilibrium is a set of mixed strategies for the players such that each individual player, having perfect knowledge of the other players' strategies, gains no benefit from changing their own strategy unilaterally.)

However, Nash's proof is not constructive in a computational sense, in that it does not imply a general algorithm for determining the equilibrium point for an arbitrary game. There is good reason for this.  We can show that there is no general algorithm for determining the equilibrium of a given game on the reals.

We must first give a definition of what we mean by a computable real. In this analysis, I will use the definition given in Computability in Analysis and Physics by Pour-El and Richards. We say a real number x is computable if there exists a computable sequence of rationals that converges effectively to x. (Converges effectively means that for the sequence of rationals r_k there is a computable function e: ℕ → ℕ such that for all N, k ≥ e(N) implies |r_k − x| ≤ 2^{−N}.) A computable sequence of reals is defined similarly (there is a computable double sequence of rationals {r_{nk}} that converges effectively to x_n as k → ∞, uniformly in n). Finally, a computable function is defined by saying that every computable sequence of inputs x_n yields a computable sequence of outputs f(x_n), and that the function is effectively uniformly continuous (generally, there is a computable function d(n) such that, for all x, y, |x − y| ≤ 1/d(n) implies |f(x) − f(y)| ≤ 2^{−n}).
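As a toy illustration of these definitions (class and function names are my own), here is a Python sketch representing a computable real by a program that outputs a rational within 2^{-N} of it, together with an addition that respects the modulus of convergence:

```python
from fractions import Fraction

class ComputableReal:
    """A computable real, represented by a function approx(N) returning a
    rational r_N with |r_N - x| <= 2**(-N) (an effectively converging sequence)."""
    def __init__(self, approx):
        self.approx = approx

    def __add__(self, other):
        # To get the sum within 2^-N, ask each summand for precision 2^-(N+1).
        return ComputableReal(lambda N: self.approx(N + 1) + other.approx(N + 1))

def sqrt2_approx(N):
    """Rational approximation of sqrt(2) to within 2^-N, by bisection."""
    lo, hi = Fraction(1), Fraction(2)
    while hi - lo > Fraction(1, 2 ** N):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if mid * mid <= 2 else (lo, mid)
    return lo

sqrt2 = ComputableReal(sqrt2_approx)
two_sqrt2 = sqrt2 + sqrt2
print(float(two_sqrt2.approx(20)))   # ~2.8284, within 2^-20 of 2*sqrt(2)
```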

Now that we have a framework for computability over the reals, we can turn to the computability of Nash equilibria.  Obviously, if the payoffs are given as uncomputable functions, determining an equilibrium is also uncomputable.  However, if every payoff is computable, there does exist a computable equilibrium.  The proof relies on the fact that we can construct a computable function h(s) that is maximized at an equilibrium of the game, and we can computably determine the maximum value of a computable function.  From this we can get finer and finer approximations of a maximizing input (since h is effectively uniformly continuous), thus generating a sequence that converges to a maximizer.

But we can also show that not every computable sequence of computable games has a computable sequence of equilibria.  This implies that there is no general procedure to compute Nash equilibria.  (The proof relies on constructing a computable sequence of games whose payoffs are generated from two recursively inseparable sets and then showing that a computable sequence of equilibria would separate them.)

We can do one better by showing that for a limited set of games, given a computable sequence of strategies for one player, the sequence of best response strategies for the other player is not computable.  (Motivation for this is the idea that actually playing a game involves observing the opponent's strategy over time and computing a best response.)  The proof involves showing the result for a specific, simple game.

Allowing players to make some amount of error in their responses does lead to computable strategies, however.  Consider the logistic quantal response function: σ_{ij}(π) = exp(λ·u_{ij}(π)) / ∑_{k∈A_i} exp(λ·u_{ik}(π)), where u_{ij}(π) is the payoff to player i from using pure strategy j given a strategy profile (list of all players' strategies) π, and A_i is the set of actions available to i.  If we allow players to choose strategies with probabilities given by the logistic quantal response function for a given λ, instead of choosing a best response with certainty, then we can generate a computable "equilibrium".  Note, however, that no matter how tight we make this function, we have no guarantee that the resulting strategies are anywhere near a true equilibrium.
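A minimal Python sketch of the logistic quantal response function, which is just a softmax over expected payoffs (illustrative names, not from the cited papers):

```python
from math import exp

def quantal_response(payoffs, lam):
    """payoffs[j] = u_ij(pi): player i's expected payoff for pure strategy j,
    given the profile pi. Returns the logistic quantal response distribution.
    As lam -> infinity this concentrates on best responses; lam = 0 is uniform."""
    weights = [exp(lam * u) for u in payoffs]   # (for large payoffs, subtract max(payoffs) first)
    total = sum(weights)
    return [w / total for w in weights]

print(quantal_response([1.0, 0.5, 0.0], lam=2.0))   # approx [0.67, 0.24, 0.09]
```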

Looking at the problem of Nash equilibrium from another viewpoint entirely, we can examine the complexity of ε-equilibrium.  An ε-Nash equilibrium is one in which each player has knowledge of the other players' strategies and can gain at most ε by changing their strategy under this knowledge.  By taking ε = 1/2^k for k related to the length of the description of the given game, we can show that computing an r-player ε-Nash equilibrium is complete for the class PPAD.

PPAD is defined as (taken from Wikipedia): 

G is a (possibly exponentially large) directed graph with no isolated vertices, and with every vertex having at most one predecessor and one successor. G is specified by giving a polynomial-time computable function f(v) that returns the predecessor and successor (if they exist) of the vertex v. Given a vertex s in G with no predecessor, find a vertex t≠s with no predecessor or no successor. In other words, we want any source or sink of the directed graph other than s.  

PPAD is a subclass of TFNP, which is the set of FNP problems that are total. (A binary relation P(x,y) is in TFNP if and only if there is a deterministic polynomial time algorithm that can determine whether P(x,y) holds given both x and y, and for every x, there exists a y such that P(x,y) holds. (again, taken from Wikipedia))

The proof that r-player ε-Nash equilibrium is PPAD-complete was shown in a sequence of papers: first, games with 4 or more players were reduced to 4-player graphical games (graphical games are ones in which the players can be viewed as vertices of a graph, and each player's payoffs depend only on the actions of that player's direct graphical neighbors); then 4-player games were shown PPAD-complete by reducing another PPAD-complete problem to them; and separate proofs were later given for 3- and 2-player games.

This work places Nash equilibrium into a concrete place in the complexity hierarchy as well as expanding the interest and scope of the class PPAD.

References:
Prasad, Kislaya. "The rationality/computability trade-off in finite games." Journal of Economic Behavior & Organization 69(2009): 17-26. Print.  Link

Chen, Xi; Deng, Xiaotie "Settling the complexity of 2-player Nash equilibrium" Electronic Colloquium on Computational Complexity, Revision 1 of Report No. 140  Link

Daskalakis, Constantinos; Goldberg, Paul W.; Papadimitriou, Christos H. "The complexity of computing Nash equilibrium" Link

-Kevin Mc Inerney

Wednesday, April 29, 2009

Last lecture

Hi -- don't forget to not eat lunch tomorrow; we'll bring it for the last lecture, wherein we'll also cover Natural Proofs.

Tuesday, April 28, 2009

Course evaluations

Please also do the online course evaluation: go to https://my.cmu.edu/site/main/page.academics, log in with your Andrew ID, and click on "University Course Assessment". Thanks!

(And if you didn't get a copy of the paper one for the CS department, please ask Deb for one. Thanks!)

Lecture 27

In this lecture we covered Reingold's Theorem, SL = L; i.e., undirected connectivity can be solved in log space. Here are my abridged notes on actually doing Reingold's algorithm in log-space. It's not a full proof, just a "proof by example" -- but I hope it's nevertheless clearer than a hand-wave.

For the Martin-Randall Theorem, you can consult the original Martin-Randall paper. There is a clearer proof -- but of a much more general theorem -- in a paper by Jerrum, Son, Tetali, and Vigoda. It's still not the clearest thing. If I get time I will try to write up a stripped-down, simplified version of the proof. You can also consult the Arora-Barak book for a proof of a weaker but still sufficient result: ε(G ⓡ H) ≥ Ω(ε(G)·ε(H)²). Note that it is okay to have the square on ε(H) here since it's an absolute constant anyway -- but it would not help if the square were on ε(G).

Thursday, April 23, 2009

Lecture 26

Today we covered...

Proof Complexity and the NP vs. coNP problem. The "Resolution" proof system with the resolution rule (from C ∨ x and C′ ∨ x̄, derive C ∨ C′) and the weakening rule (from C, derive C ∨ x). Completeness of Resolution with proof sketch. Various contradictions (tautologies, in fact): Pigeonhole Principle, Tseitin Tautologies, Random 3-CNF. Treelike Resolution. Resolution width. Short Proofs Are Narrow Theorem, proved for Treelike Resolution. Ben-Sasson & Wigderson Theorem: Unsatisfiable k-CNFs with "expansion" require wide Resolution proofs.


By the way, the question came up of separating Treelike Resolution from General Resolution. A very strong result was proved for this problem by Ben-Sasson, Impagliazzo, and Wigderson: There is a natural family of contradictions with n variables and O(n) clauses which have Resolution refutations of length O(n) but require Treelike Resolution refutations of length 2^{Ω(n/log n)}.

The contradictions here are based on "pebbling" expander graphs; more specifically, the results in an old paper of Paul, Tarjan, and Celoni.

Tuesday, April 21, 2009

Blogging

Hi all. If you have not yet written a blog post -- you owe one! Your deadline is the last day of class. Please email us to confirm a topic.

Lecture 25

Today Eric gave an introduction to Property Testing. Among other things, we covered the BLR Linearity Test with the Bellare-Coppersmith-Håstad-Kiwi-Sudan proof using Fourier analysis.

Sunday, April 19, 2009

Homework 5 Correction

In Problem #4b, everywhere you see B_δ(f) it should be B_δ(C).

Thanks to Or for pointing this out.

Updated: Another small bug-fix to Problem #4b, in the definition of Sʹ. Corrected version is on the course web page.

Wednesday, April 15, 2009

Easy Witnesses

While doing the proof of the IKW theorem in class, we came across the easy witness method introduced by Kabanets. The original use of this method, in Kabanets' 2000 paper, was to show that any randomized algorithm in RP can be simulated by a zero-error probabilistic algorithm in expected 2^{n^ε} time, for some ε > 0, such that no expected sub-exponential time algorithm can refute the simulation infinitely often (i.o.). In other words, there is no algorithm of the form R(1^n) ∈ {0,1}^n in this time class such that the simulation makes a mistake on input R(1^n) for almost all n.

Before going to prove this result, we look at something simpler first, which also introduces the easy witness method:

Theorem 1: At least one of the following holds:
  1. For every ε > 0, any language in RP can be simulated in DTIME(2^{n^ε}) such that no language in ZPP can refute the simulation i.o.
  2. BPP = ZPP.
Proof: Consider the set S of easy witness strings. A string α ∈ S is an easy witness iff it has a concise description (i.e., it is the truth table of a circuit of low complexity). For an algorithm A and input x for a language in RP, we can test A against all such strings and accept iff some string makes A accept. The crucial insight is that if this simulation makes frequent mistakes, then the random strings that make A accept all have high circuit complexity. Thus if we fix such an input x, A(x,·) is a good hardness test for strings r, accepting only if r has no small description. Notice that a significant fraction of strings have this property. Therefore, using A(x,·) as a source of hard truth tables for the Impagliazzo-Wigderson generator, we can derandomize BPP.

More formally, for a given randomized algorithm A(x,r) with input x, |x| = n, and random bits r, |r| = m = n^a for some constant a, let S_δ be the set of truth tables of all (log m)-variable Boolean functions with circuit complexity at most m^δ. Hence S_δ contains strings of length m, and it can be enumerated in time 2^{O(m^{2δ})}. Therefore checking whether A(x,α) accepts for some α ∈ S_δ takes time at most 2^{n^ε} for δ = ε/(2a). Call this algorithm B^ε_A.
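Here is a toy Python sketch of this easy-witness search (the parameters and the circuit encoding are purely illustrative, not the construction from the paper): enumerate small circuits on log m inputs, expand each to an m-bit truth table α, and accept iff A(x, α) accepts for some α.

```python
from itertools import product

def truth_table(circuit, k):
    """Evaluate a straight-line circuit on all 2^k inputs.
    circuit: list of gates (op, i, j) with op in {'AND','OR','XOR','NOT'} and
    i, j indexing earlier wires (inputs are wires 0..k-1). Output = last wire."""
    table = []
    for bits in product([0, 1], repeat=k):
        wires = list(bits)
        for op, i, j in circuit:
            if op == 'AND': wires.append(wires[i] & wires[j])
            elif op == 'OR': wires.append(wires[i] | wires[j])
            elif op == 'XOR': wires.append(wires[i] ^ wires[j])
            else: wires.append(1 - wires[i])          # NOT ignores j
        table.append(wires[-1])
    return table

def small_circuits(k, size):
    """Enumerate all circuits with at most `size` gates over k inputs."""
    ops = ['AND', 'OR', 'XOR', 'NOT']
    def rec(circuit, n_wires):
        yield circuit
        if len(circuit) == size:
            return
        for op in ops:
            for i in range(n_wires):
                for j in range(n_wires):
                    yield from rec(circuit + [(op, i, j)], n_wires + 1)
    yield from rec([], k)

def easy_witness_search(A, x, m, size):
    """Accept iff some 'easy' m-bit string alpha makes A(x, alpha) accept."""
    k = m.bit_length() - 1                            # assumes m = 2^k
    for c in small_circuits(k, size):
        if A(x, truth_table(c, k)):
            return True
    return False

# Toy A: "accept iff the random string has an odd number of ones".
A = lambda x, r: sum(r) % 2 == 1
print(easy_witness_search(A, x=None, m=8, size=2))    # True (e.g. AND(AND(x0,x1),x2))
```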

If this algorithm B^ε_A works for every A ∈ RP and ε > 0, then (1) holds and we are done. Otherwise, there exist A, ε > 0 and an expected polynomial time refuter R(1^n) ∈ {0,1}^n such that, for almost all n, B^ε_A rejects R(1^n) whereas A(R(1^n), r) accepts for some r. Therefore A(R(1^n), ·) can be viewed as a Boolean circuit C_hard(r) which accepts almost all m-bit strings, and all accepted strings are truth tables of (log m)-variable Boolean functions with circuit complexity at least m^{ε/(2a)}. Since R(1^n) halts in polynomial time with high probability, we have a ZPP algorithm which can construct such circuits C_hard.

Given such a circuit C_hard, we can guess a string β ∈ {0,1}^m uniformly at random which C_hard accepts. Recalling the IW result:


Impagliazzo-Wigderson Generator: Given r ∈ {0,1}^{n^c}, the truth table of a (c·log n)-variable Boolean function with circuit complexity at least n^{εc}, there exist c, d and a polynomial-time computable generator G_r(s): {0,1}^{d·log n} → {0,1}^n with hardness H(G_r) > n.

If we use this β to construct the IW generator G_β mapping {0,1}^{d·log k} to {0,1}^k, we get a generator G_β with hardness greater than k for sufficiently large k. Constructing it takes expected polynomial time, and the last step takes deterministic polynomial time; therefore for every L ∈ BPP we have L ∈ ZPP. QED

Instead of the IW generator, we can use the Babai-Fortnow-Nisan-Wigderson generator to arrive at the following result.

Theorem 2: At least one of the following holds:
  1. Any language in RP can be simulated in DTIME(2^{(log n)^{log log n}}) such that no language in ZPP can refute the simulation i.o.
  2. BPP ⊆ ZPSUBEXP, where ZPSUBEXP denotes the class of languages with an expected sub-exponential time algorithm.

Instead of fiddling with the generator, we can increase the power of the refuter. In Theorem 1, when the first condition fails, we can allow the refuter to take expected sub-exponential time. After changing the circuit size and parameters appropriately, we have the following result:

Theorem 3: At least one of the following holds:
  1. For every ε > 0, any language in RP can be simulated in DTIME(2^{n^ε}) such that no language in ZPSUBEXP can refute the simulation i.o.
  2. BPP ⊆ ZPSUBEXP.
Noting that DTIME(2^{n^ε}) and ZPSUBEXP are contained in ZPTIME(2^{n^ε}), and RP ⊆ BPP, we can replace the above with an unconditional result:

Corollary 1: For every ε > 0, any language in RP can be simulated in ZPTIME(2^{n^ε}) such that no language in ZPSUBEXP can refute the simulation i.o.

Notice that the above results let us obtain analogues of the results in the IW paper:

In IW, assuming BPP ≠ EXP, a deterministic subexponential time simulation was shown for BPP. Using the machinery developed above, we can answer a related question: what happens when ZPP ≠ EXP?

Theorem 4: If ZPP ≠ EXP, then any language in RP can be simulated in DTIME(2^{n^ε}) for all ε > 0 such that no language in ZPP can refute the simulation i.o.

Proof: Assume otherwise, i.e., that the simulation fails. Then by Theorem 1, BPP = ZPP. Moreover, the failure means that for some ε > 0, BPP cannot be simulated in DTIME(2^{n^ε}) with ZPP refuters; by the IW theorem this implies BPP = EXP, so
BPP = EXP = ZPP, contradicting ZPP ≠ EXP. QED

The gap theorem of IW (either no nontrivial derandomization of BPP is possible, or BPP can be simulated in deterministic 2^{n^ε} time for every ε > 0) can be shown for ZPP as follows:

Theorem 5: Either ZPP = EXP, or for every ε > 0 any language in RP can be simulated in DTIME(2^{n^ε}) such that no language in ZPP can refute the simulation i.o.

Proof: Assume ZPP = EXP (and hence RP = EXP). By the Time Hierarchy Theorem, for every ε > 0, EXP is not in DTIME(2^{n^ε}) even with ZPP refuters. Thus at most one of the two conditions holds. By Theorem 4, at least one of the conditions holds. The result follows. QED

Notice that "easy witness'' method applies even when the refuters are allowed to be non-deterministic Turing machines, and by essentially using the same proof structure, we can obtain the following result:

Theorem 6: At least one of the following holds:
  1. For every ε > 0, any language in NP can be simulated in DTIME(2^{n^ε}) such that no language in NP can refute the simulation i.o.
  2. BPP ⊆ NP.

Tuesday, April 14, 2009

THURSDAY'S CLASS CANCELED

I was informed by Deb that all classes on Thursday are canceled. So, see you on Tuesday, April 21, when Eric will tell us about Property Testing.

(This kills a lecture, which was going to be on quantum, but Matt already told us about it in his blog post, so no great loss.)

As an interesting fact, I learned that Carnival is scheduled to precisely be the "Thanksgiving weekend" of the Spring Term. That way, the Fall and Spring schedules can match exactly.

Lecture 24

In today's class we covered many of the gory details of expander graphs. I didn't quite get a chance to finish the last details, so here are the notes.

Friday, April 10, 2009

Bounds on BQP

Quantum computation presents the attractive possibility of computing some functions exponentially faster than would be possible under a classical model. The most compelling example is perhaps Shor's quantum factoring algorithm, which runs in time polynomial in the input length (polylogarithmic in the number being factored). But how powerful is quantum computation? Getting some of the basic answers involves (surprisingly, to me anyway) getting into some counting complexity.

To begin, we'll need to establish a basic set of definitions. The fundamental unit of quantum computation is the qubit. Like classical bits, qubits have two basis states; however, qubits can exist in linear superpositions of these states, |ψ⟩ = α|0⟩ + β|1⟩, where α and β are complex amplitudes subject to |α|² + |β|² = 1. We cannot observe these amplitudes directly, however; to determine the state of this qubit, we need to take a measurement, the result of which will always be a pure basis state. In the previous example, measuring |ψ⟩ will result in |0⟩ with probability |α|² and |1⟩ with probability |β|².

If you have multiple qubits, they can become entangled with each other, meaning (roughly speaking) that their measurement probabilities are no longer independent. So, for example, if we have two entangled qubits, the states |00⟩ and |11⟩ could each have probability 1/2, while the other two states have probability 0. If you somehow knew that their states were entangled in this way, then measuring only one qubit would tell you the value of the other -- although there would be no way to predict the value of either qubit before taking the measurement. If we have a quantum register of k qubits, then each of its 2^k possible basis states effectively has its own complex amplitude. The vector of all of these 2^k complex amplitudes is our state vector, which completely encodes the statistical behavior of measurements of the system.

So now we have quantum registers, to store our information. In order to perform computations, we need quantum gates. These gates are simply unitary linear operators acting on the state vector. In order to talk about quantum time classes in a meaningful way, however, we need to restrict the set of allowable fundamental gates. (Since the composition of two unitary operators is also a unitary operator, you could design a single gate that would carry out any desired computation!) One common circuit basis involves only two fundamental gates. The first is the Hadamard gate, which takes

|0⟩ ↦ (1/√2)(|0⟩ + |1⟩)
and
|1⟩ ↦ (1/√2)(|0⟩ − |1⟩)

The second is the Toffoli gate, which takes three bits as input, passes through the first two bits unmodified, and XORs the third bit with the product of the first two.

Note that the fact that all changes in the state happen via unitary operators implies that the number of output qubits is always equal to the number of input qubits. When dealing with quantum computation, we can assume without loss of generality that the quantum register has only a single accepting state.
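A small numpy sketch of this circuit model (my own illustration): a 3-qubit register is a length-8 state vector, a gate is a unitary matrix tensored with identities on the untouched qubits, and measurement probabilities are squared amplitude magnitudes.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)       # Hadamard gate
I2 = np.eye(2)

# Toffoli on 3 qubits: flips the last qubit iff the first two are 1,
# i.e. it swaps basis states |110> (index 6) and |111> (index 7).
TOFFOLI = np.eye(8)
TOFFOLI[[6, 7], [6, 7]] = 0
TOFFOLI[6, 7] = TOFFOLI[7, 6] = 1

state = np.zeros(8); state[0b110] = 1.0            # start in |110>
state = np.kron(np.kron(I2, I2), H) @ state        # Hadamard on the third qubit
state = TOFFOLI @ state                            # then Toffoli

probs = np.abs(state) ** 2                         # measurement distribution
for basis_state, p in enumerate(probs):
    if p > 1e-12:
        print(f"|{basis_state:03b}>: {p:.3f}")     # |110> and |111>, prob 1/2 each
```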

Definition: BQP is the quantum analog of BPP. (Quantum computation is inherently probabilistic.) It is the class of languages decidable by polynomial-size quantum circuits that (to enforce uniformity) can be generated in polynomial time by a classical TM, accepting with probability at least 2/3 for x ∈ L and with probability at most 1/3 for x ∉ L. (Amplification theorems similar to those for BPP hold here, so any thresholds 1/2 + ε and 1/2 − ε will work.)

How powerful is BQP? On the lower end, we know BPP ⊆ BQP: we can efficiently simulate classical circuits with quantum circuits. The Toffoli gate is universal for classical computation; if you hardwire the third input to 1 it is simply a NAND gate. Coin flips can be simulated by Hadamard gates, since they transform pure basis states into equal superpositions.

What about upper bounds? Clearly BQP ⊆ EXP, since we can simulate any BQP machine by applying a polynomial number of linear operators to the exponential-size state vector (or density matrix). It's also true, but less obvious, that BQP ⊆ PSPACE. We can show this by summing over computational "paths". Recall that in a quantum computation, we apply a small unitary operator to our state at each time step. For an accepting state y at time t, we can compute that state's amplitude by applying the operator U_{t-1} to each predecessor state x_1, …, x_{2^n} and summing the results. By recursive application of this same step, we can compute the amplitude of any state anywhere in the computation in polynomial space (although we do require exponential time).
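Here is a hypothetical Python sketch of that path-summing idea: the amplitude of a basis state after t gates is computed recursively, recomputing predecessor amplitudes instead of storing the exponentially long state vector, so the space used is polynomial (while the time is exponential).

```python
import math

def amplitude(gates, start, target, t):
    """Amplitude of basis state `target` after applying gates[0..t-1] to the
    basis state `start`. Each gate is a dict-of-dicts: gates[k][y][x] is the
    matrix entry <y| U_k |x> (zero entries omitted). Recomputing predecessor
    amplitudes keeps only O(t) stack frames alive -- polynomial space."""
    if t == 0:
        return 1.0 if target == start else 0.0
    total = 0.0
    for x, entry in gates[t - 1].get(target, {}).items():   # predecessors of target
        total += entry * amplitude(gates, start, x, t - 1)
    return total

# Example: a single Hadamard gate acting on one qubit.
h = 1 / math.sqrt(2)
hadamard = {'0': {'0': h, '1': h}, '1': {'0': h, '1': -h}}
print(amplitude([hadamard], start='0', target='1', t=1))    # ~0.707 = <1|H|0>
```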

But we can do better, namely BQP ⊆ PP.

Definition: GapP is the set of functions f(x) such that, for some polynomial-time NTM M, f(x) is the difference between the number of accepting and rejecting paths of M on x. Note that this gives us an alternative definition of PP: a language L is in PP if and only if there is some GapP function f such that f(x) > 0 when x ∈ L and f(x) ≤ 0 when x ∉ L.

Fact: GapP is closed under negation, (exponential) sum, and (polynomial) product.

Fact: If there exist two GapP functions describing the real and imaginary parts of the entries of a polynomial-length sequence of complex-valued matrices, then there also exist two GapP functions describing the real and imaginary parts of their product.

More precisely (statement due to Watrous): let p(x) and q(x) be polynomially bounded functions and, for each n, let A_{n,1}, A_{n,2}, …, A_{n,p(n)} be a sequence of complex-valued matrices of size at most 2^{q(n)} × 2^{q(n)}. Then, if the functions

f(1^n, 1^k, x, y) = Re(A_{n,k}[x,y])

g(1^n, 1^k, x, y) = Im(A_{n,k}[x,y])

are in GapP, then there also exist functions F and G in GapP such that

F(1^n, x, y) = Re((∏_{i=1}^{p(n)} A_{n,i})[x,y])

G(1^n, x, y) = Im((∏_{i=1}^{p(n)} A_{n,i})[x,y])

Observe that, applying an appropriate topological ordering to our quantum circuit, we can serialize our quantum gates into a polynomial number of steps. Additionally, observe that we can take the tensor product of a gate's matrix with the identity matrix for the unaffected bits in order to come up with a complete transition matrix for each time step. We can now use this fact to show that there are functions f and g in GapP describing all of our quantum gates.

In order to do this, we need to take advantage of an alternate representation of our quantum register: its density matrix, ρ. Glossing over some details, this effectively linearizes our state and operations. With 2^k entries in the state vector, the corresponding density matrix is a 2^k × 2^k matrix with trace 1. The i'th entry on the diagonal is simply the probability of finding the register in state i. If we then let our state be this matrix, flattened into a vector, vec(ρ), the entries of the matrices describing our quantum gates fall in the set {-1, -1/2, 0, 1/2, 1}. Because these matrices can be generated in polynomial time, the corresponding f and g for each matrix (multiplied by 2, to deal with the factors of 1/2) must be in GapP. Then, by the fact noted above, there must exist F and G describing the entries of the transition matrix for the entire circuit (multiplied by 2^{p(x)}).
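As a quick numerical sanity check (my own, for the Hadamard gate only): in the vec(ρ) picture the map ρ ↦ HρH† becomes the matrix H ⊗ H, whose entries are all ±1/2, consistent with the claim about the set {-1, -1/2, 0, 1/2, 1}.

```python
import numpy as np

H = (1 / np.sqrt(2)) * np.array([[1, 1], [1, -1]])
superop = np.kron(H, H)               # acts on vec(rho) for a single qubit
print(superop)                        # every entry is +1/2 or -1/2

rho = np.array([[1.0, 0.0], [0.0, 0.0]])    # the pure state |0><0|
new_rho = (superop @ rho.reshape(-1)).reshape(2, 2)
print(new_rho)                        # H|0><0|H: all four entries equal 1/2
```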

We can assume that the circuit has only a single accepting state, so finding the acceptance probability is then simply a matter of looking up the single matrix entry that describes the transition from the starting state x to the accepting state, which consists of a single invocation of a GapP function.

To complete the proof, note that the function a(x) = 2^{p(x)}·(Pr[Q accepts x] - 1/2) is positive when x ∈ L and negative when x ∉ L, and by the closure properties above it is a GapP function. Therefore Q can be simulated by a PP machine, and BQP ⊆ PP.

In fact, not only does PP contain BQP, but PP^BQP = PP.

It is unknown whether this inclusion is strict, but all signs seem to indicate that it is. We can define the class QMA analogously to MA, except that Arthur is a BQP machine and Merlin sends him a quantum proof. This class, which obviously contains BQP, is contained in PP (originally noted by Kitaev and Watrous). Furthermore, a theorem of Vyalyi states that if QMA = PP, then PH ⊆ PP.

Thursday, April 9, 2009

Midterm correction

Oops.  I made a mistake in the grading of question 1(e): the problem of determining whether #L=#P is open.  To compensate for this error, I am adding +3 to everyone's grade on the midterm.  If you have any other questions about the graded midterm (or Homework 4), please see me during my office hours next Tuesday.  Thanks!

Details for the IKW Theorem

The proof of the IKW Theorem I gave in class was a bit sketchy; let me discuss two points about it, raised by Or and Pranjal.

1. The first point, raised by Or, was about appealing to the Impagliazzo-Wigderson Theorem (or implicitly, the Nisan-Wigderson Theorem). Normally when one thinks about that theorem one thinks of the statement, "If there is a language L ∈ E such that L ∉ ∪_c SIZE(n^c), then BPP ⊆ ∩_{ε>0} DTIME(2^{n^ε})."

However, this is not what the IKW Theorem appeals to; you should think of that as a corollary of the IW/NW Theorems. The theorems themselves are input-length-specific; e.g., for Nisan-Wigderson, the statement we're using is of the form "Let C be a circuit on n bits of size n^c, and let f : {0,1}^{n^{1/10c}} → {0,1} be a function with which every size-n^{2c} circuit has correlation at most 1/n^{2c}. Then [some explicit PRG outputting n bits based on f's truth table and designs] 1/10-fools C."

2. The second point, raised by Pranjal, was about the uncareful (indeed, possibly incorrect) way I explained the derandomization of MA that comes out of assuming NEXP ≠ EXP. The wrong way to say it is to start with a fixed language in MA and then deduce that Arthur's n^c-time randomized part can be replaced by some other NTIME(2^{n^a})-time algorithm (with advice). That is, of course, pointless if a ≥ c, since Arthur could just enumerate all possible random strings. Rather, the key is to show that all of MA can be done by a nondeterministic (advice-taking) Arthur who uses fixed-exponential nondeterministic time.

In other words, the correct key lemma is:

NEXP ≠ EXP ⇒ ∃c such that MA ⊆ io-[NTIME(2^{n^c})/n].

This is nontrivial because the same c works for all languages in MA, regardless of their polytime complexity. And the proof of this uses the appeal to the length-specific IW Theorem described in Point 1 above.

Lecture 23

Today we finished the proof of the IKW Theorem. We also showed the Kabanets-Impagliazzo Theorem: derandomizing Arithmetic Circuit Identity Testing implies that NEXP is not in P/poly, or the Permanent is not in AlgP/poly, or both.

We then moved on to the definition of expander graphs, the definition of strongly explicit graph families, and saw the Lubotzky-Phillips-Sarnak and Margulis-Gabber-Galil expander families.

Tuesday, April 7, 2009

Why there isn't a simpler proof of IKW

A question raised in class today was: why can't we prove the IKW Lemma, NEXP ⊆ P/poly ⇒ NEXP = EXP, via techniques similar to Meyer's Lemma, EXP ⊆ P/poly ⇒ EXP = PSPACE?

(Recall, btw, that IKW actually gets NEXP = MA and Meyer actually gets EXP = Σ_2.)

In discussing it with Or and Pranjal after class, it seems the answer is the following. In the Meyer Lemma, for a fixed EXP machine, we considered the map taking the index of a tableau window to that window's contents. That map is in EXP, hence in P/poly. So there exists a circuit C of some size n^k computing the map. Given x, we enumerate all circuits of that size, and accept iff one of them implicitly describes an accepting tableau; this is doable in PSPACE.

Why doesn't this work for NEXP? For NEXP computations, there may be multiple accepting tableaus. Not a problem so far. Certainly the PSPACE algorithm still makes sense: enumerate all circuits of some fixed poly-size, and if one of them implicitly describes an accepting tableau, accept. The question is, if x is in the language, will there be a fixed poly-size circuit describing an accepting tableau?

One has to be careful. There is no longer a clear meaning for "the map that takes a tableau window index to the tableau window contents", since there are multiple valid tableaus. Perhaps that's okay; perhaps we can allow for "multivalued functions". But still, there isn't a clear meaning for "this map is computable in NEXP and hence in P/poly". The reason is, the usual meaning of an NTM computing a function is that some branches reject, and the ones that accept output valid answers. Such a model indeed computes a multi-valued function. However, since it's not a decision problem (a language), it's not clear how we can get anything out of the assumption NEXP ⊆ P/poly.

One way around this, as Or suggested, would be to make everything single-valued by looking at the lexicographically least accepting tableau. Unfortunately, as we saw on Homework 2, lexicographically least satisfying assignments characterize the class P^NP, not NP. All is not lost, though; a paper of Bourke uses such ideas to show EXP^{NP[t(n)]} ⊆ P/poly ⇒ EXP^{NP[t(n)]} = MA (where the notation indicates that the EXP machine makes t(n) queries to the oracle). This is related to Buhrman and Homer's old extension of Karp-Lipton: EXP^NP ⊆ P/poly ⇒ EXP^NP = Σ_2.

Lecture 22

In this lecture we covered "derandomization implies circuit lower bounds" topics. In particular, we saw:

Kannan's Theorem: Σ_2 doesn't have fixed-polynomial-size circuits (i.e., for every fixed k, Σ_2 is not contained in SIZE(n^k)).

The Babai-Fortnow-Lund Theorem, EXP ⊆ P/poly ⇒ EXP = MA, and its consequences: MA-EXP does not have polynomial-size circuits, and derandomizing MA to NP implies that NEXP does not have polynomial-size circuits.

Most of the Impagliazzo-Kabanets-Wigderson Theorem, NEXP ⊆ P/poly ⇒ NEXP = EXP (and indeed NEXP = MA), along with Kabanets's "easy witness method". The remainder will be proved next time.

Homework 5

Homework 5 is posted, due in 2 weeks as usual. It's long! Start early.

Thursday, April 2, 2009

Lecture 21

Today we saw the completion of Sudan-Trevisan-Vadhan's proof of the Impagliazzo-Wigderson Theorem: if there is an ε > 0 such that E contains a language not in SIZE(2^{εn}), then P = BPP. This included Sudan's algorithm for list-decoding Reed-Solomon codes.

Tuesday, March 31, 2009

Lecture 20

Today we covered...

Reed-Muller code review. Local decoding with "random lines". Code concatenation. Beginning Sudan-Trevisan-Vadhan's hardness amplification.

Monday, March 30, 2009

Midterm correction

On problem #4 on the midterm, assume wn.

Thanks to Or for pointing out the necessity of this.

Thursday, March 26, 2009

Lecture 19

Today we covered...

Completion of proof of Impagliazzo's Hard-Core Set Theorem, Yao's XOR Lemma. Intro to error-correcting codes, list-decodable codes, locally decodable codes. Reed-Solomon codes, Hadamard codes, Reed-Muller codes. Connection between local list-decoding and hardness amplification.

Tuesday, March 24, 2009

Lecture 18

Today we covered: hardness vs. randomness corollaries of Nisan-Wigderson; the Impagliazzo-Wigderson Theorem (f ∈ E requiring circuits of size 2^{εn} implies P = BPP); hardness amplification, from 1 - 1/poly(n) hardness to 1/2 + 1/poly(n) hardness, via the XOR Lemma; Impagliazzo's proof via the Hard-Core Set Theorem (with Nisan's proof via the Minimax Theorem).

Midterm

The take-home midterm is posted on the course home page. Please grab a copy now.

As a reminder, it is due next Tuesday, no collaboration is allowed, and no references to "outside" sources are allowed: just your course notes, the blog, the course homeworks, and the course homework solutions.

Friday, March 20, 2009

Lecture 17

In this lecture we covered the Nisan-Wigderson Generator, which connects hardness to randomness. In the course of proving it we also saw designs and the Yao Next-Bit Predictor Theorem, which uses the "hybrid method".

Tuesday, March 17, 2009

Lecture 16

In this lecture we covered Nisan's pseudorandom generator for log space.

One thing I had to rush over at the end was the actual definition of the generator, along with its seed length!

Here is an official inductive definition of the generator: Let h_1, ..., h_{lg n} : {0,1}^k → {0,1}^k be functions. We define G_t : {0,1}^k → {0,1}^{2^t · k} inductively by G_0(x) = x and G_t(x) = (G_{t-1}(x), G_{t-1}(h_t(x))).

Finally, Nisan's generator G maps ⟨x, h_1, ..., h_{lg n}⟩ to G_{lg n}(x). Here the h_i's are the "descriptions" of the hash functions.

Recall that we are using the simple "ax+b" pairwise-independent hash family; the "description" of such a function is just the 2k-bit string ⟨a, b⟩. Conveniently, choosing this description uniformly at random coincides with choosing a function uniformly at random from the family.

So finally, the seed length is ℓ = k + (lg n)·(2k). Since k = O(log S), this is indeed the O(S·log n) seed length, as claimed. The number of output bits is in fact n·k = O(n·log S) (which is, as we noted, slightly more than we need).
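In case anyone wants to play with the definition, here is a small Python sketch of the generator (my own toy code; the block size k = 8 and the particular GF(2^8) implementation of the "ax+b" family are choices made purely for illustration):

```python
import random

K = 8
IRRED = 0x11B   # x^8 + x^4 + x^3 + x + 1, irreducible over GF(2)

def gf_mul(a, b):
    """Multiplication in GF(2^8): carry-less multiply, reduced mod IRRED."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= IRRED
    return r

def make_hash(a, b):
    """h(x) = a*x + b over GF(2^8); the pair (a, b) is the 2k-bit 'description'."""
    return lambda x: gf_mul(a, x) ^ b

def G(t, x, hs):
    """G_0(x) = x;  G_t(x) = (G_{t-1}(x), G_{t-1}(h_t(x))).  Returns k-bit blocks."""
    if t == 0:
        return [x]
    return G(t - 1, x, hs) + G(t - 1, hs[t - 1](x), hs)

# Seed: x together with lg(n) hash descriptions; here n = 16 output blocks.
lg_n = 4
x = random.randrange(2 ** K)
hs = [make_hash(random.randrange(2 ** K), random.randrange(2 ** K)) for _ in range(lg_n)]
blocks = G(lg_n, x, hs)               # 2^{lg n} = 16 blocks of k = 8 bits each
print(len(blocks), [format(b, "08b") for b in blocks])
```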

Monday, March 16, 2009

Fixed-polynomial circuit lower bounds for MA/1

In this post, we look at a recent circuit lower bound result which shows that MA/1 is not contained in SIZE(n^k) for any fixed k. Some history and motivation first: a well-studied fundamental problem in lower bounds is showing super-polynomial circuit lower bounds for NP. This is natural to look at, since such a result would immediately separate P and NP: P ⊆ P/poly, i.e., every language in P is decided by a polynomial-size circuit family.

However, though people have been looking at this problem for quite some time, even getting super-linear lower bounds on circuit size has been elusive. While the goal has been super-polynomial lower bounds, the current best lower bound is just 4.5n - o(n).

Given this state, a natural thing to do is to look at showing results for larger classes, or in other words, attack NP from above. In this line of research, Kannan showed a promising result of this nature (we have seen this in one of the homeworks): Σ_2 ∩ Π_2 is not contained in SIZE(n^k). Getting closer to NP, Köbler and Watanabe have shown, using the halving algorithm of BCGKT (seen in class), that ZPP^NP has a circuit lower bound of n^k for any fixed k. This is an improvement over the previous result of Kannan, as ZPP^NP is "closer" to NP than Σ_2 ∩ Π_2 is (i.e. NP ⊆ ZPP^NP ⊆ Σ_2 ∩ Π_2).

In a recent result, Rahul Santhanam gets even closer to NP by showing a fixed-polynomial circuit lower bound for (a slight variant of) MA. He shows that for any fixed k > 0, there are languages in the class MA/1 (MA with 1 bit of advice) which are not decided by non-uniform circuit families of size n^k. Why is this result important? Intuitively, MA is like the probabilistic version of NP, and therefore such lower bounds automatically gain much value, as it is widely believed that adding "randomness" to NP does not give much more power than the basic NP class (that is, MA = NP is believed to hold). This therefore nearly implies fixed-polynomial circuit lower bounds for MA, which sits just above NP in the hierarchy. We say "nearly" because the class for which he shows a lower bound uses 1 additional bit of advice (MA/1).

Now let's look at the high-level picture of proving such lower bounds. Imagine we want to show a lower bound for a class C with NP ⊆ C. If NP is not contained in P/poly, then we are done, because C is only larger than NP. On the other hand, if NP ⊆ P/poly, then we use this fact to show that the polynomial hierarchy collapses to C. But there are problems in PH which are not decided by SIZE(n^k) circuits (recall this from HW1); because of the collapse, these problems are in C, and this completes the proof. If we recall, we did the exact same thing for showing such lower bounds for Σ_2 ∩ Π_2: we showed that if NP ⊆ P/poly, then PH collapses to Σ_2 ∩ Π_2. Since we know that PH has problems which are not decided by SIZE(n^k), we have that Σ_2 ∩ Π_2 is not contained in SIZE(n^k).

In fact, the argument need not always be based on the two cases of whether or not NP is in P/poly. There are cases where it helps to consider a class which contains C, and then argue depending on its containment in P/poly. For instance, we have the result that if PSPACE ⊆ P/poly, then PSPACE = MA (this problem figures in HW4). Intuitively, the proof goes as follows: if PSPACE ⊆ P/poly, then the prover of the IP protocol for QBF (a complete problem for PSPACE = IP) can be represented by a poly-sized circuit. Merlin can therefore simply give Arthur the circuit for the correct input size, and Arthur can use the circuit to simulate any query to the prover. Completeness is direct, and intuitively, soundness holds because no adaptive prover can cheat Arthur, so no fixed prover can! Therefore, QBF ∈ MA.

Using the ideas described above, we are now ready to show the main theorem: MA/1 is not contained in SIZE(n^k).
We require the following lemma, whose proof is rather technical (it uses Shamir's proof that IP = PSPACE) and which we omit.

There exists a PSPACE-complete problem L and a probabilistic polynomial-time Turing machine M such that for any input x,
1. M asks only queries of length |x|.
2. If M is given the language L as oracle and x ∈ L, then M accepts x with probability 1.
3. If x ∉ L, then no matter what oracle M is given, it accepts with probability at most 1/2.

Such a complete problem is known to exist, and we shall make use of it. Just as an aside, this definition of L seems similar to the notion of checkers in HW4.

If this language L is in P/poly, then PSPACE ⊆ P/poly, meaning PSPACE = MA, and this would give us the desired result since PSPACE is not contained in SIZE(n^k). Therefore, we assume that L is not in P/poly, and create the following padded language L'. For what follows, let s(n) denote the size of the smallest circuit which decides L on inputs of size n. Consider the following language L':

L' = { x1^y : x ∈ L, |x| = n, y ≥ n, y a power of 2, and s(n) ∈ ((y+n)^{k+1}, (2y+n)^{k+1}] }.

We show that L' ∈ MA/1 but L' ∉ SIZE(n^k). We first show the latter result; in some sense, the magic is already done, as the padding has in effect diagonalized the input.
Because L ∉ SIZE(n^{k+1}) (we are in the case where L ∉ P/poly), there exists some input size n_0 such that s(n_0) > (n_0+1)^{k+1}, where s(n) is the size of the smallest circuit deciding L on inputs of size n. Define y_0 to be the power of 2 such that (n_0+y_0)^{k+1} < s(n_0) ≤ (n_0+2y_0)^{k+1}. Notice that this is the same y as in the definition of L'.

Now, suppose L' has a circuit family of size m^k. We can then build a "small" circuit deciding L on inputs of size n_0: the circuit simply hardwires y_0 additional 1's onto the input x and then mimics the circuit for L' on inputs of size n_0+y_0. But this circuit has size at most (n_0+y_0)^k < s(n_0), which contradicts the definition of s(n_0) as the smallest circuit size for L on inputs of size n_0. This shows that L' ∉ SIZE(n^k).

All that remains is to show that L' ∈ MA/1. The key is in knowing whether the input length m of a given x1^y is valid, in that it satisfies the circuit-size condition. It would be difficult for the MA protocol to figure this out if the circuit size is large (which it probably is), and here is where we use the 1 bit of advice. For input length m, the advice is 1 if and only if m can be decomposed into n and y with y a power of 2, y ≥ n, and the smallest circuit size for deciding L on inputs of size n lying in ((y+n)^{k+1}, (2y+n)^{k+1}]. Notice that if such a valid (n, y) decomposition exists, it is unique: if we try to increase y to 2y and decrease n accordingly, the new main-part length would be n - y ≤ 0, since y ≥ n (a quick brute-force check of this appears below).
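Here is that quick brute-force check (my own toy code): for a given padded length m there is at most one split m = n + y with y a power of two and y ≥ n, even before the circuit-size condition on s(n) is imposed.

```python
def decompositions(m):
    """All ways to write m = n + y with n >= 1, y a power of two, and y >= n."""
    out = []
    y = 1
    while y < m:
        n = m - y
        if n >= 1 and y >= n:
            out.append((n, y))
        y *= 2
    return out

for m in range(2, 200):
    assert len(decompositions(m)) <= 1
print({m: decompositions(m) for m in range(2, 12)})
```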

Thus, given x1^y for which we need to decide membership in L', the MA protocol does the following. Merlin sends Arthur a circuit of size s ∈ ((n+y)^{k+1}, (n+2y)^{k+1}] which solves L on inputs of length n. Arthur checks whether the advice bit is 1, and rejects if it is not. If it is, then m is a valid length, and Arthur looks at the circuit, identifies n (the input size of the received circuit), and makes the other necessary checks: y ≥ n, y is a power of 2, and the circuit's size s satisfies (y+n)^{k+1} < s ≤ (2y+n)^{k+1}. He then simulates M (the probabilistic Turing machine from the lemma, which decides L when given a suitable oracle), answering M's oracle queries with the circuit, and accepts or rejects according to M.

It is easy to see that if x1^y ∈ L', then Merlin can send the correct circuit, and Arthur will accept with probability 1. On the other hand, if x1^y ∉ L', then Merlin "commits" to an oracle by sending a circuit, and by the properties of L and M, Arthur accepts with probability at most 1/2. This completes the proof.