Thursday, April 30, 2009

Lecture 28

Today we talked about barriers to proving $P \neq NP$ and other complexity-theory results: specifically, the "relativization" barrier introduced by Baker-Gill-Solovay, and the "natural proofs" barrier introduced by Razborov-Rudich.

We also talked about results that evade them. For example, the Hopcroft-Paul-Valiant Theorem, $TIME[f(n)] \neq SPACE[f(n)]$, and the Paul-Pippenger-Szemerédi-Trotter Theorem, $DTIME[n] \neq NTIME[n]$, do not relativize. Nor do they algebrize, according to the Aaronson-Wigderson notion. (At least, my opinion is that the notion of relativizing/algebrizing makes sense for these theorems, and in this sense they neither relativize nor algebrize.)

As for natural proofs, we discussed how proofs by diagonalization do not seem to naturalize, and we mentioned explicit circuit lower bounds that are proved by diagonalization; for example, Kannan's result that $\Sigma_4$ is not in $SIZE(n^{1000})$.

Finally, we mentioned some theorems that neither relativize nor (seem to) naturalize: for example, Vinodchandran's Theorem that $PP$ is not in $SIZE(n^{1000})$, and Santhanam's improvement of this to an explicit promise problem in $MA$ that is not in $SIZE(n^{1000})$.

Game theory complexity and computing Nash Equilibrium

In 1944 von Neumann and Morgenstern introduced the subject of Game Theory in their book Theory of Games and Economic Behavior.  von Neumann and Morgenstern showed that there always exists an equilibrium solution to any two-person zero-sum game.  (They also showed that any $n$-person non-zero-sum game can be trivially turned into an $(n+1)$-person zero-sum game by adding a player who represents the net global profit or loss.)  Nash extended this work to show that there exists an equilibrium for any (zero- or non-zero-sum) $n$-person game.  (A Nash Equilibrium is a set of mixed strategies for the players such that each individual player has perfect knowledge of the other players' strategies and gains no benefit from changing their strategy unilaterally under this knowledge.)

However, Nash's proof is not constructive in a computational sense: it does not yield a general algorithm for determining an equilibrium point of an arbitrary game. There is good reason for this.  We can show that there is no general algorithm for determining the equilibrium of a given game over the reals.

We must first give a definition of what we mean by a computable real.  In this analysis, I will use the definition given in Computability in Analysis and Physics by Pour-El and Richards.  We say a real number $x$ is computable if there exists a computable sequence of rationals that converges effectively to $x$.  (Converging effectively means that for the sequence of rationals $r_k$ there is a computable function $e:\mathbb{N}\to\mathbb{N}$ such that for all $N$, $k \geq e(N)$ implies $|r_k - x|\leq 2^{-N}$.)  A computable sequence of reals $x_n$ is defined similarly: there is a computable double sequence of rationals $\{r_{nk}\}$ that converges effectively to $x_n$ as $k\rightarrow \infty$, effectively in both $n$ and $k$.  Finally, a computable function is defined by saying that every computable sequence of inputs $x_n$ yields a computable sequence of outputs $f(x_n)$, and that the function is effectively uniformly continuous.  (Generally: there is a computable function $d(n)$ such that for all $x,y\in \mathbb{R}$, $|x-y|\leq d(n)\implies |f(x)-f(y)|\leq 2^{-n}$.)
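
To make the first definition concrete, here is a minimal Python sketch (my own illustration, not from Pour-El and Richards) of a computable real: a computable sequence of rationals converging effectively to $\sqrt{2}$, together with a modulus of convergence $e$.

    from fractions import Fraction

    def r(k: int) -> Fraction:
        """k-th rational approximation to sqrt(2), by bisection on [1, 2]."""
        lo, hi = Fraction(1), Fraction(2)
        for _ in range(k):
            mid = (lo + hi) / 2
            if mid * mid < 2:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2  # within 2^-(k+1) of sqrt(2)

    def e(N: int) -> int:
        """Modulus of convergence: k >= e(N) implies |r(k) - sqrt(2)| <= 2^-N."""
        return N

    print(float(r(10)))  # ~1.4142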

Now that we have a framework to examine computability in the reals, we can move to the question of the computability of Nash equilibrium.  Obviously, if the payoffs are given as uncomputable functions, determining an equilibrium is also uncomputable.  However, if every payoff is computable, there does exist a computable equilibrium.  The proof relies on the fact that we can generate a function $h(s)$ that is maximized at the equilibria of the game, and we can computably determine the maximum value of a computable function.  From this we can get finer and finer approximations of a maximizing input to the function (since $h$ is effectively uniformly continuous), thus generating a sequence that converges to a maximizer.
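
Here is a toy Python sketch (my own illustration) of that last step: approximating a maximizer of an effectively uniformly continuous function on $[0,1]$ by searching finer and finer grids. The function $g$ below is a hypothetical stand-in for $h$.

    from fractions import Fraction

    def argmax_approx(f, n: int) -> Fraction:
        """Return a grid point of [0,1] whose f-value is within 2^-n of the
        maximum, assuming the modulus |x-y| <= 2^-n => |f(x)-f(y)| <= 2^-n."""
        step = Fraction(1, 2 ** n)
        pts = [k * step for k in range(2 ** n + 1)]
        return max(pts, key=f)

    # hypothetical stand-in for h: peaked at 1/3, with Lipschitz constant 1/2
    g = lambda x: -abs(x - Fraction(1, 3)) / 2
    print(float(argmax_approx(g, 8)))  # ~0.332, near the true maximizer 1/3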

But we can also show that not every computable sequence of games has a computable sequence of equilibria.  This implies that there is no general procedure to compute Nash equilibria.  (The proof relies on constructing a computable sequence of games, using payoffs generated from two recursively inseparable sets, and then showing that a computable sequence of equilibria for it would separate those sets.)

We can do one better by showing that, for a limited set of games, given a computable sequence of strategies for one player, the sequence of best-response strategies for the other player need not be computable.  (The motivation for this is the idea that actually playing a game involves observing the opponent's strategy over time and computing a best response.)  The proof involves showing the result for a specific, simple game.

Allowing players to make some amount of error in their responses does lead to computable strategies, however.  Consider the logistic quantal response function: $\sigma_{ij}(\pi)=\exp(\lambda u_{ij}(\pi))/\sum_{k\in A_{i}} \exp(\lambda u_{ik}(\pi))$, where $u_{ij}(\pi)$ is the payoff to $i$ from using pure strategy $j$ given a strategy profile (list of all players' strategies) $\pi$, and $A_{i}$ is the set of actions available to $i$.  If we allow players to choose strategies with probabilities given by the logistic quantal response function for a given $\lambda$, instead of choosing a best response with certainty, then we can generate a computable "equilibrium".  Note, however, that no matter how tight we make this function, we have no guarantee that the resulting strategies are anywhere near a true equilibrium.
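
The logistic quantal response function is just a softmax over payoffs; here is a small Python sketch (my own illustration). As $\lambda$ grows, play concentrates on the best response.

    import math

    def quantal_response(payoffs, lam):
        """Logistic quantal response: play pure strategy j with probability
        proportional to exp(lam * u_ij), given payoffs u_ij against a fixed
        profile pi."""
        weights = [math.exp(lam * u) for u in payoffs]
        total = sum(weights)
        return [w / total for w in weights]

    print(quantal_response([1.0, 0.0], lam=1))   # ~[0.73, 0.27]
    print(quantal_response([1.0, 0.0], lam=10))  # ~[0.99995, 0.00005]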

Looking at the problem of Nash equilibrium from another viewpoint entirely, we can examine the complexity of $\epsilon$-equilibrium.  An $\epsilon$-Nash equilibrium is one in which each player has knowledge of the other players' strategies and gains at most $\epsilon$ benefit from changing their strategy under this knowledge.  By taking $\epsilon=1/2^k$ for $k$ related to the length of the description of the given game, we can show that $r$-player $\epsilon$-Nash equilibrium is complete for the class PPAD.
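
Verifying that a given profile is an $\epsilon$-Nash equilibrium is easy; here is a Python sketch (my own illustration) for two-player bimatrix games. Checking pure-strategy deviations suffices, since a mixed strategy's payoff is a convex combination of pure-strategy payoffs.

    import numpy as np

    def is_eps_nash(A, B, x, y, eps):
        """Is (x, y) an eps-Nash equilibrium of the bimatrix game (A, B)?
        A[i, j], B[i, j]: payoffs to the row and column player respectively."""
        u_row = x @ A @ y
        u_col = x @ B @ y
        return (A @ y).max() <= u_row + eps and (x @ B).max() <= u_col + eps

    # matching pennies at the uniform profile: an exact (eps = 0) equilibrium
    A = np.array([[1.0, -1.0], [-1.0, 1.0]])
    x = y = np.array([0.5, 0.5])
    print(is_eps_nash(A, -A, x, y, eps=0))  # True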

PPAD is defined as follows (taken from Wikipedia):

G is a (possibly exponentially large) directed graph with no isolated vertices, and with every vertex having at most one predecessor and one successor. G is specified by giving a polynomial-time computable function f(v) that returns the predecessor and successor (if they exist) of the vertex v. Given a vertex s in G with no predecessor, find a vertex $t \neq s$ with no predecessor or no successor. In other words, we want any source or sink of the directed graph other than s.
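
This search problem (END-OF-LINE) is total: following successor pointers from $s$ must eventually reach a sink. Here is a naive Python sketch (my own illustration); the walk can take exponentially many steps, which is exactly why it is not a polynomial-time algorithm.

    def end_of_line(f, s):
        """Follow successor pointers from the source s until a sink is found.
        f(v) returns (predecessor, successor), with None for a missing one.
        Correct, but may take time exponential in the vertex-label length."""
        v = s
        while True:
            _, succ = f(v)
            if succ is None:
                return v  # a sink, hence a valid answer distinct from s
            v = succ

    # toy instance: the path 0 -> 1 -> 2 -> 3
    f = lambda v: (v - 1 if v > 0 else None, v + 1 if v < 3 else None)
    print(end_of_line(f, 0))  # 3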

PPAD is a subclass of TFNP, which is the set of FNP problems that are total. (A binary relation P(x,y) is in TFNP if and only if there is a deterministic polynomial time algorithm that can determine whether P(x,y) holds given both x and y, and for every x, there exists a y such that P(x,y) holds. (again, taken from Wikipedia))

The proof that $r$-player $\epsilon$-Nash equilibrium is PPAD-complete was given in a sequence of papers.  First it was shown that games with 4 or more players can be reduced to 4-player graphical games (graphical games are ones in which the players can be viewed as vertices of a graph, and each player's payoffs depend only on the actions of that player's direct graphical neighbors).  Then 4-player games were shown to be PPAD-complete by reducing another PPAD-complete problem to them.  Separate proofs were later given for 3- and 2-player games.

This work gives Nash equilibrium a concrete place in the complexity hierarchy, as well as expanding the interest in and scope of the class PPAD.

References:
Prasad, Kislaya. "The rationality/computability trade-off in finite games." Journal of Economic Behavior & Organization 69 (2009): 17-26.

Chen, Xi; Deng, Xiaotie. "Settling the complexity of 2-player Nash equilibrium." Electronic Colloquium on Computational Complexity, Revision 1 of Report No. 140.

Daskalakis, Constantinos; Goldberg, Paul W.; Papadimitriou, Christos H. "The complexity of computing a Nash equilibrium."

-Kevin Mc Inerney

Wednesday, April 29, 2009

Last lecture

Hi -- don't forget to not eat lunch tomorrow; we'll bring it for the last lecture, wherein we'll also cover Natural Proofs.

Tuesday, April 28, 2009

Course evaluations

Please also do the online course evaluation: go to https://my.cmu.edu/site/main/page.academics, log in with your Andrew ID, and click on "University Course Assessment". Thanks!

(And if you didn't get a copy of the paper one for the CS department, please ask Deb for one. Thanks!)

Lecture 27

In this lecture we covered Reingold's Theorem, SL = L; i.e., undirected connectivity can be solved in log space. Here are my abridged notes on actually doing Reingold's algorithm in log-space. It's not a full proof, just a "proof by example" -- but I hope it's nevertheless clearer than a hand-wave.

For the Martin-Randall Theorem, you can consult the original Martin-Randall paper. There is a clearer proof -- but of a much more general theorem -- in a paper by Jerrum, Son, Tetali, and Vigoda. It's still not the clearest thing. If I get time I will try to write up a stripped-down, simplified version of the proof. You can also consult the Arora-Barak book for a proof of a weaker but still sufficient result: $\epsilon(G \circledR H) \geq \Omega(\epsilon(G) \epsilon(H)^2)$, where $\circledR$ denotes the replacement product. Note that it is okay to have the square on $\epsilon(H)$ here since it's an absolute constant anyway -- but it would not help if the square were on $\epsilon(G)$.
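
To make the quantity $\epsilon(\cdot)$ concrete: here is a small numpy sketch (my own illustration) computing the spectral gap of a regular graph, i.e. one minus the second-largest absolute eigenvalue of its random-walk matrix.

    import numpy as np

    def spectral_gap(A):
        """epsilon(G) = 1 - lambda(G), where lambda(G) is the second-largest
        absolute eigenvalue of the random-walk matrix A/d of a d-regular
        graph with adjacency matrix A."""
        d = A.sum(axis=1)[0]
        eigs = np.sort(np.abs(np.linalg.eigvalsh(A / d)))
        return 1 - eigs[-2]

    # complete graph K4: walk-matrix eigenvalues are 1, -1/3, -1/3, -1/3
    K4 = np.ones((4, 4)) - np.eye(4)
    print(spectral_gap(K4))  # ~0.667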

Thursday, April 23, 2009

Lecture 26

Today we covered...

Proof Complexity and the NP vs. coNP problem. The "Resolution" proof system with the resolution rule ($C \vee x$, $C' \vee \overline{x}$ $\vdash$ $C \vee C'$) and the weakening rule ($C \vdash C \vee x$). Completeness of Resolution, with proof sketch. Various contradictions (tautologies, in fact): Pigeonhole Principle, Tseitin Tautologies, Random 3-CNF. Treelike Resolution. Resolution width. The Short Proofs Are Narrow Theorem, proved for Treelike Resolution. The Ben-Sasson & Wigderson Theorem: unsatisfiable $k$-CNFs with "expansion" require wide Resolution proofs.
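
The two rules are easy to state in code. Here is a minimal Python sketch (my own illustration), with a clause represented as a frozenset of nonzero integers, $-v$ standing for $\overline{x_v}$.

    def resolve(C1, C2, x):
        """Resolution rule: from C v x and C' v not-x, derive C v C'."""
        assert x in C1 and -x in C2
        return (C1 - {x}) | (C2 - {-x})

    def weaken(C, lit):
        """Weakening rule: from C, derive C v lit."""
        return C | {lit}

    # resolving {x1} and {not-x1} yields the empty clause: a refutation
    print(resolve(frozenset({1}), frozenset({-1}), 1))  # frozenset()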


By the way, the question came up as to separating Treelike Resolution from General Resolution. A very strong result for this problem was proved by Ben-Sasson, Impagliazzo, and Wigderson: there is a natural family of contradictions with $n$ variables and $O(n)$ clauses which have Resolution refutations of length $O(n)$ but require Treelike Resolution refutations of length $2^{\Omega(n / \log n)}$.

The contradictions here are based on "pebbling" expander graphs; more specifically, the results in an old paper of Paul, Tarjan, and Celoni.

Tuesday, April 21, 2009

Blogging

Hi all. If you have not yet written a blog post -- you owe one! Your deadline is the last day of class. Please email us to confirm a topic.

Lecture 25

Today Eric gave an introduction to Property Testing. Among other things, we covered the BLR Linearity Test with the Bellare-Coppersmith-Håstad-Kiwi-Sudan proof using Fourier analysis.
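
The test itself is three queries; here is a quick Python sketch (my own illustration) over $GF(2)^n$, with points encoded as $n$-bit integers so that addition is XOR.

    import random

    def blr_test(f, n, trials=100):
        """BLR linearity test: check f(x) + f(y) = f(x + y) over GF(2)^n
        at random pairs. Linear functions always pass; functions far from
        linear are rejected with noticeable probability per trial."""
        for _ in range(trials):
            x = random.getrandbits(n)
            y = random.getrandbits(n)
            if f(x) ^ f(y) != f(x ^ y):
                return False
        return True

    # a linear function: parity of the bits selected by the mask 0b101
    lin = lambda z: bin(z & 0b101).count("1") % 2
    print(blr_test(lin, 3))  # True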

Sunday, April 19, 2009

Homework 5 Correction

In Problem #4b, everywhere you see $B_\delta(f)$, this should be $B_\delta(C)$.

Thanks to Or for pointing this out.

Updated: Another small bug-fix to Problem #4b, in the definition of $S'$. Corrected version is on the course web page.

Wednesday, April 15, 2009

Easy Witnesses

While doing the proof of the IKW theorem in class, we came across the easy witness method introduced by Kabanets. The original usage of this method, in Kabanets's 2000 paper, was to show that any randomized algorithm in $\mathrm{RP}$ can be simulated by a zero-error probabilistic algorithm in expected $2^{n^\varepsilon}$-time for some $\varepsilon>0$, such that no expected sub-exponential time algorithm can refute the simulation infinitely often (io). In other words, there is no algorithm of the form $R(1^n)\in \{0,1\}^n$ in this time class such that the simulation makes a mistake on input $R(1^n)$ for almost all $n$.

Before going to prove this result, we look at something simpler first, which also introduces the easy witness method:

Theorem 1: At least one of the following holds:
  1. For every $\varepsilon>0$, any language in $\mathrm{RP}$ can be simulated in $\mathrm{DTIME}(2^{n^\varepsilon})$ such that no language in $\mathrm{ZPP}$ can refute the simulation io.
  2. $\mathrm{BPP}=\mathrm{ZPP}$.
Proof: Consider the set $S$ of easy witness strings. A string $\alpha \in S$ is an easy witness iff it has a concise description (i.e., it is the truth table of some circuit with low complexity). For an $\mathrm{RP}$ algorithm $A$ and input $x$, we can test $A$ against all such strings and accept iff there is one that makes $A$ accept. The crucial insight is that if this algorithm makes frequent mistakes, then the random strings that make $A$ accept have high circuit complexity. Thus, if we fix the input $x$, $A$ is a good hardness test for strings $r$, accepting only if $r$ has no small description. Notice that a significant fraction of strings have this property. Therefore, using this as a source for the Impagliazzo-Wigderson generator, we can derandomize BPP.

More formally, for a given randomized algorithm $A(x,r)$ on input $x$, $|x|=n$, with random bits $r$, $|r|=m=n^a$ for some constant $a$, let $S^{\delta}_m$ be the set of truth tables of all $\lceil \log m\rceil$-variable Boolean functions with low circuit complexity (at most $m^\delta$). Hence $S^{\delta}_m$ contains strings of length $m$, and it can be enumerated in $\mathrm{DTIME}(2^{m^{2\delta}})$. Therefore checking whether $A(x,\alpha)$ accepts for some $\alpha \in S^{\delta}_m$ takes time at most $2^{n^\varepsilon}$ for $\delta = \varepsilon/(2a)$. Let this algorithm be $B^{\varepsilon}_{A}(x)$.
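
Here is a toy Python rendering of $B^{\varepsilon}_{A}$ (my own illustration, with tiny parameters and brute force everywhere): enumerate the truth tables of all small straight-line programs and feed each one to $A$ in place of its random string.

    from itertools import product

    def truth_table(prog, nvars):
        """Truth table (a bit-tuple of length m = 2^nvars) of a straight-line
        program; each gate is (op, i, j) with op in {'AND','OR','NOT'} and
        i, j indexing inputs or earlier gates."""
        out = []
        for x in product((0, 1), repeat=nvars):
            w = list(x)
            for op, i, j in prog:
                w.append(1 - w[i] if op == 'NOT' else
                         w[i] & w[j] if op == 'AND' else w[i] | w[j])
            out.append(w[-1])
        return tuple(out)

    def easy_strings(nvars, size):
        """Toy version of S^delta_m: truth tables of all programs with at
        most `size` gates."""
        tables = set()
        def extend(prog):
            tables.add(truth_table(prog, nvars))
            if len(prog) < size:
                nw = nvars + len(prog)
                for g in product(('AND', 'OR', 'NOT'), range(nw), range(nw)):
                    extend(prog + [g])
        extend([])
        return tables

    def B(A, x, nvars, size):
        """The simulation B_A(x): accept iff A(x, alpha) accepts for some
        easy witness alpha used in place of a random string."""
        return any(A(x, alpha) for alpha in easy_strings(nvars, size))

    print(len(easy_strings(2, 1)))  # 6 distinct length-4 "easy" strings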

If this algorithm $B^{\varepsilon}_{A}$ works for every $A \in \mathrm{RP}$ and $\varepsilon>0$, then (1) holds and we are done. Otherwise, there exist $A$, $\varepsilon>0$, and an expected polynomial time refuter $R(1^n)\in\{0,1\}^n$ such that, for almost all $n$, $B^{\varepsilon}_{A}(R(1^{n}))$ does not accept whereas $A(R(1^{n}),r)$ accepts for some $r$. Therefore $A(R(1^{n}),\cdot)$ can be viewed as a Boolean circuit $C^{\mathrm{hard}}(r)$ which accepts almost all $m$-bit strings, and all accepted strings are truth tables of $\log m$-variable Boolean functions with smallest circuit size at least $m^{\varepsilon/(2a)}$. Since $R(1^{n})$ halts in polynomial time with high probability, we have an algorithm in $\mathrm{ZPP}$ which can construct such circuits $C^{\mathrm{hard}}$.

Given such a circuit $C^{\mathrm{hard}}$, we can guess, uniformly at random, a string $\beta\in\{0,1\}^m$ which $C^{\mathrm{hard}}$ accepts. Recall the IW result:


Impagliazzo-Wigderson Generator: There exist $c,d\in\mathbb{N}$ such that, given $r\in\{0,1\}^{n^c}$, the truth table of a circuit on $c \log n$ Boolean variables with circuit complexity at least $n^{\varepsilon c}$, there is a polynomial time computable generator $G_r(s):\{0,1\}^{d \log n} \rightarrow \{0,1\}^n$ with hardness $H(G_r)> n$.

If we use this $\beta$ to construct the IW generator $G_{\beta}$ mapping $\{0,1\}^{d \log k}$ to $\{0,1\}^k$, we get a generator $G_{\beta}$ with hardness greater than $k$ for sufficiently large $k$. Constructing these takes expected polynomial time, and the last step takes deterministic polynomial time; therefore for every $L \in \mathrm{BPP}$, we have $L \in \mathrm{ZPP}$. QED

Instead of the IW generator, we can use the Babai-Fortnow-Nisan-Wigderson generator to arrive at the following result.

Theorem 2: At least one of the following holds:
  1. Any language in $\mathrm{RP}$ can be simulated in $\mathrm{DTIME}(2^{(\log n)^{\log\log n}})$ such that no language in $\mathrm{ZPP}$ can refute the simulation io.
  2. $\mathrm{BPP}\subseteq \mathrm{ZPSUBEXP}$, where $\mathrm{ZPSUBEXP}$ denotes the class of languages with an expected subexponential-time algorithm.

Instead of fiddling with the generator, we can increase the power of the refuter. In Theorem 1, when the first condition fails, we can allow the refuter to take expected subexponential time. After changing the circuit size and parameters appropriately, we have the following result:

Theorem 3: At least one of the following holds:
  1. For every $\varepsilon>0$, any language in $\mathrm{RP}$ can be simulated in $\mathrm{DTIME}(2^{n^{\varepsilon}})$ such that no language in $\mathrm{ZPSUBEXP}$ can refute the simulation io.
  2. $\mathrm{BPP}\subseteq \mathrm{ZPSUBEXP}$.
Noting that $\mathrm{DTIME}(2^{n^{\varepsilon}})$ and $\mathrm{ZPSUBEXP}$ are in $\mathrm{ZPTIME}(2^{n^{\varepsilon}})$, and $\mathrm{RP}\subseteq \mathrm{BPP}$, we can replace the above with an unconditional result:

Corollary 1: For every $\varepsilon>0$, any language in $\mathrm{RP}$ can be simulated in $\mathrm{ZPTIME}(2^{n^{\varepsilon}})$ such that no language in $\mathrm{ZPSUBEXP}$ can refute the simulation io.

Notice that the above results imply that we can get analogues of the results obtained in the IW paper:

In IW, assuming $\mathrm{BPP}\neq \mathrm{EXP}$, a deterministic subexponential-time simulation of $\mathrm{BPP}$ was shown. Using the machinery developed above, we can answer a weaker question: what happens when $\mathrm{ZPP}\neq \mathrm{EXP}$?

Theorem 4: If $\mathrm{ZPP}\neq \mathrm{EXP}$, then any language in $\mathrm{RP}$ can be simulated in $\mathrm{DTIME}(2^{n^{\varepsilon}})$ for all $\varepsilon>0$ such that no language in $\mathrm{ZPP}$ can refute the simulation io.

Proof: Assume otherwise. Then by Theorem 1, $\mathrm{BPP}=\mathrm{ZPP}$. But for some $\varepsilon>0$, $\mathrm{BPP}$ is a proper subset of $\mathrm{DTIME}(2^{n^\varepsilon})$ with $\mathrm{BPP}$ refuters (by the Time Hierarchy Theorem). Using the IW theorem, this implies
$\mathrm{BPP} = \mathrm{EXP} = \mathrm{ZPP}$. QED

The gap theorem of IW (either no derandomization of BPP is possible, or BPP can be simulated in deterministic $2^{n^\varepsilon}$ time for every $\varepsilon>0$) can be shown for ZPP as follows:

Theorem 5: Either $\mathrm{ZPP}=\mathrm{EXP}$, or for every $\varepsilon>0$, any language in $\mathrm{RP}$ can be simulated in $\mathrm{DTIME}(2^{n^{\varepsilon}})$ such that no language in $\mathrm{ZPP}$ can refute the simulation io.

Proof: Assume $\mathrm{ZPP}=\mathrm{EXP}$; then also $\mathrm{RP}=\mathrm{EXP}$. By the Time Hierarchy Theorem, for every $\varepsilon>0$, $\mathrm{EXP}$ is not in $\mathrm{DTIME}(2^{n^\varepsilon})$ with $\mathrm{ZPP}$ refuters. Thus at most one of the conditions holds. By Theorem 4, at least one of the conditions holds. The result follows. QED

Notice that the "easy witness" method applies even when the refuters are allowed to be non-deterministic Turing machines, and by essentially the same proof structure, we can obtain the following result:

Theorem 6: At least one of the following holds:
  1. For every $\varepsilon>0$, any language in $\mathrm{NP}$ can be simulated in $\mathrm{DTIME}(2^{n^\varepsilon})$ such that no language in $\mathrm{NP}$ can refute the simulation io.
  2. $\mathrm{BPP}\subseteq \mathrm{NP}$.

Tuesday, April 14, 2009

THURSDAY'S CLASS CANCELED

I was informed by Deb that all classes on Thursday are canceled. So, see you on Tuesday, April 21, when Eric will tell us about Property Testing.

(This kills a lecture, which was going to be on quantum, but Matt already told us about it in his blog post, so no great loss.)

As an interesting fact, I learned that Carnival is scheduled to precisely be the "Thanksgiving weekend" of the Spring Term. That way, the Fall and Spring schedules can match exactly.

Lecture 24

In today's class we covered many of the gory details of expander graphs. I didn't quite get a chance to finish the last details, so here are the notes.

Friday, April 10, 2009

Bounds on BQP

Quantum computation presents the attractive possibility of computing some functions exponentially faster than would be possible under a classical model. The most compelling example is perhaps Shor's quantum factoring algorithm, which runs in time polynomial in the input length (i.e., polylogarithmic in the number being factored). But how powerful is quantum computation? Getting some of the basic answers involves (surprisingly, to me anyway) getting into some counting complexity.

To begin, we'll need to establish a basic set of definitions. The fundamental unit of quantum computation is the qubit. Like classical bits, qubits have two states; however, qubits can exist in linear superpositions of these states, $|\psi\rangle = \alpha |0\rangle + \beta |1\rangle$, where $\alpha$ and $\beta$ are complex amplitudes subject to $|\alpha|^2 + |\beta|^2 = 1$. We cannot observe these amplitudes directly, however; to determine the state of this qubit, we need to take a measurement, the result of which will always be a pure basis state. In the previous example, measuring $|\psi \rangle$ will result in $|0\rangle$ with a probability of $|\alpha|^2$ and $|1\rangle$ with a probability of $|\beta|^2$.

If you have multiple qubits, they can become entangled with each other, meaning (roughly speaking) that their measurement probabilities are no longer independent. So, for example, if we have two entangled qubits, the states $|00\rangle$ and $|11\rangle$ could each have probability $1/2$, while the other two states have probability $0$. If you somehow knew that their states were entangled in this way, then measuring only one qubit would tell you the value of the other -- although there would be no way to predict the value of either qubit before taking the measurement. If we have a quantum register of $k$ qubits, then each of its $2^k$ possible states has its own complex amplitude. The vector of all of these $2^k$ complex amplitudes is our state vector, which completely encodes the statistical behavior of measurements of the system.

So now we have quantum registers, to store our information. In order to perform computations, we need quantum gates. These gates are simply unitary linear operators acting on the state vector. In order to talk about quantum time classes in a meaningful way, however, we need to restrict the set of allowable fundamental gates. (Since the composition of two unitary operators is also a unitary operator, you could design a single gate that would carry out any desired computation!) One common circuit basis involves only two fundamental gates. The first is the Hadamard gate, which takes

$|0\rangle \rightarrow \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$
and
$|1\rangle \rightarrow \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$

The second is the Toffoli gate, which takes three bits as input, passes through the first two bits unmodified, and XORs the third bit with the product of the first two.
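
Both gates are small unitary matrices; here is a numpy sketch (my own illustration) of the two, checking the Hadamard coin flip and the fact that the Toffoli gate is just a permutation of basis states.

    import numpy as np

    # Hadamard gate on one qubit
    H = np.array([[1, 1],
                  [1, -1]]) / np.sqrt(2)

    # Toffoli gate on three qubits (basis state |abc> has index 4a + 2b + c):
    # it XORs the third bit with the AND of the first two, i.e. it swaps
    # the basis states |110> and |111>
    T = np.eye(8)
    T[[6, 7]] = T[[7, 6]]

    psi = H @ np.array([1.0, 0.0])        # H|0> = (|0> + |1>)/sqrt(2)
    print(np.abs(psi) ** 2)               # [0.5 0.5]: a fair coin flip
    print(np.allclose(T @ T, np.eye(8)))  # True: Toffoli is an involution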

Note that the fact that all changes in the state happen via unitary operators implies that the number of output qubits is always equal to the number of input qubits. When dealing with quantum computation, we can assume without loss of generality that the quantum register has only a single accepting state.

Definition: BQP is the quantum analog of BPP. (Quantum computation is inherently probabilistic.) It is the class of languages decidable by polynomial-size quantum circuits that (to enforce uniformity) can be generated in polytime by a classical TM, accepting with probability at least $2/3$ for $x \in L$ and with probability less than $1/3$ for $x \notin L$. (Amplification theorems with similar results to those for BPP hold here, so any values $1/2 + \epsilon$ and $1/2 - \epsilon$ will work.)

How powerful is BQP? On the lower end, we know $BPP \subseteq BQP$: we can efficiently simulate classical circuits by quantum circuits. The Toffoli gate is universal for classical computation; if you hardwire the third input to $1$ it is simply a NAND gate. Coin flips can be simulated by Hadamard gates, since they transform pure states into equal superpositions.

What about upper bounds? Clearly $BQP \subseteq EXP$, since we can simulate any BQP machine by applying a polynomial number of linear operators to the exponential-size density matrix. It's also true, but less obvious, that $BQP \subseteq PSPACE$. We can show this by integrating over computational "paths". Recall that in a quantum computation, we apply a new small unitary operator to our state at each time step. For each accepting state $|y\rangle$ at time $t$, we can compute that state's amplitude by applying the operator $U_{t-1}$ to each of the $2^n$ predecessor states $x_1,\ldots,x_{2^n}$ and summing the results. By recursive application of this same step, we can compute the amplitude of any state anywhere in the computation in polynomial space (although we do require exponential time).
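
Here is a small Python sketch of the path-summing recursion (my own illustration, with the gates given as dense matrices rather than implicitly): the recursion depth is the number of gates, so the space used is modest, while the sum branches over all predecessor states, which is where the exponential time comes from.

    import numpy as np

    def amplitude(gates, y, t, start):
        """Amplitude of basis state y after the first t gates, starting from
        basis state `start`; computed by summing over computational paths.
        Space: O(recursion depth). Time: exponential in the number of qubits."""
        if t == 0:
            return 1.0 if y == start else 0.0
        U = gates[t - 1]
        return sum(U[y, x] * amplitude(gates, x, t - 1, start)
                   for x in range(U.shape[1]) if U[y, x] != 0)

    # sanity check on one qubit: H followed by H maps |0> back to |0>
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    print(amplitude([H, H], y=0, t=2, start=0))  # ~1.0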

But we can do better, namely $BQP \subseteq PP$.

Definition: GapP is the set of functions $f(x)$ such that, for some polynomial time NTM $M$, $f(x)$ is the difference between the number of accepting and rejecting paths of $M$ on $x$. Note that this gives us an alternative definition of PP: a language $L$ is in PP if and only if there is some GapP function $f(x)$ such that when $x \in L$, $f(x) > 0$, and when $x \notin L$, $f(x) \leq 0$.

Fact: GapP is closed under negation, (exponential) sum, and (polynomial) product.

Fact: If there exist two GapP functions describing the real and imaginary parts of the entries of a polynomial sequence of complex-valued matrices, then there also exist two GapP functions describing their product.

More precisely (statement due to Watrous): let $p(x)$ and $q(x)$ be polynomially bounded functions and, for each $n$, let $A_{n,1}, A_{n,2}, \ldots, A_{n,p(n)}$ be a sequence of complex-valued matrices of size less than $2^{q(n)}$. Then, if we have functions

$f(1^n, 1^k, x, y) = Re(A_{n,k}[x,y])$

$g(1^n, 1^k, x, y) = Im(A_{n,k}[x,y])$

in GapP, then there also exist functions $F$ and $G$ in GapP such that

$F(1^n, x, y) = Re((\prod_{i=1}^{p(n)} A_{n, i})[x,y])$

$G(1^n, x, y) = Im((\prod_{i=1}^{p(n)} A_{n, i})[x,y])$

Observe that, applying an appropriate topological ordering to our quantum circuit, we can serialize our quantum gates into a polynomial number of steps. Additionally, observe that we can take the tensor product of a gate's matrix with the identity matrix for the unaffected bits in order to come up with a complete transition matrix for each time step. We can now use this fact to show that there are functions $f$ and $g$ in GapP describing all of our quantum gates.

In order to do this, we need to take advantage of an alternate representation of our quantum register: its density matrix, $\rho$. Glossing over some details, this effectively linearizes our state and operations. With $2^k$ entries in the state vector, the corresponding density matrix is a $2^k \times 2^k$ matrix with trace 1. The $i$'th entry on the diagonal is simply the probability of finding the register in state $i$. If we then let our state be this matrix, flattened into a vector, $vec(\rho)$, the entries of the matrices describing our quantum gates fall in the set $\{-1, -1/2, 0, 1/2, 1\}$. By the fact that these matrices can be generated in polynomial time, we know that the corresponding $f$ and $g$ for each matrix (multiplied by 2, to deal with the factors of 1/2) must be in GapP. Then, by the fact noted above, there must exist $F$ and $G$ describing the entries of the transition matrix for the entire circuit (multiplied by $2^{p(x)}$).

We can assume that the circuit has only a single accepting state, so finding the acceptance probability is then simply a matter of looking up the single matrix entry that describes the transition from the starting state $|x\rangle$ to the accepting state, which consists of a single invocation of a GapP function.

To complete the proof, note that the function $a(x) = 2^{p(x)} (Pr[Q \textrm{ accepts } x] - 1/2)$ is positive when $Q$ accepts $x$, and negative otherwise. Therefore $Q$ can be simulated by a PP machine, and $BQP \subseteq PP$.

In fact, not only does PP contain BQP, but $PP^{BQP} = PP$.

It is unknown whether this inclusion is tight, but all signs seem to indicate that it is. We can define the class QMA analogously to MA, except that Arthur is a BQP machine and Merlin sends him a quantum witness state. This class, which obviously contains BQP, is contained in PP (originally noted by Kitaev and Watrous). Furthermore, a theorem of Vyalyi states that if $QMA = PP$, then $PH \subseteq PP$.

Thursday, April 9, 2009

Midterm correction

Oops.  I made a mistake in the grading of question 1(e): the problem of determining whether $\#L = \#P$ is open.  To compensate for this error, I am adding +3 to everyone's grade on the midterm.  If you have any other questions about the graded midterm (or Homework 4), please see me during my office hours next Tuesday.  Thanks!

Details for the IKW Theorem

The proof of the IKW Theorem I gave in class was a bit sketchy; let me discuss two points about it, raised by Or and Pranjal.

1. The first point, raised by Or, was about appealing to the Impagliazzo-Wigderson Theorem (or implicitly, the Nisan-Wigderson Theorem). Normally when one thinks about that theorem one thinks of the statement, "If there is a language $L \in E$ such that $L \not\in \bigcup_c SIZE(n^c)$, then $BPP \subseteq \bigcap_{\epsilon} DTIME(2^{n^\epsilon})$."

However, this is not what the IKW Theorem appeals to; you should think of that as a corollary of the IW/NW Theorems. The Theorems themselves are input-length-specific; e.g., for Nisan-Wigderson, the statement we're using is of the form "Let $C$ be a circuit on $n$ bits of size $n^c$, and let $f : \{0,1\}^{n^{1/10c}} \to \{0,1\}$ be a function such that every size-$n^{2c}$ circuit has correlation at most $1/n^{2c}$. Then [some explicit PRG outputting $n$ bits based on $f$'s truth table and designs] $1/10$-fools $C$."

2. The second point, raised by Pranjal, was about the uncareful (indeed, possibly incorrect) way I explained the derandomization of $MA$ that comes out of assuming $NEXP \neq EXP$. The wrong way to say it is to start with a fixed language in $MA$ and then deduce that Arthur's $n^c$-time randomized part can be replaced by some other $NTIME(2^{n^a})$-time algorithm (with advice). That is, of course, pointless if $a \geq c$, since Arthur could just enumerate all possible random strings. Rather, the key is to show that all of $MA$ can be done by a nondeterministic (advice-taking) Arthur who uses fixed-exponential nondeterministic time.

In other words, the correct key lemma is:

$NEXP \neq EXP \Rightarrow \exists c,\ MA \subseteq io\text{-}[NTIME(2^{n^c})/n]$.

This is nontrivial because the same $c$ works for all languages in $MA$, regardless of their polytime complexity. And the proof of this uses the appeal to the length-specific IW Theorem described in Point 1 above.

Lecture 23

Today we finished the proof of the IKW Theorem. We also showed the Kabanets-Impagliazzo Theorem: derandomizing Arithmetic Circuit Identity Testing implies $NEXP$ not in $P/poly$, or Permanent not in $AlgP/poly$, or both.

We then moved on to the definition of expander graphs, the definition of strongly explicit graph families, and saw the Lubotzky-Phillips-Sarnak and Margulis-Gabber-Galil expander families.

Tuesday, April 7, 2009

Why there isn't a simpler proof of IKW

A question raised in class today was, Why can't we prove the IKW Lemma, $NEXP \subseteq P/poly \Rightarrow NEXP = EXP$, via techniques similar to Meyer's Lemma, $EXP \subseteq P/poly \Rightarrow EXP = PSPACE$?

(Recall, btw, that IKW actually get $NEXP = MA$ and Meyer actually gets $EXP = \Sigma_2$.)

In discussing it with Or and Pranjal after class, it seems the answer is the following. In the Meyer Lemma, for a fixed $EXP$ machine, we considered the map taking the index of a tableau window to the tableau window's contents. That map is in $EXP$, hence in $P/poly$. So there exists a circuit $C$ of some size $n^k$ computing the map. Given $x$, we enumerate all circuits of that size, and accept iff one of them implicitly describes an accepting tableau; this is doable in $PSPACE$.

Why doesn't this work for $NEXP$? For $NEXP$ computations, there may be multiple accepting tableaus. Not a problem so far. Certainly the $PSPACE$ algorithm still makes sense: enumerate all circuits of some fixed poly-size, and if one of them implicitly describes an accepting tableau, accept. The question is, if $x$ is in the language, will there be a fixed poly-size circuit describing an accepting tableau?

One has to be careful. There is no longer a clear meaning for "the map that takes a tableau window index to the tableau window contents", since there are multiple valid tableaus. Perhaps that's okay; perhaps we can allow for "multivalued functions". But still, there isn't a clear meaning for "this map is computable in $NEXP$ and hence in $P/poly$". The reason is, the usual meaning of an NTM computing a function is that some branches reject, and the ones that accept output valid answers. Such a model indeed computes a multivalued function. However, since it's not a decision problem (a language), it's not clear how we can get anything out of the assumption $NEXP \subseteq P/poly$.

One way around this, as Or suggested, would be to make everything single-valued by looking at the lexicographically least accepting tableaus. Unfortunately, as we saw on Homework 2, lexicographically least satisfying assignments characterize the class $P^{NP}$, not $NP$. All is not lost, though; a paper of Bourke uses such ideas to show $EXP^{NP[t(n)]} \subseteq P/poly \Rightarrow EXP^{NP[t(n)]} = MA$ (where the notation indicates that the $EXP$ machine makes $t(n)$ queries to the oracle). This is related to Buhrman and Homer's old extension of Karp-Lipton: $EXP^{NP} \subseteq P/poly \Rightarrow EXP^{NP} = \Sigma_2$.

Lecture 22

In this lecture we covered "derandomization implies circuit lower bounds" topics. In particular, we saw:

Kannan's Theorem, $\Sigma_2$ doesn't have fixed-polynomial size circuits.

The Babai-Fortnow-Lund Theorem, $EXP \subseteq P/poly \Rightarrow EXP = MA$, and its consequences: $MA$-$EXP$ does not have polynomial size circuits, derandomizing $MA$ to $NP$ implies showing $NEXP$ does not have polynomial size circuits.

Most of the Impagliazzo-Kabanets-Wigderson Theorem, $NEXP \subseteq P/poly \Rightarrow NEXP = EXP$ (and indeed $NEXP = MA$), along with Kabanets's "easy witness method". The remainder will be proved next time.

Homework 5

Homework 5 is posted, due in 2 weeks as usual. It's long! Start early.

Thursday, April 2, 2009

Lecture 21

Today we saw the completion of Sudan-Trevisan-Vadhan's proof of the Impagliazzo-Wigderson Theorem: if there is an $\epsilon > 0$ such that $E$ contains a language not in $SIZE(2^{\epsilon n})$, then $P = BPP$. This included Sudan's algorithm for list-decoding Reed-Solomon codes.