Saturday, January 31, 2009

Misc. Definitions

In the lectures we've skipped some technical definitions so as to not get bogged down. But these are good things to know; they'll come up again and again if you study Complexity Theory.

1. Constructibility:A function $f : \mathbb{N} \to \mathbb{N}$ is time-constructible if there is a DTM which, on input $1^n$, outputs $1^{f(n)}$ in time $O(f(n))$. It is space-constructible if outputting $1^{f(n)}$ is doable in space exactly $f(n)$. As mentioned in class, we also invariably restrict attention to time bounds that are at least $n$ and space bounds that are at least $\log n$. In this case, every non-ridiculous time bound you would ever use is time-constructible, and similarly for space. The Time and Space Hierarchy Theorems require constructibility.

2. Promise problems: This is a generalization of a "language". A promise problem is a pair $(Y,N)$ of disjoint subsets of $\{0,1\}^*$. A TM "decides" the promise problem if it accepts all strings in $Y$ and rejects all strings in $N$. Note that a promise promise is a language iff $Y \cup N = \{0,1\}^*$. The name "promise" here refers to the fact that you can think of the TM as being "promised" that the input string is always in $Y \cup N$. Generally speaking, if there is an "efficient" algorithm for checking if $x \in Y \cup N$ then the promise problem is not much different from a language (one can put the "easily decidable" strings in $\{0,1\}^* \setminus (Y \cup N)$ into either $Y$ or $N$, say).

3. Syntactic/semantic classes: This is not a 100% rigorous definition; we illustrate it via examples. A complexity class defined via time-bounded machines (TMs, NTMs, counting TMs, etc.) is said to be syntactic if there is an "effective enumeration" of the machines defining the class. For example, if $f(n)$ is a time-constructible function, the class $DTIME(f(n))$ is a syntactic class, since you can effectively enumerate the TMs running in time at most $O(f(n))$. Similarly for $NTIME(f(n))$ -- the NTMs running in time at most $O(f(n))$ are effectively enumerable. Classes like $P$, $NP$, $PSPACE$ are "syntactic". Syntactic class have Hierarchy Theorems and they also tend to have "generic complete problems". A semantic class is one which is not syntactic; generally, this means it is defined by a machine class "with a promise". The classic example is $BPP$: Here there is no way to effectively enumerate randomized TMs with the property that for each $x$ they either accept with probability at least $3/4$ or reject with probability at least $3/4$. (Indeed, checking whether an RTM has this property is undecidable.) Instead, languages in $BPP$ are defined only by machines $M$ that satisfy the promise "$M$ is a BPP machine".

4. Promise classes: For semantic classes like $RP$ and $BPP$ there is an associated "promise class" of promise problems, denoted $prRP$, $prBPP$, etc. For example, $prRP$ is the class of all promise problems $(Y,N)$ such that there is a polynomial-time RTM which accepts all strings $x \in Y$ with probability at least $1/2$ and accepts all strings $x \in N$ with probability $0$. The RTM may have any behavior on strings not in $Y \cup N$.

5. Uniformity: Again, this is not 100% well-defined, but a complexity class is said to be uniform if it is defined by one machine that operates for all input lengths. It is said to be nonuniform if there is a different machine defining it for each input lengths. Roughly, classes with TM-based definitions are "uniform", and classes with circuit-based definitions are "nonuniform".

6. Reductions: We've talked about how it is important to specify the complexity of reductions (poly-time, log-space, even $AC^0$), but there's also another distinction. A many-one (or Karp) reduction $R$ from $L$ to $L'$ is one which maps $x$ to $R(x)$ such that $x \in L \iff R(x) \in L'$. An oracle (or Cook) reduction $R$ is one which solves $L$ by using an algorithm for $L'$ as an oracle. Having a Karp reduction is "stronger" than having a Cook reduction, and we generally define completeness for complexity classes in terms of having a Karp reduction because we can: e.g., all textbook $NP$-completeness reductions are many-one. But sometimes it's better to define completeness in terms of Cook reductions; for example, as we saw in class with $\#\P$.

7. -hard: For a class $C$ we said that $L$ is $C$-complete if: (a) $L \in C$; (b) every problem in $C$ reduces to $L$. We say that $L$ is $C$-hard when (b) holds.

8. Self-reducibility: A language $L$ is said to be downward self-reducible if one can decide $x \in L$ efficiently using an oracle for $L$ on instances of size $< |x|$. This sounds funny, but it makes sense. The classic example is $SAT$: If you have a black box which can solve any $SAT$ instance of size at most $n$, you can use it to efficiently solve any instance of size at most $n+1$; you use the classic trick of setting $x_1$ to $0$ and checking satisfiability, and setting $x_1$ to $1$ and checking satisfiability. A language $L$ is said to be randomly self-reducible if one can decide $x \in L$ efficiently with high probability using an oracle for $L$ which works with high probability on randomly chosen instances. An example (as we'll see at some point in class) is the 0-1 PERMANENT problem: If you can solve randomly chosen 0-1 PERMANENT instances with high probability, you can actually solve all 0-1 PERMANENT instances with high probability.

9. Padding: Padding is the general idea of taking a language $L$ and considering the language $L' = \{ x\#1^{f(|x|)} : x \in L \}$, where $f$ is some fast-growing functions. I.e., it's the idea of "padding" all inputs with a large number of "blanks". It is used for classic upward collapse results such as "$NP = P \Rightarrow NEXP = EXP$". To see this, suppose $NP = P$. Let $L$ be a language in $NEXP$; say it's in $NTIME(2^{n^c})$. Let $L'$ be the padded version of $L$, where $f(x) = 2^{n^c}$. Now it's easy to see that $L'$ is in $NP$: just run the $NTIME(2^{n^c})$ algorithm for $L$, which is now "polynomial time" because the input length in $L'$ has been padded. Hence by assumption, $L'$ is in $P$. But then clearly $L$ is in $EXP$: to decide $x \in L$ in exponential time, just pad $x$ and then run the $P$-algorithm for $L'$.

10. Succinctness: This is the spiritual opposite of padding. For some languages $L$, there is a version $L'$ where the inputs for $L$ are succinctly described. The classic example is where $L$ is CIRCUIT-VALUE, and $L'$ is SUCCINCT-CIRCUIT-VALUE. Here, an input $E$ for $L'$ is a "succinct encoding" of an (exponentially large) circuit $C$: The input $E$ itself is a circuit, which on input $i$ "outputs the $i$th gate in $C$" -- its type (AND, OR, NOT, input, output), and the indices of the gates that feed into it. Typically, the "succinct" version of a complete language is itself complete for the "exponentially larger" class. For example, you can check that since CIRCUIT-VALUE is $P$-complete, SUCCINCT-CIRCUIT-VALUE is $EXP$-complete. And since SAT is $NP$-complete, SUCCINCT-SAT (where the input is a circuit describing an exponentially long formula) is $NEXP$-complete.

Friday, January 30, 2009

Partial Derandomization of Polynomial Identity Checking (II)

Given the blackbox model, a direct way to test whether there's a zero polynomial inside is just to try (maybe a lot of) inputs. Then if it is a zero polynomial you'll always get the right answer; but if it isn't, you need to have a reasonable bound on the number of tests you need to make. Hence the key observation that makes an efficient randomized algorithm possible is the following:

(Schwartz-Zippel) Let $P(x_1,...,x_n)\in F[x_1,...,x_n]$ be a nonzero polynomial and $d=\sum d_i$. Pick points in some set $S^n\subseteq\bar F^n$ ($\bar F$ denotes the algebraic closure of $F$, which is just a technicality, since sometimes you need to go into extensions of $F$) and feed them to the blackbox, then you get at most $d|S|^{n-1}$ zeros as output.

An inductive argument (on the number of variables) easily shows its validity. This is saying: if you choose a reasonably big $S$, say $|S|>cd$ for some constant $c$, then the possibility of getting zero as output is at most $c^{-1}$. The number of random bits used is $n\log cd$. Evidently the algorithm can be boosted to arbitrary precision at the cost of using more random bits (bigger $S$ or repetitive runs).

This is a practical enough algorithm, but can we possibly have some magic inputs to try on the blackboxes, so that zero polynomials can be deterministically detected from the output? Consider, for example, when you know the polynomial only has integer coefficients, maybe you can feed it with some numbers whose weirdness will not be affected by the integer coefficients, and the whole polynomial evaluates to zero iff all the coefficients are zero! This is the idea of Chen-Kao: use irrational numbers.

Suppose we feed $P(x_1,...,x_n)$ (which contains only integer coefficients) with an input $(\pi_1,...,\pi_n)$ satisfying $\pi_i=\sum_{j=1}^{\log (d_i+1)} \sqrt p_{ij}$ where $p_{ij}$s are arbitrary (different) prime numbers. Then, $P(x_1,...,x_n)\neq 0$ iff $P(\pi_1,...,\pi_n)\neq 0$. (Proof: In the inductive step, first evaluate $P$ on $\pi_1,...,\pi_{n-1}$, which results in a univariate polynomial $P'(x_n)$, with coefficients in the extension field $F'=F[\pi_1,...,\pi_{n-1}]$. But notice that the dimension of $F'[\pi_n]$ over $F'$ is $2^{\log(d_n+1)}$, which is bigger than the degree $d_n$, implying that $\pi_n$ can never be a root of $P'(x_n)$.)

But isn't that saying we have a deterministic algorithm already? No - you can never evaluate irrational numbers at infinite precision! We can only truncate the binary expansion of the irrationals at certain decimal positions, say $l$, and randomly choose the signs of $p_{ij}$s -the error probability then drops proportionally to $l^{-1}$ (calculation is needed and contained in Chen-Kao's original paper (Lemma2.2)). Rethink about this: the precision of the algorithm depends on $l$, which is not in the random bits! So arbitrarily precision can be achieved without changing the number of random bits (which is $\sum_{i=1}^n \log (d_i+1)$), the number of different signs we picked).

Chen-Kao's ingenious method has one drawback - it's dependent on the properties of numbers (coefficients need to be integers; prime numbers need to exist - not the case for, say, finite fields). This is where Lewin-Vadhan came in, with an abstraction of the method applicable to arbitrary field.

The idea of Lewin-Vadhan is to find a substitute notion of irrationals in arbitrary fields. Note that the underlying strategy of Chen-Kao is to find some evaluation points that are provably non-vanishable for nonzero polynomials. Lewin-Vadhan carefully showed that the following analogy works: let prime numbers be abstracted to irreducible polynomials in $F[x]$; binary expansion of the square roots be abstracted to power series (in $F[[x]]$) of the square roots of the irreducible polynomials, which are approximated by truncating modulo some $x^l$.

For the whole scheme to work, two things need to be affirmed:

1. The irreducible polynomials indeed work as prime numbers, i.e., when any nonzero polynomials $P(x_1,...,x_n)$, let $\pi_i=\sum_{j=1}^{\log (d_i+1)} \sqrt f_{ij}$ ($f_{ij}$s are now univariate polynomials) then $P(\pi_1,...,\pi_n)$ is never zero (in $F[x]$!). The proof for this goes similarly as the simpler proof above for prime numbers.

2. The irreducible polynomials that have square root expansions (this is not always the case since the constant term has to be quadratic residue) need to be easy enough to find. In fact, it is proved using Hensel's Lemma that polynomials have a power series expansion in $F[[x]]$ iff the constant term is a quadratic residue (except for finite fields of characteristic 2, which are handled independently). Also, the number of irreducible polynomials in $F[x]$ of degree less than $n$ is at least $(|F|^n-1)/n$. Please refer to the original paper for the full proofs (actually the extended version of it).

Finally, the approximation using power series needs to give the desirable error probability. For any given error $\epsilon$, the power series can be truncated at $x^l$ where $l=0.5 \epsilon^{-1} d \max (\deg (f_{ij}))$. This is technically more complicated than just probability calculation as for the prime number case. But, (very "luckily", which is a word used multiple times in the paper), everything works out. The random bits are, still, only used for choosing signs of $\sqrt f_{ij}$, and the number of them is $\sum_i\log (d_i+1)$.

In a sense, the problem is solved after Lewin-Vadhan's construction. It is provable that under the given model (i.e., only $n$ and $d_i$ are known as parameters), the number of random bits used is already optimal - since any deterministic algorithm requests at least $\prod_i (d_i+1)$ queries to decide the polynomial (if the number of queries is smaller than $\prod_i (d_i+1)$ which is the number of possible monomials, you'll get polynomials indistinguishable from the zero polynomial), any random algorithm necessarily needs $r=(1-O(1))\sum_i \log(d_i+1)$ bits (a trivial derandomization is to try out all the $2^r$ possibilities which shouldn't be smaller than the deterministic queries needed).

This should be a good success story in randomized algorithm design, had there not been the tragic death of Daniel Lewin on 9-11, 2001.

Thursday, January 29, 2009

Lecture 6

In this class we finished the proof of the Approximate Counting/Sampling and the Bshouty-Cleve-Gavalda-Kannan-Tamon Theorem. We also introduced #P.

Partial Derandomization of Polynomial Identity Checking (I)

(To Ryan and Venkat: my post is getting too long so I'm breaking it into two (hopefully not three) parts and the second part will come out tomorrow...)

Checking multivariate polynomial identities is a problem central to algorithm design and complexity theory.

I didn't inline a formal definition in the previous sentence, because there's a catch in defining the problem. A polynomial $P(x_1,...,x_n)$ (say, in $F[x_1,...,x_n]$ for some field $F$) can be viewed as, at least, two things: syntactically, it is just a formal expression recursively built up from variables, coefficients from $F$, arithmetic operators and parentheses; semantically, it is a function from $F^n$ to $F$. Correspondingly, when we say a polynomial is identical to zero, we can have two different meanings:

1. It is indeed the zero element in the ring $F[x_1,...,x_n]$. Namely, when written in sparse form (that is, when parentheses are no longer needed), all coefficients turn out to be zero.
2. It is a constant function that maps all the points in $F^n$ to $0\in F$.

The two definitions coincide only over an infinite field $F$ (reason: a syntactically non-zero polynomial can not vanish on all the points in $F^n$ (but note that it can definitely vanish on an infinite number of points, some online notes are wrong about this), since you can fix values to $n-1$ variables and get a univariate polynomial which can only have finitely many roots (less or equal than its degree). For a finite field of size $q$, the polynomial $(x^q-x)\in F[x]$ vanishes on every point in $F$ (reason: the nonzero elements are in a cyclic group under multiplication, obeying Lagrange's theorem).

We fix definition #1. If you need, two reasons can be provided:
1. For definition #2, the problem is coNP-complete over finite fields. It's in coNP since you can decide its complement language by checking certificates for their nonzero-ness. It's complete since Boolean tautologies are just polynomials that are always evaluated to 1 over the finite field of size 2.
2. Applications: Biparitite matching, IP=PSPACE, etc.

Now the problem is clear: given a multivariate polynomial $P(x_1,...,x_n)\in F[x_1,...,x_n]$, we would like to decide whether $P(x_1,...,x_n)=0\in F[x_1,...,x_n]$. Next we need to decide the description format of the input polynomials. In the trivial case, polynomials can be given in sparse form and you just take a look at the coefficients. Of course that shouldn't be what we are concerned about here, since expanding polynomials easily takes exponential time. Rather, we take the black-box model of the polynomials, which is just something that efficiently evaluates the polynomial given any point in $F^n$. The only parameters of the polynomial that we have access are: the number of variables $n$, and the highest degree on each variable $d_i, i\in \{1,...,n\}$. Note that if we are allowed to know other parameters, the problem can be different.

So (Too) much for the preparation. The problem was given a randomized polynomial time algorithm first by Schwartz-Zippel. Twenty years after that, Chen-Kao gave a partial derandomization of Schwartz-Zippel's algorithm (i.e., reduced the number of random bits needed in the algorithm) for polynomials with integer coefficients. Later, Lewin-Vadhan, which will be the focus of (the coming second part of) my post, generalized Chen-Kao's method to the problem in its full generality, and proved the optimality of the number of random bits used under the given model (blackbox, same parameters). As mentioned in class, whether the algorithm can be fully derandomized is open.

Wednesday, January 28, 2009

Trivial lemma => powerful circuit lower bounds

Remember the very first lemma we used in the proof of the Valiant-Vazirani Theorem? Though simple, it's actually extremely useful. Here is an equivalent way to state it:

Lemma: There is a degree-2 "randomized $F_2$-polynomial" for the OR function. Precisely, the degree-2 polynomial

$p(x,r) = \sum_{i=1}^n r_i x_i$ mod 2

computes the function OR($x$) with one-sided error $1/2$ when its "random bits" $r$ are chosen uniformly.

It's an easy exercise to show from this that there is a degree-$O(\log n)$ "randomized $F_2$-polynomial" which computes the OR function with one-sided error $1/n^{100}$. Of course, there is similarly one for AND. By combining these, you can get a polylog$(n)$-degree randomized polynomial for any "$AC^0$" circuit.

Definition: A circuit is in "$AC^0$" if it is of polynomial size and constant depth. Here we allow AND and OR gates of unbounded fan-in (otherwise you couldn't even compute the AND of all bits in constant depth!).

But for some basic functions -- e.g., the function

$f(x_1, ... , x_n) = \sum_{i=1}^n x_i$ mod 3

-- it's not too hard to show that they can't have a polylog$(n)$-degree randomized polynomial. Hence we get a rare circuit lower bound: this $f$ cannot be computed in $AC^0$.

We will probably do this more explicitly later in the course. This idea is also used crucially in Braverman's remarkable paper, discussed in an earlier blog post.

Tuesday, January 27, 2009

Lecture 5

The topics in Lecture 5 were: Why having an algorithm for SAT is different from having an oracle for SAT; the Valiant-Vazirani theorem ("Unique-SAT $\in P \Rightarrow$ SAT $\in RP$"); pairwise-independent hash families; most of the Approximate Counting Theorem.

Monday, January 26, 2009

If pigs could whistle, then horses could fly

Sometimes a result in Complexity Theory is described (or disparaged?) with the phrase, "If pigs could whistle, then horses could fly". (I can't figure out how this got started; the earliest online example I could find was from Dan Spielman's lecture notes but I'm pretty sure I heard it well before this. Can someone antedate this reference?)

The meaning is that the result is of the form "(something probably false) implies (something probably false)". An example is the Karp-Lipton Theorem:

$NP \subseteq P/poly \Rightarrow$ the Hierarchy collapses.

If you really believe "$NP \subseteq P/poly$" is false, then in some sense the statement is vacuous. If you somehow intuitively feel that "the Hierarchy collapses" is even more probably false than "$NP \subseteq P/poly$" then the statement has value; its contrapositive is that "assuming the Hierarchy doesn't collapse" (which you strongly believe), $NP$ does not have polynomial-size circuits (which maybe you believed less)?

In this particular case, I don't know what your intuition is, but I suppose mine is that I feel $NP \subseteq P/poly$ is more false-seeming than that the Hierarchy collapse. So does the Karp-Lipton tell me anything?! Or is it just, "If pigs could whistle, then horses could fly"?

In this case, the theorem is very meaningful -- but for another reason. As Scott Aaronson once wrote in a blog comment on this subject, "It gives one of the few known bridges between [TM-based complexity classes] and [circuits] -- and thereby lets us prove interesting circuit lower bounds." By this he means the fact that $\Sigma_2$ does not have polynomial-size circuits -- Problem 4c on your homework!

Friday, January 23, 2009

The many characterizations of NP

When I first heard of nondeterministic Turing machines and the complexity class NP, I thought the idea was somewhat contrived. Alan Turing’s original work laid out a computational model which Turing believed captured all of the important artifacts of a person performing a computation. If you want to take a look at Turing’s paper (it’s a good read!), you can find a copy of it here: Today, we are taught to think of Turing machines like specific computer programs or computers (universal TMs). Most programmers would be aghast at the thought of a computer being able to guess inputs to functions or take different courses of action in the middle of a computation. It turns out that despite some of the strangeness of thinking about nondeterministic machines, the complexity class NP has many interesting equivalent definitions without the use of nondeterminism. The goal of this blog post is to explore these various equivalences in more detail.

We first saw robustness in our definitions when we showed that polytime multi-tape TMs have equivalent polytime single-tape TMs. Thus, the class P is robust in the sense that it doesn’t matter whether we use a single or multi-tape TM to solve problems in P, EXP, etc. In a somewhat similar looking result, Savatch’s theorem relates the amount of space required by a deterministic simulation of a nondeterministic machine as follows: $NSPACE(f(n)) \subseteq DSPACE(f^2(n))$. Hence, the class PSPACE is robust to whether we use deterministic or nondeterministic machines. In the area of low space complexity, we still don’t know whether L = NL. However, Immerman and Szelepcsenyi independently showed that NL = co-NL. We’ll return back to this later when we mention descriptive complexity theory.

Of course, one of the most important question about nondeterministic TMs is whether P = NP. While it’s straightforward to go about simulating a nondeterministic TM using only a quadratic space blowup, it’s not at all clear whether one can avoid doing an exhaustive search of computation paths in the simulation. Steve Cook has an excellent overview of the P vs. NP problem on the Clay Mathematics website that you should check out here.Overall, the study of complexity started early on when von Neumann, Cobham and Edmonds took interest in problems which have polytime solutions. Over time, there have been proofs that certain theories are difficult to decide (look at Presberger arithmetic or type-checking in ML for example). Whether we should consider polynomial run time feasible is questionable (consider an O(n^100) algorithm and doubling the input size). Alas, this argument seems like one best left to people actually implementing systems, where constant factors matter. :-)

The P = NP is important from a theoretical and practical viewpoint. It deserves a better response than "not unless P = 0 or N = 1." I want to argue why NP is 'natural' in the sense that even without strange nondeterministic machines we still define NP in many different ways. So with that, let's take a look at a few different characterizations of NP!

Characterization 1: You only need to easily verify a 'solution'.

As we saw in class, NP is exactly those languages for which there’s a polynomial time ‘verifier’ V for the language. We showed that for a language L in NP there’s some polytime TM V(.,.) and constant c such that V verifies x’s membership in L given the existence of some certificate string y, where |y| < |x|^c. If you don't have such a certificate, then x isn't in L. The most basic and intuitive example we saw was SAT: given a propositional formula and truth assignment, you can easily check whether the assignment satisfies the formula. If there is no such satisfying assignment, then the formula is not in SAT. Sometimes it’s much more tangible to work with this characterization of NP- come up with an appropriate notion of a certificate and simply write a reasonably fast program which checks certificate validity. Furthermore, this definition of NP doesn’t require the machinery of nondeterministic TMs. A consequence of P = NP and this characterization of NP would be that finding solutions (certificates) is no harder (up to polynomial factors) than verifying solutions. Alternatively, if P != NP, then there is some language for which it’s hard to find solutions. As SAT is NP-complete, SAT would need to be one such difficult language and we couldn't quickly find satisfying assignments for Boolean formulas.

Characterization 2: Probabilistic Checkable Proofs

For the sake of this blog post, I'll provide a high-level view of PCPs without going into details about the formalities. Arora’s and Barak’s complexity book provide nice coverage of interactive proof systems and PCPs. In the case of SAT we can determine whether a formula A is in SAT by verifying some satisfying assignment to the variables. An incredible result known as the PCP Theorem tells us that each NP languages has a probabilistic certificate verifier which only looks at a constant number of bits in the certificate w, and is allowed O(log(n)) random bits to perform the check. Like we did in the case of BPP and RP, we'll look at a probability distribution over random input strings r. In the case that the certificate is valid, the verifier must always accept (x,w,r). In the case that the certificate is actually incorrect, the probability that the verifier incorrectly accepts (x,w,r) must be less than 1 /2.

What implications does this have in practice? One example is that you can be arbitrarily certain of the correctness of a supposed satisfying assignment A to a 3-CNF formula Phi while checking only a constant number of variable assignments and using a small number of random bits.M athematicians trying to verify that a long (but still polynomial length) proof of a theorem is correct can do so with great certainty without reading the whole thing! Since theorems with polynomial length proofs are a language in NP, reviewers would only need to spot check the proof in order to confident of the proof’s correctness. That sure cuts down on the amount of work a reviewer has to perform, but surely doesn’t satisfy a mathematician’s desire for 100% certainty that a proof holds. The PCP theorem provides an interesting probabilistic definition for NP.

Characterization 3: Descriptive Complexity Theory and doing away with machines

Up to this point we’ve focused on looking at NP languages in terms of computational models. So far we’ve seen two verifier-based approaches, but what about a view of NP which makes no reference to Turing at all? Immerman’s “Descriptive Complexity: a Logician’s Approach to Computation” provides a nice introduction to the topic of descriptive complexity. In general, NP languages characterize some property of strings. For example, the language SAT contains those formulas which satisfy the property “This formula is satisfiable.” Descriptive complexity theory attempts to characterize the hardness of defining these properties in a logical framework.

Immerman says that “we view inputs as finite logical structures, e.g., a binary string w” which can be encoded as an element in structure that encodes all n-bit strings, functions to test whether a particular bit i is set, and an order relation over all of the binary strings. We've done this sort of thing before when arguing that we only need to consider TMs over the alphabet {0,1}. Let’s look at how we can express the language of 3-colorable graphs in this context using these structures, unary ‘color’ relations, a binary edge relation, and existential quantification. The structure contains an encoding of the graph vertices and edges, and the relationship Edge(.,.) determines whether two vertices are adjacent or not. A graph is 3-colorable iff there’s an assignment of one of three colors to every vertex and no two adjacent vertices are colored the same. Logically, this is expressed as

Exists unary relations R, G, and B such that for all x [ (R(x) or G(x) or B(x)) and [for all y Edge(x,y) implies {not (R(x) and R(y)) and not (G(x) and G(y)) and not (B(x) and B(y)) } ]]

The unary relationships assign colors to all of the vertices and the second part makes sure that no two adjacent vertices are colored the same. This formula is an example of a second order existential formula; a 2nd order existential formula begins with a second order existential quantification and the rest of the formula is a first order statement. These formulas look like “Exists relations R, S, T, … such that (first order formula over the universe).” The first result in descriptive complexity theory happened when Ronald Fagin showed that a set of structures (collections of binary strings) Q is in NP if and only if Q is the set of structures which satisfy some second order existential formula. Notice that this definition makes no mention of Turing machines *or* time! Earlier I made a comment about NL = co-NL, and I will finish commenting on it here. Immerman proves that NL is equivalent to “the set of problems describable in first-order logic with the addition of a transitive closure operation.” After playing around with the logic and properties of transitive closure operations, Immerman is able to show that for s(n) >= log(n), NSPACE[s(n)] is closed under complement. Describing complexity classes in terms of logical formulas provides a different perspective on computation and introduces an interesting set of tools for approaching various problems in complexity theory.

Hopefully this post gives you some perspective about all kinds of interesting characterizations of NP. Based on some of the logic I've studied, I'm interested in looking more at descriptive complexity theory. If you're interested in exploring this approach to complexity theory in more detail, please get in touch with me.

Note to future bloggers: DO NOT write your post in MS Word and copy it into the rich text edit box. It ends up copying over a bunch of wacky HTML that you have to edit out by hand in order to get your post to work... :-(

Polylog-wise independence fools constant-depth circuits

Well, the big buzz around the blogosphere, occasioned by an anonymous comment on Luca's blog Wednesday night is a new preprint by the mighty Mark Braverman, currently a postdoc at Microsoft Research, soon a faculty member at the University of Toronto.

Mark paper proves a conjecture of Linial and Nisan from 1990; namely, that no $n$-input circuit with constant depth and polynomial size can distinguish truly random inputs from inputs that are $k$-wise independent for $k = polylog(n)$.

The preprint is discussed in some detail on Scott's blog and also on Lance/Bill's blog.

There is a good chance we'll have the tools to cover this paper toward the end of the course. :)

Thursday, January 22, 2009

Lecture 4

The topics covered in Lecture 4 were: Adleman's Theorem, $BPP \in P/poly$; Poly-time hierarchy; Min-Circuit; Infinite Hierarchy Assumption, Collapse Lemma; $PSPACE =$ Alternating Poly Time statement; Oracles, $\Sigma_2 = NP^{NP}$, etc; Sipser-Gacs-Lautemann Theorem, $BPP \in \Sigma_2$; Karp-Lipton(-Sipser) Theorem, $NP \subseteq P/poly \Rightarrow PH = \Sigma_2$; Valiant-Vazirani Theorem statement, UniqueSat $\in P \Rightarrow NP = RP$.

Tuesday, January 20, 2009

Lecture 3

Lecture 3 covered the following topics: NP vs. coNP and the theme of proof complexity; nondeterministic time hierarchy via lazy diagonalization; Ladner's Theorem statement; Randomized TMs, BPP, RP, ZPP; problems in BPP but not known to be in P (Berlekamp's algorithm vs. factoring cubics mod p) and primality; lack of hierarchy and complete problems for BPP; polynomial identity testing; error amplification for BPP (and statement about randomness reduction); poly-size circuit families; Adleman's Theorem BPP in P/poly.

Monday, January 19, 2009


Computer scientists seem to be a blogging bunch, and there are several great blogs out there for complexity theory.

The granddaddy of them all is certainly Lance Fortnow's Computational Complexity Weblog. This blog was amazing for starting as early as 2002 and for posting extremely regularly. I can't recommend enough Lance's amazing output on complexity; try reading the archives here. These days the blog is cowritten by Lance and Bill Gasarch and the output has slowed -- but one a day from the archives makes for great reading.

Two other great, long-running blogs concerned mostly with complexity are those by Scott Aaronson ("Shtetl-Optimized") and Luca Trevisan ("in theory").

Of course it's hard to draw the line between "complexity theory" and "theory of computer science", so rather than try to decide if, say, James Lee's "tcs math" is "complexity" enough to be listed, I thought I'd just point you to Arvind Narayanan's comprehensive Theory of Computing Blog Aggregator.

And I would be remiss if I didn't mention what is probably the most amazing mathy blog of all, Terry Tao's extraordinarily voluminous "What's new".

Happy reading.

Thursday, January 15, 2009

Tentative syllabus

Here is a very rough guideline for the list of topics we might cover in the course. This is almost certainly an ambitious plan and is subject to change based on time constraints. We will probably only be able to discuss a subset of these topics, and may not cover the topics necessarily in the order listed.


Introducton: the big questions, Turing machines, Basic time and space
classes (P,NP,L,PSPACE,NL), Universal Turing machines, Hierarchy
theorems, Nondeterminism, NP-completeness and Cook-Levin theorem,

Randomized complexity classes, Polynomial identity testing, Polynomial
time hierarchy, Non-uniform (circuit) classes, Karp-Lipton theorem.

Complexity of Unique-SAT, Approximate counting in the
hierarchy, Exact learning of circuits with oracles.

Space complexity: PSPACE and NL-completeness, Savitch's theorem, NL =
coNL, Relationship to parallel computation and NC circuit classes.

Counting classes, #P-completeness, The power of counting (Toda's

The power of interaction: Graph nonisomorphism protocoal, Interactive
proofs for #SAT, IP=PSPACE, coNP unlikely to have short interactive

Lower bounds and concrete complexity:
Parity doesn't have small depth circuits.
Some frontiers in circuit complexity:
- ACC_0 and multiparty communication complexity
- Log depth and rigidity
Time space trade-offs: Lower bounds for SAT;
Non-uniform model: Branching programs. Barrington's theorem.

- Nisan's generator for low space machines
- Nisan-Wigderson: hardness vs randomness
- Hardness amplification and average-case hardness

Some vignettes:
- Cryptography in constant parallel time
- Natural proofs: a barrier to proving stronger circuit lower bounds
- Derandomization needs circuit lower bounds
- Quantum computing

Lecture 2

The topics covered in Lecture 2 were Universal Turing Machines, the Time and Space Hierarchy Theorems, Nondeterminism and NP as poly-time verification, poly-time reductions and NP-completeness, the Cook-Levin Theorem and Circuit-Sat being NP-complete, Circuit-Val is P-complete under log-space reductions, coNP.

Tuesday, January 13, 2009

Lecture 1

The topics covered in Lecture 1 were: the "big questions" like "P ?= NP", "P ?= PSPACE", "P ?= LOGSPACE", "SC ?= NL", "P ?= NC", "EXP ?$\subseteq$ P/poly", and "P ?= BPP"; search vs. decision problems and languages; the multitape Turing Machine model; the Extended Church-Turing Thesis; Linear Speedup Theorem; and the classes TIME($f(n)$), SPACE($f(n)$), P, PSPACE, and L (aka LOGSPACE).

If you need a refresher on basic stuff such as the Turing Machine Model (the details of which are as finicky as they are ultimately boring) you might look at Chapter 1 of the Arora-Barak book.

Monday, January 12, 2009

Homework Template

Hello everyone, and welcome to the Complexity class!  

I just wanted to let you know that I have added a sample homework solution to the class website.  The source files for the homework can be found here (.tex), here (.bib), and here (.sty).  Downloading all three files into a common directory and compiling with Latex, you should obtain this output.  

For the homeworks, you should only have to rename and modify the "sample" files as needed.  If you have any problems with the files or if you have any other questions, please don't hesitate to ask me.

History of Complexity

For an early blog post, I was planning on reading a little about the history of complexity, checking out the original sources, and writing a little about "who first thought of measuring running time in terms of input string length?" and "what were Cobham and Edmonds thinking when they suggested P as a good class to study?"

But after a bunch of research I found this article by Stephen Cook on the very subject, on the occasion of him winning the Turing Award. And it seemed hard to imagine how I could write anything more authentic or instructive.

So I encourage you to read the article. But as you do so, let me add a few random comments that spring to my mind. Also, please post any questions or comments that you might have...

. Cook's article was written five years before the discovery of the famous 1956 letter from Gödel to von Neumann -- in which Gödel quite explicitly asks whether NP could be in P. Bear in mind that the notions of P and NP didn't even exist in 1956!

. I've never heard of Cook's problem about printing out $\sqrt{2}$ at a linear rate -- anyone got a reference for it? Is it still open?

. As Cook himself notes, much of the key development of computational complexity took place among the graduate students at Princeton in the early-to-mid 1960's. I suppose this shouldn't be too surprising, given that it was at Princeton and the Institute for Advanced Studies in the '30s and '40s that you had all the folks working on foundations of mathematics but with a "computer science" slant (Gödel, von Neumann, Church, Turing), plus other folks like Kuhn, Tucker, and Minsky. This is the gang that begat the complexity-theory founders Cook mentions -- Rabin, Stearns, Bennett, and Ritchie, as well as Aho, Hopcroft and Ullman, and CMU's own Turing Award winners Manuel Blum and Dana Scott. The only original complexity theory folks from this era I could find not out of Princeton or Princetonians were Hartmanis (from Caltech) and the very mysterious Alan Cobham, who appears to have no PhD and to hail from parts unknown. Anyone got any information on him?

. These are all men, by the way; of course, Princeton didn't even have women undergraduates until the end of the '60s. Harvard, where Cook was, seems to have been the only non-backwards place of the time, producing several female complexity-theory folks, most notably Greibach.

. The "right" definition of how to define a RAM (random access machine) is not so controversial today as I guess it was when Cook was writing. I think everyone agrees that most sensible definitions are okay and about equal, up to log factors. I think the "random access Turing Machine" (see here) is a reasonable standard.

. Cook writes "The identification of P with the tractable (or feasible) problems has been generally accepted in the field since the early 1970's." I'm not completely sure about this today. Certainly everyone agrees that P is the most reasonable, natural, and best class to study, but there is a great deal of interest these days in linear, quasilinear (linear times log factors), and even sublinear time algorithms. In fact, it's not so unreasonable to argue that an algorithm is truly efficient in practice only if it is quasilinear time. Perhaps more on this in a future post...

. Regarding Cook's list of the most interesting and important problems in P...

- 35 years after the Schönhage-Strassen algorithm Cook mentions, a faster multiplication algorithm was found. It runs in time $n \log n 2^{O(\log^* n)}$ (!!) and was given in an award-winning 2007 paper of Fürer over at Penn State.

- Coppersmith and Winograd soon thereafter improved the matrix multiplication time to roughly $n^{2.38}$. This has stood for 20 years.

- Primes is in P, thanks to Agrawal, Kayal, and Saxena in 2004.

- Linear Programming being in P has subsequently proven to be of enormous importance.

- To be honest, there aren't so many additional notable polynomial-time algorithms discovered since then. I might point to the Convex Programming work due to Grötschel, Lovász and Schrijver (1981), the theory of Markov Chain Monte Carlo due to Jerrum-Sinclair and Dyer-Frieze-Kannan (1988), and recognizing minor-closed graph families due to Robinson and Seymour (1983--2004). Any other candidates?

. Cook devotes a lot more space to #P-completeness than NP-completeness (probably out of modesty), but the latter continues to be much more important and influential, though we will certainly study the former in our class.

. Similarly, the study of parallel complexity and the class NC has not proven quite so influential as it looked in the '80s, but we will certainly study it too.

. Re lower bounds: "Structured lower bounds" we're pretty good at. But genuine lower bounds? We still have absolutely no clue; people feel even less enthusiastic about the prospect of proving good time lower bounds than they did when Cook wrote. Cook's speculation about separating P from L at the end of the article looks overly optimistic today.

. That said, and on the Section 4.3 topic of time-space product lower bounds, there has been a lot of good work done recently in this area, especially by recent CMU complexity grad Ryan Williams. We plan to cover some of this in our course.