Saturday, February 28, 2009
Recognizing/printing real numbers in real time
In the first blog post for the course, on the history of complexity, I saw mention of the question of whether the digits of π can be printed out by a Turing Machine at a rate of O(1) time per digit. I had never heard of the problem before. Apparently one can ask similar questions about recognizing in "real time" whether a number belongs to a given class of reals; for more, see this post on Dick Lipton's blog.
Thursday, February 26, 2009
Lecture 13
Today we covered: Parity has unbounded fan-in circuits of depth d and size 2^{O(n^{1/(d-1)})}. Håstad's Theorem statement: this is also a lower bound for constant d. Proof in the case d = 2 (CNF, DNF). The Razborov-Smolensky proof, which gives n^{1/2d} in the exponent rather than n^{1/(d-1)}. Its main tool: for any depth-d size-s circuit C and any ε > 0, there is a probability distribution on n-variable polynomials p of degree O(log(s/ε))^d such that for every n-bit input x we have Pr[p(x) ≠ C(x)] ≤ ε when p is chosen from the distribution. Expanding functions in the Fourier basis.
Wednesday, February 25, 2009
Homework 2 wrapup
Tuesday, February 24, 2009
Lecture 12
Today we covered: Definitions of IP[k] and AM[k]; IP[k] and AM[k] unchanged if perfect completeness is required (statement); MA ⊆ AM[2]; AM[k] = AM[2] for all constant k; indeed, AM[k+1] = AM[k] for any k; AM[2] ⊆ Π₂ᵖ; Boppana-Håstad-Zachos Theorem, coNP ⊆ AM[2] implies PH = AM[2]; Graph-Non-Isomorphism is not NP-complete unless the hierarchy collapses to Σ₂ᵖ; Goldwasser-Sipser Theorem, k-round private-coin interactive proofs are doable with (k+2)-round public-coin interactive proofs, hence IP[k] is unchanged if public coins are required; the set-size lower bound protocol in public-coin AM.
Solutions to Homework 2
I will be returning Homework 2 in class today. The class average on this homework is 79%. Solutions for the homework will once again be available upon request; please email me if you would like a copy.
Saturday, February 21, 2009
Derandomization of space classes
Understanding the exact computational power of randomness is an intriguing and very challenging problem in computer science. Although we are far from an answer, a lot of nice results have emerged in the last two decades, particularly in the area of space-bounded computation. In this post, we will look at a couple of such results. We will restrict our attention to randomized log-space machines, i.e. the complexity class BPL, but the results we mention generalize to BPSPACE(s(n)) for any space-constructible function s(n) ≥ log n. Recall that a BPL machine M for a language L is a randomized machine running in log-space and satisfying the following conditions:
x ∈ L implies Pr[M accepts x] ≥ 2/3
x ∉ L implies Pr[M accepts x] ≤ 1/3
Also, we know that the computation of such a machine on an input x can be seen as a random walk on a graph G_x, called the "configuration graph". Each node in this graph represents a configuration of the machine and each edge represents a valid transition. Let us denote by D the total number of nodes in this graph, and let us assume that the graph has a unique start state s and unique accept and reject states t_acc and t_rej. Since the machine runs in log-space, we know that D is polynomial in the length of the input. We can also associate with this graph a matrix Q, called the "transition matrix". The entry Q[i,j] of this matrix contains the probability that the machine M goes from configuration i to configuration j in one step. This way of looking at a BPL machine is very convenient, since simulating the machine is now equivalent to computing the t-step transition probability Q^t[s, t_acc] for t = poly(n). If this probability is more than 2/3 we accept; if it is less than 1/3, we reject. Savitch's idea of repeated matrix squaring computes this probability exactly, giving us the derandomization BPL ⊆ DSPACE(log² n). We will also be interested in the running time used by these simulations, hence we rewrite this result as BPL ⊆ DTISP(2^{O(log² n)}, log² n).
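To make the transition-matrix view concrete, here is a small Python sketch (the 3-configuration chain, the function names, and the exact-arithmetic choice are mine, not from any of the papers) that computes the t-step transition probability by repeated squaring and applies the acceptance rule above.

from fractions import Fraction

def mat_mult(A, B):
    """Multiply two square matrices of transition probabilities."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_power(Q, t):
    """Compute Q^t exactly by repeated squaring (the idea behind the Savitch-style simulation)."""
    n = len(Q)
    R = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]  # identity matrix
    while t > 0:
        if t & 1:
            R = mat_mult(R, Q)
        Q = mat_mult(Q, Q)
        t >>= 1
    return R

def bpl_accepts(Q, start, accept, t):
    """Accept if the t-step probability of reaching the accept configuration
    exceeds 2/3; reject if it is below 1/3."""
    p = mat_power(Q, t)[start][accept]
    if p > Fraction(2, 3):
        return True
    if p < Fraction(1, 3):
        return False
    return None  # a legal BPL machine never lands in the gap

# Toy 3-configuration example: start = 0, accept = 1, reject = 2.
Q = [[Fraction(0), Fraction(3, 4), Fraction(1, 4)],
     [Fraction(0), Fraction(1), Fraction(0)],   # accept state is absorbing
     [Fraction(0), Fraction(0), Fraction(1)]]   # reject state is absorbing
print(bpl_accepts(Q, start=0, accept=1, t=8))   # True: acceptance probability 3/4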
Further improvements to this result are all based on a very nice "duality" between hardness and randomness. This relationship was made concrete in a paper by Noam Nisan and Avi Wigderson. The high-level idea is as follows: suppose we have a pseudo-random generator which takes as input a short truly random seed and produces a very long sequence of bits which look random to any probabilistic machine. We can then use these pseudo-random bits instead of truly random bits to simulate the machine, thereby reducing the amount of randomness used. This in turn has the potential to lead to better deterministic simulations. Unfortunately, such general-purpose pseudo-random generators are known to exist only under the assumption that certain cryptographic objects called one-way functions exist. However, it is possible to construct specialized unconditional pseudo-random generators which fool a particular class of machines C. Here is one way to do it: choose a family of functions f = {f_n}, one for each input length, which is hard for the class C. In other words, for every n and for every machine M ∈ C, we must have |Pr_{x ← U_n}[M(x) = f_n(x)] − 1/2| ≤ ε, where U_n is the uniform distribution over {0,1}^n and ε is very small (say 1/n). What this means is that no machine in C can do much better than random guessing when asked to predict the values of the hard functions. If we have such a family of functions, then here is a very simple pseudo-random generator which stretches its seed by 1 bit: G(x) = x ∘ f_n(x). Notice that although we are only appending one bit at the end, no machine in C can distinguish that bit from a truly random bit with significant advantage. This was roughly the idea behind the pseudo-random generator proposed by Nisan and Wigderson.
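Here is a minimal Python sketch of the one-bit-stretch idea. The "hard" predicate below is just a stand-in chosen so the code runs; any predicate that the class C cannot predict would play the same role.

import random

def one_bit_stretch(f, seed_bits):
    """G(x) = x . f(x): append the value of the 'hard' predicate f to the seed.
    If no machine in C can predict f(x) from x much better than guessing,
    then no machine in C can distinguish G(x) from |x|+1 truly random bits."""
    return seed_bits + [f(seed_bits)]

# Stand-in "hard" predicate for illustration only (parity is NOT actually hard
# for most interesting classes C; it just shows the interface).
def parity(bits):
    return sum(bits) % 2

seed = [random.randint(0, 1) for _ in range(8)]
print(one_bit_stretch(parity, seed))  # 9 pseudorandom-looking bits from 8 random ones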
Later, in a breakthrough result, Noam Nisan gave a pseudo-random generator which fools all probabilistic log-space machines and uses only O(log² n) bits of randomness. Notice that this is an exponential improvement, since a BPL machine can potentially use poly(n) random bits, where n is the length of the input. Virtually every result on derandomizing space complexity classes has used some ideas from Nisan's paper. Before stating Nisan's result, some definitions:
Definition 1:
For a vector v ∈ R^D, we define the norm of v to be ||v|| = Σ_i |v_i|. For a D×D matrix M over R, we define the norm of M, ||M||, to be the maximum over all rows M[i,·] of ||M[i,·]||.
Definition 2:
If M and N are square matrices of the same dimension and a is a positive integer, we say that M approximates N with accuracy a if ||M − N|| ≤ 2^{−a}.
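In code, these two definitions amount to a max-row-sum norm and a 2^(-a) threshold. A quick Python sketch (the function names are mine):

def matrix_norm(M):
    """Max over rows of the sum of absolute values in the row (Definition 1)."""
    return max(sum(abs(v) for v in row) for row in M)

def approximates(M, N, a):
    """M approximates N with accuracy a if ||M - N|| <= 2^(-a) (Definition 2)."""
    diff = [[M[i][j] - N[i][j] for j in range(len(M))] for i in range(len(M))]
    return matrix_norm(diff) <= 2 ** (-a)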
Recall that simulating a BPL machine is equivalent to calculating the matrix Q^t for some t = poly(n). Hence, we now state and view Nisan's result as a randomized algorithm for approximating this matrix. We will call Nisan's algorithm PRS, which stands for pseudorandom repeated squaring.
Theorem:
Let Q be a D×D transition matrix and let r and a be integer parameters. Given as input Q and the integer r, the PRS algorithm takes a random string of size O(r·(a + log D)) as input, runs in space O(a + log D), and computes another matrix Q', such that with probability at least 1 − 2^{−a} over the random string, Q' approximates Q^{2^r} with accuracy a.
Notice that it is enough to approximate the entries of the matrix Q^t, since we only want to detect a significant gap between the acceptance probability of M when x ∈ L and when x ∉ L. For simulating BPL, we need to choose r such that 2^r ≥ t = poly(n), i.e. r = O(log n). Choosing r = O(log n) and a = O(log n) gives us the desired log-space simulation, which uses O(log² n) bits of randomness. A straightforward derandomization of this algorithm, i.e. enumerating all possible choices of the random seed, will not help us improve on the bound given by Savitch's theorem (WHY?). But in a later paper Nisan observed that most of the bits in the random seed can actually be computed deterministically, reducing the randomness used by the algorithm to O(log n). Derandomizing this new algorithm gives a deterministic simulation of BPL which uses O(log² n) space and poly(n) time! Hence, Nisan was able to show that BPL ⊆ DTISP(poly(n), log² n). In other words, Nisan showed that BPL ⊆ SC ("Steve's Class"). In later work, Nisan, Endre Szemerédi, and Wigderson showed in 1992 that undirected s-t connectivity (USTCONN) is in DSPACE(log^{3/2} n). Of course, from Omer Reingold's result in 2004, we now know that USTCONN ∈ L. But at that point, the USTCONN result was a major indication that maybe a deterministic simulation of BPL could be done in space O(log^{3/2} n). This was achieved by Michael Saks and Shiyu Zhou, and will be the last result covered in this post.
The main idea behind Saks and Zhou's result is to balance the space usage and the randomness usage of the PRS algorithm. One way to do this is to combine Savitch's theorem and Nisan's PRS algorithm, i.e. to do repeated matrix squaring, but at each level of the recursion, instead of calculating the matrix exactly, approximate it using the PRS algorithm. Let's see how this works in detail.
Consider the PRS algorithm again. Let's divide the exponent r into two parts r₁ and r₂ such that r₁ · r₂ = r. We can calculate Q^{2^r} as a sequence of matrices M₀, M₁, …, M_{r₂} such that M₀ = Q, M_{i+1} = M_i^{2^{r₁}}, and M_{r₂} = Q^{2^r}. At each of the r₂ levels of the recursion we can use the PRS algorithm to approximately calculate the matrices. We are going to use O(log n) space at each level, for a total of O(r₂ · log n) space. Also, we are going to use O(r₁ · log n) bits of randomness at each level, for a total of O(r₁ · r₂ · log n) bits of randomness. At present this does not seem very impressive, but what Saks and Zhou show is that we can actually use the same seed for the PRS algorithm at each level of the recursion. This reduces the randomness complexity to O(r₁ · log n). The reason why we can do that is the following: the analysis of the PRS algorithm shows that most of the seeds are good for any given fixed matrix. Hence, by a union bound, we should be able to show that most of the seeds are also good for any fixed set of r₂ matrices. If we can do that, then we can set r₁ = r₂ = O(√(log n)) to get an algorithm with space complexity O(log^{3/2} n) and randomness complexity O(log^{3/2} n). Then a straightforward derandomization, enumerating all seeds, will give us the desired result that BPL ⊆ DSPACE(log^{3/2} n). But there is a catch.
Notice that we cannot really apply the union-bound argument to the matrices produced by the PRS algorithm, since each of them depends on the seed used at the previous level of the recursion. Saks and Zhou get around this problem by introducing the idea of random perturbations. The main observation is that we could apply the union-bound argument to the sequence of matrices if they were not approximately calculated but exactly calculated, as in Savitch's theorem. Now, the PRS algorithm guarantees that the matrices will be very close to the exact ones. Hence, if we just truncate the values in the matrices, ignoring the lower-order bits, we should be done, except that we might have some bad roundings at boundary cases. Hence, what Saks and Zhou do is reduce the value of each entry in the matrix at each level by some small random quantity and then truncate the values. In this way, the new matrices are (with high probability) independent of the seed used, and we can then say that most of the seeds will be good for all these matrices. The random perturbations increase the space usage at each level by O(log n), but the final space usage is still O(log^{3/2} n). That's it: we can now state Saks and Zhou's result that BPL ⊆ DSPACE(log^{3/2} n).
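Here is a schematic Python sketch of the Saks-Zhou recursion: approximate repeated squaring in which every level reuses the same seed, followed by a random downward shift and truncation of the entries. The function exact_power_placeholder stands in for Nisan's PRS algorithm (not implemented here), and the precision constant and toy matrix are my own choices, so this only illustrates the control flow, not the actual space bounds.

import random

PRECISION_BITS = 20  # keep this many bits after the binary point when truncating

def mat_mult(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def exact_power_placeholder(M, r1, seed):
    """Placeholder for Nisan's PRS: here we just square exactly r1 times.
    The real algorithm would use the seed to compute an approximation of
    M^(2^r1) in small space; this stand-in only shows where PRS plugs in."""
    for _ in range(r1):
        M = mat_mult(M, M)
    return M

def perturb_and_truncate(M, shift):
    """Subtract a small random shift from every entry, then drop low-order bits.
    The shift makes the rounded matrices independent of the seed, which is what
    lets Saks-Zhou union-bound over all r2 levels with a single shared seed."""
    scale = 2 ** PRECISION_BITS
    return [[int(max(e - shift, 0.0) * scale) / scale for e in row] for row in M]

def saks_zhou_skeleton(Q, r1, r2, approx_power, seed):
    """Approximate Q^(2^(r1*r2)) with r2 levels of approximate squaring,
    reusing the SAME seed at every level."""
    M = Q
    for _ in range(r2):
        M = approx_power(M, r1, seed)
        M = perturb_and_truncate(M, random.random() * 2 ** -(PRECISION_BITS - 4))
    return M

# Tiny demo on a 2-state chain (float entries purely for illustration).
Q = [[0.5, 0.5], [0.25, 0.75]]
print(saks_zhou_skeleton(Q, r1=2, r2=3, approx_power=exact_power_placeholder, seed=0))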
Some follow-up work has been done after Saks and Zhou's result. For example, Jin-Yi Cai, Venkatesan Chakaravarthy and Dieter van Melkebeek give a time-space tradeoff version of Saks-Zhou which says, roughly, that for any rational constant 0 ≤ α ≤ 1/2, BPL ⊆ DTISP(2^{O(log^{3/2−α} n)}, log^{3/2+α} n); α = 0 recovers Saks-Zhou, while α = 1/2 recovers a polynomial-time, O(log² n)-space simulation. Since then, Reingold's log-space algorithm for USTCONN has been the major result in this area.
There is a general feeling in the community that we might be close to proving RL = L. In fact, Omer Reingold, Luca Trevisan and Salil Vadhan, in a paper in 2005, gave some strong evidence in favor of this.
Aside:
If you have read this far, then a natural question to ask is: what about derandomization of time complexity classes? Can similar ideas be applied? Conceptually, yes. Nisan's idea is pretty neat and generic. I have not done much of a survey, but some initial reading seems to suggest that it is difficult to give unconditional pseudorandom generators which fool time-bounded complexity classes. Maybe the instructors can shed more light on this issue.
Thursday, February 19, 2009
Lecture 11
In Lecture 11, we covered:
Telling Coke and Pepsi apart: an interactive protocol for Graph Non-Isomorphism;
Interactive proof for Counting-SAT (and so PH is a subset of IP);
The precise power of interaction: IP = PSPACE;
Perfect completeness and public coins don't change the power of IP.
Graded problem set 1 was handed back.
Tuesday, February 17, 2009
Lecture 10
Topics covered in Lecture 10:
Barrington's theorem: NC^1 has constant width branching programs (Ben-Or & Cleve's proof which showed width 7)
Lipton-Viglas theorem: SAT can't be solved in simultaneous n^{o(1)} space and n^{1.41} time.
Comments on relativization and limits of diagonalization
Introduction to Interactive proofs
Monday, February 16, 2009
Homework 2 -- CORRECTION
A correction has been made to Problem (6c) on the homework. Thanks to Or for pointing out the mistake and giving the correction. The new statement (which is also in the file on the course home page) is:
c) Suppose we form by blowing up each node in to a clique of size ; we put either all possible edges or no possible edges between the "-clique" and the "-clique", depending on whether or not was an edge in . Let , where is chosen randomly. Show that if 's maximum clique has size then with probability , has no -clique; and, if 's maximum clique has size then has a unique -clique with probability at least .
Solutions to Homework 1
I have just added the solutions to Homework 1 to the course webpage. I am still marking the homeworks, but I hope to return them to you before the end of the week. Thanks for your patience!
UPDATE: I have removed the link to the solutions from the course webpage, but they are still available upon request. If you want a copy, please email me.
Homework 2
Not that I imagine anyone worried about this, but for #4, it's okay to show completeness under poly-time many-one reductions. (I.e., don't bother about log-space reductions.)
Saturday, February 14, 2009
A Quadratic Lower Bound for the Determinant and Permanent Problem
Join me, fellow students, in a quest to discover Complexity Theory's version of the 'Holy Grail': lower bounds on the size of circuits. This quest began in the 1970s, when the question of P vs. NP first arose. In those early days, when people were naive, optimistic, and still new to the question, they sought to separate NP from P. One direction that initially seemed promising was through circuit size. Clearly, P ⊆ P/poly. Furthermore, Karp showed that many natural functions belong to NP (indeed, are NP-complete), and we know that functions with super-polynomial circuit complexity do exist (by counting). So the hope was that one would be able to show that some function in NP must have large circuit size, thus proving P ≠ NP. In fact, a super-polynomial circuit lower bound for some function in any level of the polynomial hierarchy would prove such a separation. And since Toda's theorem gives PH ⊆ P^{#P}, the hope was that the Permanent, a #P-complete function, would have super-polynomially large circuits.
Recall that the Permanent is a function of n² numbers: Perm(X) = Σ_{σ ∈ S_n} Π_i x_{i,σ(i)}, whereas circuits were defined over boolean values. So we introduce arithmetic circuits. Arithmetic circuits have + and × gates (with fan-in 2) and negation gates (with fan-in 1), representing the arithmetic operations over some field F, and may additionally use constants from F. Clearly, every polynomial (and in particular the Permanent and the Determinant) can be represented as an arithmetic circuit, and furthermore, one can translate any boolean circuit into an arithmetic circuit (representing NOT(x) as 1 − x and AND(x, y) as x·y). Using arithmetic circuits, Valiant defined in 1979 the arithmetic analogs of P and NP:
Definition 1: A polynomial p of polynomially bounded degree is in VP if there exists an arithmetic circuit family of polynomial size that computes it.
Definition 2: A polynomial p of polynomially bounded degree is in VNP if for some m (polynomially bounded in n) there exists a family of polynomials q in VP s.t. p(x₁, …, x_n) = Σ_{e ∈ {0,1}^m} q(x₁, …, x_n, e₁, …, e_m). Also, given polynomials p and q, we say that p is a projection of q if p can be obtained from q by substituting each variable of q with a variable of p or a constant.
Valiant showed that any polynomial in VNP is a projection of a Permanent of polynomially related size. He also showed a nice characterization in terms of the Determinant, which will take us back to linear algebra. Recall that an affine transformation of vector spaces is a combination of a linear transformation and a translation. We look at affine maps from the space F^{n×n} of n×n variable matrices to the space of all m×m matrices over F, i.e. maps in which each entry of the image matrix is an affine function of the n² variables.
Definition 3: We say that a polynomial f has determinant complexity m if any affine mapping A s.t. f = Det ∘ A must map to matrices of size m×m (or larger).
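For a tiny concrete instance of Definition 3 (my own example, not from the paper), the polynomial x1*x2 + x3 can be written as the determinant of a 2x2 matrix whose entries are affine in the variables, so its determinant complexity is at most 2. A sympy check:

import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')

# Affine map A: every entry of the 2x2 image matrix is affine in x1, x2, x3.
A = sp.Matrix([[x1, -x3],
               [1,   x2]])

f = x1 * x2 + x3
assert sp.expand(A.det() - f) == 0   # Det(A(x)) = x1*x2 + x3
print("determinant complexity of x1*x2 + x3 is at most 2")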
Valiant showed that if f has formula size c, then its determinant complexity is O(c), so all functions computable by polynomial-size formulas have polynomially bounded determinant complexity. Hence, in order to separate VNP from VP, one aspires to show that the determinant complexity of the Permanent is super-polynomial. Alas, until 2004, the best determinant-complexity lower bounds for the Permanent were linear in n. In 2004, Mignon and Ressayre showed a quadratic lower bound:
Theorem: If the characteristic of F is 0, then the determinant complexity of the n×n Permanent is at least n²/2.
The proof of this theorem naturally involves some linear algebra, and requires the definition of a special linear operator. If f : F^N → F is a differentiable function, then we can define the tangent map T_f : F^N → (F^N)^*, by T_f(a) = (v ↦ ⟨∇f(a), v⟩). I.e., we map every point a to a linear functional, where ⟨∇f(a), v⟩ is the scalar product of ∇f(a) and v. Using the tangent map and the fact that the space of linear functionals is itself a vector space, we can now define a tangent to the tangent. Let T²_f be defined by T²_f(a) = the bilinear form (u, v) ↦ u^T · H_f(a) · v, where H_f(a) is the matrix of 2nd-order partial derivatives of f at a (the Hessian). I.e., we map every point a to a bilinear form constructed from the 2nd-order partial derivatives of f at a. Alternatively, one can think of T²_f(a) as a map sending each u to the linear functional v ↦ u^T · H_f(a) · v. Recall that any bilinear form (and in particular T²_f(a)) has a rank, which is the rank of its representing matrix H_f(a).
Here is the proof of Mignon and Ressayre's theorem in a nutshell. Let A be an affine map to m×m matrices s.t. Perm = Det ∘ A. Then the following 3 claims hold:
Claim 1. T²_{Perm}(X) is isomorphic to the restriction of T²_{Det}(A(X)) to the image of (the linear part of) A. Since one is a restriction of the other, we get rank(T²_{Perm}(X)) ≤ rank(T²_{Det}(A(X))).
Claim 2. There exists some X₀ with Perm(X₀) = 0 (so that A(X₀) is a non-invertible matrix), s.t. T²_{Perm}(X₀) has rank n². (This is the most non-trivial part.)
Claim 3. For any non-invertible m×m matrix Y it must hold that rank(T²_{Det}(Y)) ≤ 2m.
Combining the 3 claims together, we deduce the inequalities n² = rank(T²_{Perm}(X₀)) ≤ rank(T²_{Det}(A(X₀))) ≤ 2m, and therefore m ≥ n²/2.
None of the proofs of Claims 1, 2 or 3 is hard, and they all boil down to looking at the matrix generated by the T² operator. If X = (x_{ij}) is our n×n variable matrix, then we denote D = Det(X). For the (i,j) minor (removing the i-th row and the j-th column of X), we denote its determinant by D_{ij}. For any i ≠ k and j ≠ l, we denote by X_{ij,kl} the minor of X we get by removing both the i-th and k-th rows and the j-th and l-th columns, and denote D_{ij,kl} = Det(X_{ij,kl}). Then observe the following: T²_{Det}(X) is the n² × n² matrix, with rows and columns indexed by pairs (i,j) and (k,l), where for every (i,j), (k,l) we have
[T²_{Det}(X)]_{(i,j),(k,l)} = ∂²D / ∂x_{ij} ∂x_{kl} = ± D_{ij,kl} (and 0 when i = k or j = l).
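This structure is easy to verify symbolically. The following sympy snippet (my own sanity check, on a 4x4 matrix) confirms that a second partial derivative of the determinant is, up to sign, the determinant of the complementary minor.

import sympy as sp

n = 4
X = sp.Matrix(n, n, lambda i, j: sp.Symbol(f'x{i}{j}'))
D = X.det()

# d^2 Det / dx_{0,0} dx_{1,1}: rows {0,1} and columns {0,1} removed, sign +1.
second = sp.diff(D, X[0, 0], X[1, 1])
minor = X.extract([2, 3], [2, 3])
assert sp.expand(second - minor.det()) == 0

# d^2 Det / dx_{0,0} dx_{1,2}: rows {0,1} and columns {0,2} removed, sign -1.
second2 = sp.diff(D, X[0, 0], X[1, 2])
minor2 = X.extract([2, 3], [1, 3])
assert sp.expand(second2 + minor2.det()) == 0

print("second partials of Det are signed complementary minors")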
Proving Claim 2 is the result of looking at one carefully chosen matrix X₀ with Perm(X₀) = 0. By sheer computation, Mignon and Ressayre show that T²_{Perm}(X₀) is of the form of a product involving two particular invertible matrices, and show that this gives an n² × n² matrix of full rank. We comment that the entries of these matrices are products of large factorials, so they are invertible because of the zero characteristic of F (in positive characteristic such factorials could vanish).
Proving Claim 3 uses a similar observation, based on the structure of T²_{Det}. If Y is a non-invertible m×m matrix, then by changing basis, Y can be turned into Δ, a diagonal matrix with some 0 entries on the main diagonal; T²_{Det}(Y) and T²_{Det}(Δ) will have the same rank. If we substitute Δ for X in the formula above, we get that T²_{Det}(Δ) has the exact same structure as before (replacing D_{ij,kl} with the determinant of the same minor of Δ). However, since Δ is diagonal and non-invertible, this determinant is 0 for almost every choice of index pairs, which bounds the rank by 2m.
Proving Claim 1 boils down to showing that (the linear part of) A is injective, and thus an isomorphism onto its image. If A isn't injective, then there exists a non-zero direction u with A(X + tu) = A(X) for all X and t. Hence Perm(X + tu) = Perm(X) for all X and t, so the derivative of Perm in the direction u must be 0 everywhere. Alternatively, T_{Perm} maps every X into a functional that vanishes on u. So for every X, T²_{Perm}(X) is a bilinear form that maps u into the 0-functional. Thus for every X, T²_{Perm}(X) is not of full rank. In particular, the X₀ from Claim 2 has a Hessian of rank strictly smaller than n², which contradicts Claim 2. The proof of the theorem is done.
SPOILER ALERT: The quest for circuit lower bounds is (thus far) unsuccessful. Obviously, we still don't know how to show super-polynomial lower bounds on the determinant complexity of the Permanent. To this day, strong lower bounds are known only for very restricted classes of circuits (such as constant-depth or monotone circuits). In fact, Razborov and Rudich showed in 1995 that proving lower bounds using the currently known ("natural proof") techniques is intrinsically difficult. However, there is some positive sign: in 2004, Kabanets and Impagliazzo showed that, under the assumption that polynomial identity testing can be derandomized, either the Permanent or some function in NEXP doesn't have polynomial-size circuits.
Friday, February 13, 2009
Combinatorial Complexity
There is a certain broad area of complexity theory which does not have a proper name.
As far as it occurs to me at this instant, there are (at least) three big areas of complexity theory:
1. "Structural complexity theory". This refers to understanding the relationships between complexity classes, bounding and classifying time vs. space vs. randomness, understanding reductions, etc.
2. "Algorithmic complexity theory". I just made this name up, but I'm thinking here about the area of understanding the complexity of various specific problems; I think of proving NP-hardness results as falling into this area, so I would put inapproximability and PCPs into this category.
3. The mystery area I referred to at the beginning of this post. I guess my best attempt at naming this category would be either "Combinatorial complexity", or "Lower bounds". Here I mean the study of things like formula size/depth, constant-depth circuit size, branching program sizes, decision tree size, DNF size... roughly, the study of non-uniform classes of computation that are "small" enough that one can make reasonable progress on proving lower bounds.
The impetus for this post is to point you to a nice-looking graduate course on this 3rd area being taught by Rocco Servedio at Columbia this term. If the lecture notes continue, this should be a great resource for this huge area of complexity theory.
Thursday, February 12, 2009
Lecture 9
Topics covered in Lecture 9:
Randomized space: BPL ⊆ SPACE(log^{3/2} n) (Saks-Zhou Theorem), USTCON ∈ RL. Circuits (as parallel computation): NC vs. AC, NC¹ ⊆ L ⊆ NL ⊆ NC², matrix multiplication in NC¹, NC¹ = nonuniform poly-size formulas. Branching programs: L/poly = non-uniform poly-size branching programs, small-width branching programs, statement of Barrington's Theorem (NC¹ = nonuniform constant-width poly-size branching programs).
Tuesday, February 10, 2009
Office hours today
Just a reminder that while class is canceled today, I will still be holding my office hours between 12:30pm and 1:30pm today, in Wean 3709.
Friday, February 6, 2009
Average Case Hardness
In 1971, Stephen Cook showed that the boolean satisfiability problem is NP-complete under polynomial-time reductions. Soon afterward, Karp showed that many natural combinatorial problems (Vertex Cover, Clique, Hamiltonian Circuit, etc.) are also NP-complete. However, these results are often misinterpreted: because most computer scientists believe P ≠ NP, it is easy to assume that it is always hard to solve NP-complete problems. This is not true. In fact, most randomly generated SAT instances can be solved efficiently by algorithms like DPLL. Assuming that P ≠ NP, all we may conclude is that there is no efficient algorithm to solve these problems in the worst case. It is conceivable that NP-complete problems could still admit polynomial-time algorithms for average instances.
Notice that problems which are not hard on average are poor candidates for cryptography and one-way functions. Protocols such as Diffie-Hellman would absolutely fail if it were easy to compute the discrete logarithm (given a generator g of the multiplicative group of a finite field mod p and an integer x, compute y such that g^y = x mod p) for average instances. Russell Impagliazzo (A Personal View of Average-Case Complexity) describes several possible worlds and their implications for cryptography.
World 1: Algorithmica. P = NP, or at least NP ⊆ BPP. We all know that such a result would revolutionize computer science and machine learning. However, cryptography would be impossible in this world.
World 2: Heuristica. P ≠ NP, but, over any efficiently computable distribution, NP problems are tractable on average. Intuitively, there are hard instances of NP problems, but the problem of finding hard instances is itself intractable. Once again, cryptography would most likely not be feasible in Heuristica, because eavesdroppers could solve problems in time comparable to the time it took to generate the problem.
World 3: Pessiland. P ≠ NP AND there are NP problems which are hard in the average case for some efficiently computable distribution. However, there are no one-way functions. This means that given f and f(x), it is usually possible to find some x' such that f(x') = f(x) in time comparable to the amount of time needed to compute f. In Pessiland, it would be easy to generate hard instances of NP problems, but there would be no way of generating hard presolved instances (instances together with their solutions). Public-key cryptography would still not be possible. Thus, average-case hardness is a necessary (but not sufficient) condition for cryptography.
Worlds 4 and 5: Minicrypt and Cryptomania. In Cryptomania, almost any cryptographic task is possible. This is the world we typically assume we live in, because we assume protocols like RSA and Diffie-Hellman are secure. However, the only evidence that the Discrete Logarithm problem is hard is the fact that we don't know of an efficient algorithm to compute it. In reality, there is no theoretical reason why the problem should be hard. In fact, Peter Shor showed that Discrete Logarithms can be solved efficiently on a quantum computer.
Fortunately for cryptographers, the Discrete Logarithm problem is a beautiful example of self reducibility.
Lemma: Suppose that we had an algorithm A which computes the discrete logarithm efficiently (in time T = poly(log p)) for at least a 1/poly(log p) fraction of the values x in Z_p^*. Then there is an efficient ZPP algorithm to compute the discrete logarithm on every instance.
Proof (Sketch): By assumption, we can pick a time bound T such that A solves at least an ε ≥ 1/poly(log p) fraction of instances within T steps; let N = O((1/ε) · log p) be the number of trials we make.
Input: Generator g, Integer x, Prime p
Output: y such that g^y = x mod p, or FAIL
For i = 1 to N
--- Pick r uniformly at random from {0, 1, ..., p-2}
--- x' = x * g^r mod p
---// Number Theory Fact: this is equivalent to picking x' uniformly at random from Z_p^*
---If A(g, x', p) returns y' within T steps then
------// Now we know the answer due to number theory:
------// g^{y'} = x' = x * g^r mod p
------// g^{y' - r} = x mod p
------// y = y' - r mod (p-1)
------return y' - r mod (p-1)
Output FAIL
It is not too hard to verify that the algorithm runs in polynomial time and, with high probability, returns the correct answer (and it never returns an incorrect one, as required for ZPP). QED
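Here is a small runnable Python version of the reduction, with toy parameters of my own choosing; the "average-case solver" is a brute-force stand-in that only answers on a fixed fraction of inputs, playing the role of the algorithm A in the lemma.

import random

def make_average_case_solver(p, success_set):
    """Stand-in for the algorithm A of the lemma: solves the discrete log by
    brute force, but only answers when x lies in a fixed 'success set'
    (modelling an algorithm that works on a constant fraction of inputs)."""
    def solver(g, x):
        if x not in success_set:
            return None
        return next(y for y in range(p - 1) if pow(g, y, p) == x)
    return solver

def worst_case_dlog(g, x, p, solver, trials=500):
    """Random self-reduction: rerandomize the instance, call the average-case
    solver, and shift the answer back."""
    for _ in range(trials):
        r = random.randrange(p - 1)
        x_rand = (x * pow(g, r, p)) % p       # uniform over the group generated by g
        y_rand = solver(g, x_rand)
        if y_rand is not None:                # g^y_rand = x * g^r  (mod p)
            return (y_rand - r) % (p - 1)     # hence g^(y_rand - r) = x  (mod p)
    return None                               # FAIL (exponentially unlikely)

# Toy parameters: p = 23 is prime and g = 5 generates Z_23^*.
p, g = 23, 5
solver = make_average_case_solver(p, set(random.sample(range(1, p), 5)))  # A "works" on ~25%
x = pow(g, 17, p)
y = worst_case_dlog(g, x, p, solver)
print("recovered exponent:", y, "valid:", y is not None and pow(g, y, p) == x)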
I am unaware of any NP-complete problems which are self-reducible in a sense similar to the Discrete Logarithm problem. If such a self-reduction were found, I suspect that this would be a very interesting result in complexity theory.
A more common approach to the problem of defining average case hardness is that of Leonid Levin. Levin (Average Case Complete Problems) defines the notion of a random NP problem and a complete random NP problem. His definitions are tedious and hard to follow at times, but they are worth the time and energy to understand.
Definition: A random problem is a pair (σ, μ), where σ is an instance-witness predicate (defining the NP language) and μ is a probability distribution function.
A problem instance is an integer x (we think of instances as encoded by integers). An instance x is in the language L if and only if there is a witness w such that σ(x, w) holds. The probability distribution function μ(x) gives the total probability of all problem instances which do not exceed x.
Notation: μ'(x) = μ(x) − μ(x − 1) is the probability density function.
Also, by convention, μ'(0) = μ(0).
Definition: A random problem (σ, μ) is in NP if both σ and μ are computable in polynomial time. We call such a problem a random NP problem.
In other words, given a tuple (x, w) (a problem instance and a candidate witness), we can decide whether or not σ(x, w) holds in time polynomial in |x|. We can also compute the distribution function μ in polynomial time.
Definition: A random problem (σ, μ) is polynomial on average if there is a Turing Machine M, with running time t_M(x), such that:
- M decides the problem (M accepts x if and only if x ∈ L);
- Σ_x μ'(x) · t_M(x)^ε / |x| converges, for some fixed ε > 0.
The first condition guarantees that M actually decides the problem. The second condition guarantees that the Turing Machine runs quickly on average over instances sampled from our distribution μ.
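As a quick sanity check on the definition (a worked example of my own, not from Levin's paper): if M runs in worst-case time t_M(x) ≤ |x|^k for every x, then taking ε = 1/k gives t_M(x)^ε / |x| ≤ 1, so Σ_x μ'(x) · t_M(x)^ε / |x| ≤ Σ_x μ'(x) = 1, which converges for every distribution μ. So every worst-case polynomial-time algorithm is polynomial on average; the interesting cases are machines whose worst-case time is super-polynomial but whose long running times occur only on instances of small probability.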
Definition: We say that the probability distribution function μ₁ dominates μ₂ if there is a polynomial q such that q(|x|) · μ₁'(x) ≥ μ₂'(x) for all x. In such a case we write μ₂ ⪯ μ₁.
Intuitively, this definition guarantees that all of the 'likely' instances under μ₂ are also 'likely' under μ₁.
We are finally ready to define the notion of reduction between random NP problems.
Definition: A polynomial-time computable function f reduces a random NP problem (σ₁, μ₁) to another random NP problem (σ₂, μ₂) if the following conditions hold:
- (correctness) x ∈ L₁ if and only if f(x) ∈ L₂;
- (domination) the distribution induced on instances of the second problem by applying f to μ₁-random instances is dominated by μ₂.
Levin shows that these reductions are closed under composition and that they preserve average-case tractability: if M is an algorithm that is fast on average for (σ₂, μ₂), then M ∘ f runs at most polynomially slower on average for (σ₁, μ₁).
Definition: A random NP problem is complete if every random NP problem is reducible to it.
Levin proved that Tiling is an NP-complete random problem. Other natural combinatorial problems such as Graph Coloring and Matrix Decomposition have also been shown to be NP-complete random problems.
Thursday, February 5, 2009
Lecture 8
Today we proved the following five theorems:
1. NSPACE(s(n)) ⊆ DTIME(2^{O(s(n))}).
2. ST-Connectivity is NL-complete.
3. Savitch's Theorem: NSPACE(s(n)) ⊆ DSPACE(s(n)²).
4. TQBF is PSPACE-complete.
5. Immerman-Szelepcsényi Theorem: NSPACE(s(n)) = coNSPACE(s(n)).
Here s(n) ≥ log n is a space-constructible bound. (I believe that for most of these, if you want to get really, really technical, you can even drop space-constructibility :)
Tuesday, February 3, 2009
Lecture 7
Today we covered: the Permanent is #P-complete (statement); Toda's Theorem (PH ⊆ P^{#P}); statements of the basic space theorems (nondeterministic space in deterministic time, Savitch's Theorem, Immerman-Szelepcsényi Theorem).
Monday, February 2, 2009
Homework 2
Homework 2 has been posted slightly early for your enjoyment! It's linked to on the course home page. Due Feb. 17.
I assume you all saw the hints for Homework 1 in the comments to the previous post.