## Saturday, February 21, 2009

### Derandomization of space classes

Understanding the exact computational power of randomness is an intriguing and very challenging problem in computer science. Although we are far from an answer, many nice results have emerged in the last two decades, particularly in the area of space-bounded computation. In this post, we will look at a couple of such results. We will restrict our attention to randomized log-space machines, i.e. the complexity class BPL, but the results we mention generalize to BPSPACE($s(n)$) for any space-constructible function $s(n) \ge \log n$. Recall that a BPL machine M for a language L is a randomized machine running in log-space and satisfying the following conditions:

$x \in$ L $\Rightarrow$ Pr[M accepts $x$] $\ge \frac{2}{3}$

$x \notin$ L $\Rightarrow$ Pr[M accepts $x$] $\le \frac{1}{3}$

We also know that the computation of such a machine on input $x$ can be seen as a random walk on a graph $G$, called the "configuration graph". Each node in this graph represents a configuration of the machine and each edge represents a valid transition. Let $d$ denote the total number of nodes in this graph, and let us assume that the graph has a unique start state $c_{start}$ and unique accept and reject states $c_{accept}$ and $c_{reject}$. Since the machine runs in log-space, we know that $d$ is polynomial in the length of the input. We can also associate with this graph a matrix Q called the "transition matrix". The entry Q[$i$,$j$] of this matrix is the probability that the machine M goes from configuration $i$ to configuration $j$ in one step. This way of looking at a BPL machine is very convenient, since simulating the machine is now equivalent to computing the $p$-step transition probability $Q^p$[$c_{start}$, $c_{accept}$] for any $p \ge d$. If this probability is at least $\frac{2}{3}$, we accept. If it is at most $\frac{1}{3}$, we reject. Savitch's idea of repeated matrix squaring computes this probability exactly, giving us the derandomization result BPL $\subseteq L^2$. We are also interested in the running time of these simulations, so we rewrite the above result as BPL $\subseteq$ DTISP($n^{O(\log n)}$,$(\log n)^2$).
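To make the transition-matrix view concrete, here is a minimal sketch of repeated squaring on a toy configuration graph. The four-configuration machine and the helper names `mat_mul` and `repeated_squaring` are made up for illustration and do not come from any particular BPL machine. Exact rational arithmetic stands in for the exact computation in Savitch-style squaring.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    d = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(d)) for j in range(d)]
            for i in range(d)]

def repeated_squaring(Q, r):
    """Return Q^(2^r), computed with r successive squarings."""
    for _ in range(r):
        Q = mat_mul(Q, Q)
    return Q

half = Fraction(1, 2)

# Hypothetical machine with d = 4 configurations:
# 0 = c_start, 1 = an intermediate configuration, 2 = c_accept, 3 = c_reject.
# The accept and reject configurations are absorbing.
Q = [
    [0, half, half, 0],   # start: coin flip between config 1 and accept
    [0, 0, half, half],   # config 1: coin flip between accept and reject
    [0, 0, 1, 0],         # accept stays in accept
    [0, 0, 0, 1],         # reject stays in reject
]

# r = 2 gives 2^r = 4 >= d steps, enough for every walk to be absorbed.
p_accept = repeated_squaring(Q, 2)[0][2]
print(p_accept)  # 3/4
```

Since the acceptance probability here is $\frac{3}{4} \ge \frac{2}{3}$, a simulation of this toy machine would accept.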

Further improvements to this result are all based on a very nice "duality" between hardness and randomness. This relationship was made concrete in a paper by Noam Nisan and Avi Wigderson. The high-level idea is as follows. Suppose we have a pseudo-random generator which takes as input a short truly random seed and produces a very long sequence of bits which look random to any probabilistic machine. We can then use these pseudo-random bits instead of truly random bits to simulate the machine, thereby reducing the amount of randomness used. This in turn has the potential to lead to better deterministic simulations. Unfortunately, pseudo-random generators are known to exist only under the assumption that certain cryptographic objects called one-way functions exist. However, it is possible to construct specialized unconditional pseudo-random generators which can fool a particular class of machines C. Here is one way to do it: choose a family of functions $f$, one for each input length, which is hard for the class C. In other words, for every $n \ge 1$ and every machine M $\in$ C, we must have $\left|\Pr_{x \sim U_n}[M(x) = f(x)] - \frac{1}{2}\right| \le \epsilon$, where $U_n$ is the uniform distribution over $\{0,1\}^n$ and $\epsilon$ is very small (let's say $\frac{1}{n^2}$). This means that no machine in C can do much better than random guessing when asked to predict the values of the hard functions. If we have such a family of functions, then here is a very simple pseudo-random generator which stretches its seed by 1 bit: $G_f : s \mapsto (s,f(s))$. Notice that although we are only appending $f(s)$ at the end, no machine in C can distinguish it from a truly random bit with significant probability. This was roughly the idea behind the pseudo-random generator proposed by Nisan and Wigderson.
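The one-bit-stretch generator $G_f$ can be tried out on a toy example. In the sketch below, parity plays the role of the hard function $f$, and the "class C" is the set of tests that read at most $n$ of the $n+1$ output bits. Both choices are assumptions made purely for illustration and are not the setting of the Nisan-Wigderson paper. For such tests the output is exactly uniform, while a test that reads all $n+1$ bits can tell the difference.

```python
from collections import Counter
from itertools import combinations, product

def parity(bits):
    """Stand-in hard function f: XOR of all seed bits."""
    return sum(bits) % 2

def g_f(seed, f=parity):
    """One-bit-stretch generator G_f: s -> (s, f(s))."""
    return tuple(seed) + (f(seed),)

def looks_uniform_on(positions, n):
    """Exhaustively check whether the output bits at `positions`
    are uniformly distributed when the seed is uniform over {0,1}^n."""
    counts = Counter(tuple(g_f(s)[i] for i in positions)
                     for s in product((0, 1), repeat=n))
    every_pattern_seen = len(counts) == 2 ** len(positions)
    all_equally_likely = len(set(counts.values())) == 1
    return every_pattern_seen and all_equally_likely

n = 4
# Tests reading at most n of the n+1 output bits see a uniform distribution.
fooled = all(looks_uniform_on(pos, n)
             for k in range(1, n + 1)
             for pos in combinations(range(n + 1), k))
# A test reading all n+1 bits is not fooled: only 2^n outputs ever occur.
fooled_by_full_view = looks_uniform_on(tuple(range(n + 1)), n)
```

Here `fooled` comes out true and `fooled_by_full_view` comes out false, which matches the intuition that the generator only fools tests for which $f$ is hard.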

Later, in a breakthrough result, Noam Nisan gave a pseudo-random generator which fools all probabilistic log-space machines and uses only $O((\log n)^2)$ bits of randomness. Notice that this is an exponential improvement, since a BPL machine can potentially use poly($n$) random bits, where $n$ is the length of the input. Virtually every result in derandomizing space complexity classes has used some ideas from Nisan's paper. Before stating Nisan's result, we need some definitions:

Definition 1:
For a vector $x \in {\Re}^d$, we define the $L_1$ norm of $x$ to be ||$x$|| = $\sum_i |x_i|$. For a $d \times d$ matrix M over $\Re$, we define the norm of M, ||M||, to be the maximum over all the rows M[$i$,.] of ||M[$i$,.]||.

Definition 2:
If M and N are square matrices of the same dimension and $a$ is a positive real number, we say that M approximates N with accuracy $a$ if ||M $-$ N|| $\le 2^{-a}$.
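The two definitions translate directly into code. The sketch below uses absolute values of the entries, as is standard for the $L_1$ norm. The function names and the example matrices are hypothetical and chosen only to illustrate the definitions.

```python
from fractions import Fraction

def vector_norm(x):
    """L1 norm of a vector: sum of the absolute values of its entries."""
    return sum(abs(v) for v in x)

def matrix_norm(M):
    """Matrix norm from Definition 1: the largest L1 norm of any row."""
    return max(vector_norm(row) for row in M)

def approximates(M, N, a):
    """Definition 2: M approximates N with accuracy a if ||M - N|| <= 2^(-a)."""
    diff = [[m - n for m, n in zip(row_m, row_n)]
            for row_m, row_n in zip(M, N)]
    return matrix_norm(diff) <= Fraction(1, 2 ** a)

# Example: N differs from M by 1/64 in two entries of the second row,
# so ||M - N|| = 1/64 + 1/64 = 2^(-5).
e = Fraction(1, 64)
M = [[Fraction(1, 2), Fraction(1, 2)],
     [Fraction(1, 4), Fraction(3, 4)]]
N = [[Fraction(1, 2), Fraction(1, 2)],
     [Fraction(1, 4) + e, Fraction(3, 4) - e]]

close_at_5 = approximates(M, N, 5)   # accuracy 5 holds
close_at_6 = approximates(M, N, 6)   # accuracy 6 does not
```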

Recall that simulating a BPL machine is equivalent to calculating the matrix $Q^p$, for some $p \ge d$. Hence, we now state and view Nisan's result as a randomized algorithm for approximating this matrix. We will call Nisan's algorithm PRS, which stands for pseudorandom repeated squaring.

Theorem:
Let $a$ be an integer. Given as input a $d \times d$ transition matrix Q and an integer $r$, the PRS algorithm takes a random string $h$ of size $O(r(\log n))$ as input, runs in space $O(r + \log n)$ and computes another matrix $Q^\prime$, such that $Q^\prime$ approximates $Q^{2^{r}}$ with accuracy $a$ and error probability $2^{-2a}$.

Notice that it is enough to approximate the entries of the matrix $Q^{2^r}$, since we only need to distinguish an acceptance probability of at least $\frac{2}{3}$ (when $x \in$ L) from one of at most $\frac{1}{3}$ (when $x \notin$ L). For simulating BPL, we need to choose $r$ such that $2^r \ge d$. Choosing $r = O(\log n)$ gives us the desired log-space simulation, which uses $O((\log n)^2)$ bits of randomness. A straightforward derandomization of this algorithm, i.e. enumerating all the possible choices of the random seed, will not help us improve on the bound given by Savitch's theorem (why?). But in a later paper, Nisan observed that most of the bits in the random seed can actually be calculated deterministically, reducing the randomness used by the algorithm to $O(\log n)$. Derandomizing this new algorithm gives a deterministic simulation of BPL which uses $O((\log n)^2)$ space and $n^{O(1)}$ time! Hence, Nisan was able to show that BPL $\subseteq$ DTISP($n^{O(1)}$,$(\log n)^2$). In other words, Nisan showed that BPL $\subseteq SC$ ("Steve's class"). Later, in 1992, Nisan, Endre Szemerédi, and Wigderson showed that undirected s-t connectivity (USTCONN) is in $L^{\frac{3}{2}}$. Of course, from Omer Reingold's result in 2004, we now know that USTCONN $\in$ L. But at that point, USTCONN $\in$ $L^{\frac{3}{2}}$ was a major indication that a deterministic simulation of BPL might be possible in $L^{\frac{3}{2}}$. This was achieved by Michael Saks and Shiyu Zhou and will be the last result covered in this post.

The main idea behind Saks and Zhou's result is to somehow balance the space usage and the randomness usage of the PRS algorithm. One way to do this is to try to combine Savitch's theorem and Nisan's PRS algorithm, i.e. to do repeated matrix squaring but, at each level of recursion, approximate the matrix using the PRS algorithm instead of calculating it exactly. Let's see how it works in detail.
Consider the PRS algorithm again. Let's write the input $r$ as a product of two parts $r_1$ and $r_2$, so that $r = r_1r_2$. We can calculate $Q^{2^r}$ via a sequence of matrices $Q_0$,$Q_1$,$\ldots$,$Q_{r_2}$, such that $Q_0$ = $Q$, $Q_{i+1}$ = ${Q_i}^{2^{r_1}}$ and hence $Q_{r_2}$ = $Q^{2^{r_1 r_2}}$ = $Q^{2^r}$. At each of the $r_2$ levels of the recursion we can use the PRS algorithm to approximately calculate the matrices. We are going to use $O(\log n)$ space at each level for a total of $O(r_2 \log n)$ space. Also, we are going to use $O(r_1 \log n)$ bits of randomness at each level for a total of $O(r \log n)$ bits of randomness. At present this does not seem very impressive, but what Saks and Zhou show is that we can actually use the same seed for the PRS algorithm at each level of recursion. This reduces the randomness complexity to $O(r_1 \log n)$. The reason we can do this is the following: the analysis of the PRS algorithm shows that most seeds are good for a given fixed matrix. Hence, by a union bound, we should be able to show that most seeds are also good for a fixed set of matrices. If we can do that, then we can set $r = O(\log n)$ and $r_1 = r_2 = O(\sqrt{\log n})$ to get an algorithm with space complexity $O((\log n)^{\frac{3}{2}})$ and randomness complexity $O((\log n)^{\frac{3}{2}})$. Then, a straightforward derandomization will give us the desired result that BPL $\subseteq L^{\frac{3}{2}}$. But there is a catch.
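The choice of parameters can be checked with a short calculation. The sketch below assumes, as in the paragraph above, that the same seed is reused across levels, and that the straightforward derandomization needs space for both the seed and the working memory of the algorithm.

```latex
\begin{align*}
r_1 r_2 &= r = \Theta(\log n), \\
\text{working space} &= O(r_2 \log n), \qquad
\text{seed length} = O(r_1 \log n), \\
\text{deterministic space} &= O\big((r_1 + r_2)\log n\big), \\
r_1 + r_2 &\ge 2\sqrt{r_1 r_2} = \Theta\big(\sqrt{\log n}\big)
  \quad \text{with equality when } r_1 = r_2, \\
r_1 = r_2 = \Theta\big(\sqrt{\log n}\big)
  &\implies \text{deterministic space} = O\big((\log n)^{3/2}\big).
\end{align*}
```

For comparison, taking $r_1 = 1$ and $r_2 = r$ recovers the $O((\log n)^2)$ space of plain repeated squaring, and taking $r_1 = r$ and $r_2 = 1$ recovers the $O((\log n)^2)$ seed length of the PRS algorithm.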

Notice that we cannot really apply the union bound argument to the matrices produced by the PRS algorithm, since each of them depends on the seed used in the previous level of recursion. Saks and Zhou get around this problem by introducing the idea of random perturbations. The main observation is that we can apply the union bound argument to the sequence of matrices if they are calculated exactly, as in Savitch's theorem, rather than approximately. Now, the PRS algorithm guarantees that its matrices will be very close to the exact ones. Hence, if we just truncate the values in the matrices, ignoring the lower-order bits, we should be done, except that we might get bad roundings at boundary cases. So Saks and Zhou reduce the value of each entry in the matrix at each level by some small random quantity and then truncate the values. In this way, the new matrices are, with high probability, independent of the seed used, and we can then say that most seeds will be good for all these matrices. The random perturbations increase the space usage at each level by $O(\log n)$, but the final space usage is still $O((\log n)^{\frac{3}{2}})$. That's it: we can now state Saks and Zhou's result that
$BPL \subseteq DTISP(n^{O({(\log n)}^{.5})},{(\log n)}^{1.5})$
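The perturb-then-truncate step described above can be illustrated on a single matrix entry. In the sketch below, the parameters `t` (digits kept), `K` (accuracy of the approximation) and the grid of perturbations `q / 2**K` are assumptions chosen for illustration, not the exact parameters of the Saks-Zhou paper. The example entry sits exactly on a truncation boundary, so plain truncation rounds the exact value and its approximation differently, while almost every perturbation makes them agree.

```python
import math
from fractions import Fraction

def truncate(z, t):
    """Keep only the first t binary digits of z after the point."""
    return Fraction(math.floor(z * 2 ** t), 2 ** t)

def perturb_and_truncate(z, delta, t):
    """Subtract delta (clamping at 0), then truncate to t binary digits."""
    return truncate(max(z - delta, Fraction(0)), t)

t, K = 4, 12          # keep 4 digits; the approximation is accurate to 2^-12
D = K - t             # perturbations range over [0, 2^-t) in steps of 2^-K

z_exact = Fraction(5, 16)                   # exact entry, on a boundary
z_approx = z_exact - Fraction(1, 2 ** K)    # approximation, just below it

# Without a perturbation the two values truncate to different results.
plain_mismatch = truncate(z_exact, t) != truncate(z_approx, t)

# With a perturbation q / 2^K, collect the choices of q that still disagree.
bad = [q for q in range(2 ** D)
       if perturb_and_truncate(z_exact, Fraction(q, 2 ** K), t)
       != perturb_and_truncate(z_approx, Fraction(q, 2 ** K), t)]
```

In this example only the perturbation `q = 0` out of the $2^8 = 256$ choices leaves a mismatch, so a random perturbation makes the truncated value depend only on the exact entry with probability $1 - \frac{1}{256}$. In the algorithm the same operation is applied to every entry of the matrix at every level.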

Some follow-up work has been done after Saks and Zhou's result. For example, Jin-Yi Cai, Venkatesan Chakaravarthy and Dieter van Melkebeek give a time-space tradeoff version of Saks and Zhou which says that, for any rational number $0 \le \alpha \le .5$, BPL $\subseteq$ DTISP($n^{O({(\log n)}^{.5-\alpha})}$,${(\log n)}^{1.5 + \alpha}$). Since then, Reingold's log-space algorithm for USTCONN has been a major result.

There is a general feeling in the community that we might be close to proving RL$=$L. In fact, Omer Reingold, Luca Trevisan and Salil Vadhan, in a paper in 2005, gave some strong evidence in favor of this.

Aside:
If you have read this far, then a natural question to ask is: what about derandomization of time complexity classes? Can similar ideas be applied? Conceptually, yes. Nisan's idea is pretty neat and generic. I have not surveyed the literature much, but some initial reading seems to suggest that it is difficult to give unconditional pseudorandom generators which can fool time-bounded complexity classes. Maybe the instructors can shed more light on this issue.

1. We will most likely cover Nisan's PRG for small space machines and the Nisan-Wigderson generator for BPP machines in lectures. The post describes the high level connection between hardness and randomness used in the NW generator. To turn this idea into the construction of a useful generator, one needs to output more bits. This is done in the Nisan-Wigderson generator by applying the hard function to several (carefully chosen) subsets of bits of the seed s (so now the seed has to be somewhat longer than the input length of f). The subsets are picked to have limited intersection (i.e., from a "combinatorial design") and this is crucial to the very elegant analysis.

Unconditional PRGs for BPP machines are still unknown, and constructing them is believed to be very hard. There is also formal evidence for this belief, as good PRGs for BPP would imply non-trivial circuit lower bounds. (We might cover this connection towards the end of the course.)

Cryptographic PRGs, where the output has to look random to machines that run in time much greater than the time complexity of the PRG itself, imply the existence of one-way functions (and hence P not equal to NP). In the context of derandomization, the PRG can run in time greater than the machine it is fooling, and so one might be able to get by with proving weaker circuit lower bounds. But even the weakest such lower bound which is necessary for derandomization seems very difficult to prove.

2. The connection between PRGs and circuit lower bounds mentioned above should point to the following:

Derandomizing Polynomial Identity Tests Means Proving Circuit Lower Bounds
