Thanks again to all the hardy souls who hung in there for the semester!
Please remember to fill in the course evaluation at my.cmu.edu, under Academics.
Final grades are more or less tabulated; please email us if you want to learn them early.
best,
Ryan
Sunday, May 3, 2009
Friday, May 1, 2009
Parameterized Complexity and ETH
In this post we look at Parameterized Complexity theory (introduced by Downey and Fellows in a long series of papers) and its connections to classical complexity theory, specifically via the Exponential Time Hypothesis (Impagliazzo and Paturi, 2001). Parameterized complexity provides a framework for a more refined analysis of hard problems. Heuristics, parallel algorithms, approximation schemes and randomized algorithms are some of the standard ways of coping with such problems, but each has its drawbacks, ranging from no hard bounds on the quality of the solution (heuristics), to not being applicable to really large instances (parallel algorithms), to impractical running times (some PTASs).
Introduction:
Classical complexity classifies problems by the amount of some resource (time or space) needed to solve them. This leads to a good theory, but it ignores any structural information in the input, which can make problems appear harder than they really are. Indeed, there is a wide variation in the worst-case complexities of known exact algorithms for NP-complete problems. For example, there are pretty good 3SAT solvers available right now which scale to a large number of variables.
Parameterized complexity tries to explain these differences by assuming that some part of the problem (the `parameter') will be small, and by admitting algorithms whose non-polynomial behavior is confined to the parameter. The classic example is a database query: it has two parts, the database and the query, and the query is usually much smaller than the database. The query size $k$ is the natural parameter for a parameterized analysis: if $k$ is small and $n$ is large, $O(2^k \cdot n)$ is much better than $O(n^k)$.
The main contribution of the theory is establishing the intractability of certain problems by classifying them into complexity classes via reductions which are inherently 2-dimensional, depending on the problem size as well as the parameter. A problem can also have several different parameterizations, each leading to different results.
Complexity Classes - FPT and the W[t] hierarchy:
A parameterized language $L$ is a subset of $\Sigma^* \times \Sigma^*$ where $\Sigma$ is a finite alphabet. Let $(x, k) \in L$; then we call $x$ the main part and $k$ the parameter.
a) Tractability - Fixed-parameter tractable (FPT): $L \in FPT$ if it can be decided in time at most $f(k) \cdot |x|^{O(1)}$ for an arbitrary function $f$. For each fixed $k$ the problem is in P; moreover, for every $k$ it is in the same polynomial class via the same machine. As an example, consider the $p$-SAT problem: given a formula $\phi$ and the parameter $k = $ number of variables in $\phi$, decide whether $\phi$ is satisfiable. This is clearly in FPT, as the obvious brute-force approach for a formula of size $n$ with $k$ variables takes time $O(2^k \cdot n)$. There are many real-world problems in FPT, such as parameterized Vertex Cover, for which the kernelization technique gives an $O(k \cdot n + 1.286^k)$ algorithm (due to Chen, Kanj and Jia, 2001). Note that classical Vertex Cover is NP-complete, yet this result shows that it is not as `hard' as, say, Independent Set.
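To make the flavor of FPT algorithms concrete, here is a quick illustrative sketch (in Python) of the standard bounded search tree algorithm for deciding whether a graph has a vertex cover of size at most $k$. This is not the kernelization-based Chen-Kanj-Jia algorithm cited above, only the simple branching idea; the function name and edge-list representation are made up for illustration.

```python
def vertex_cover_at_most(edges, k):
    """Bounded search tree for k-Vertex Cover: some endpoint of any remaining
    edge must be in the cover, so branch on the two endpoints.  The search
    tree has at most 2^k leaves, giving O(2^k * (n + m)) time overall."""
    if not edges:
        return True          # nothing left to cover
    if k == 0:
        return False         # edges remain but the budget is exhausted
    u, v = edges[0]
    without_u = [(a, b) for (a, b) in edges if u not in (a, b)]
    without_v = [(a, b) for (a, b) in edges if v not in (a, b)]
    return (vertex_cover_at_most(without_u, k - 1)
            or vertex_cover_at_most(without_v, k - 1))

# A 4-cycle has a vertex cover of size 2 but not of size 1.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
assert vertex_cover_at_most(cycle, 2) and not vertex_cover_at_most(cycle, 1)
```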
b) Parametric Intractability: Analogously to classical complexity theory, Downey and Fellows developed a completeness program for intractable parameterized problems. In order to compare the hardness of parameterized problems, we need a 2-dimensional reduction (called a `parameterized reduction'). Roughly, a language $L$ is parametrically reducible to $L'$ if there is an FPT algorithm that transforms $(x, k)$ into $(x', k')$ such that $(x, k) \in L$ iff $(x', k') \in L'$ and $k' = g(k)$, where $g$ is an unrestricted function. Note that Karp reductions are rarely parameterized reductions (e.g. Clique to Independent Set is one of the exceptions).
1) W[1]: The lowest class of parameterized intractability can be defined as the set of languages that are reducible to the Short Turing Machine Acceptance problem (also called the $k$-step halting problem), where we want to determine, for an input consisting of a nondeterministic Turing machine $M$ and a string $x$, whether $M$ has a computation path accepting $x$ in at most $k$ steps. In some sense, this is the parameterized analogue of the Turing machine acceptance problem, the basic generic NP-complete problem in classical complexity theory. Canonical problems include Independent Set (does $G$ have an independent set of size $k$?), Weighted 3SAT (does $\phi$ have a satisfying assignment of weight $k$?), etc. We will also give an alternative definition of W[1] afterwards, which is the one actually used for the basic results.
2) The W[t] hierarchy: Interestingly, while general CNF satisfiability and 3SAT are equivalent with respect to NP-hardness in classical complexity, there is no known parameterized reduction from the weighted satisfiability of general CNF formulae to that of 3-CNF formulae. This leads to the realization that the logical depth of a formula affects its parameterized complexity; intuitively, what matters is the number of alternations between unbounded fan-in AND and OR gates. Hence, we can base a hierarchy on the complexity of the circuits required to check a solution.
The class W[t] is based on the Weighted Circuit SAT problem (parameterized by the weight of the input, as for Weighted 3SAT) for the family of circuits $F_{h, t}$ having:
1. Small gates: with constant fan-in
2. Large gates: with unbounded fan-in
and of depth $h$ (max. number of gates on a path from an input to the output) and weft (max. number of large gates on such a path) at most $t$. Clearly, we have the containments:
$FPT \subseteq W[1] \subseteq W[2] \ldots $
It is not known whether these containments are strict. Note that W[1] can now be defined as the class of problems that can be reduced to parameterized Circuit SAT on the family of constant-depth, weft-1 circuits. Downey and Fellows (1995) proved that parameterized Independent Set is W[1]-complete. The hardness proof is very intricate and we omit it here. Note that using this definition it is easy to see why Independent Set is in W[1] (sketch below, with a small code illustration after the list):
1. Each vertex in $G$ corresponds to one input gate in the circuit
2. For every edge $(u, v) \in G$ build a small OR gate: $(1-u) \vee (1-v)$
3. The outputs of the small gates are fed into a single large AND gate.
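Here is a small Python sketch of that weft-1 circuit, assuming the natural encoding of an assignment as a 0/1 vector over the vertices; the brute-force helper that searches for a weight-$k$ satisfying input is included only to make the weighted-satisfiability question concrete.

```python
from itertools import combinations

def independent_set_circuit(n_vertices, edges):
    """Weft-1 circuit from the sketch above: one input gate per vertex,
    a small (fan-in 2) OR gate (1-u) OR (1-v) per edge, and a single
    large AND gate over all the small gates."""
    def circuit(assignment):                 # assignment: 0/1 value per vertex
        small_gates = [(1 - assignment[u]) or (1 - assignment[v])
                       for (u, v) in edges]  # small OR gates, one per edge
        return all(small_gates)              # one large AND gate
    return circuit

def has_weight_k_satisfying_input(circuit, n, k):
    """Weighted circuit satisfiability by brute force (for illustration only:
    this is exactly the n^k behaviour that W[1]-hardness suggests is unavoidable)."""
    for chosen in combinations(range(n), k):
        x = [1 if i in chosen else 0 for i in range(n)]
        if circuit(x):
            return True
    return False

# A triangle has independent sets of size 1 but not of size 2.
tri = independent_set_circuit(3, [(0, 1), (1, 2), (0, 2)])
assert has_weight_k_satisfying_input(tri, 3, 1)
assert not has_weight_k_satisfying_input(tri, 3, 2)
```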
Interestingly, and unlike the situation with NP-hardness results, the W[1]-hardness proof for the $k$-step Halting Problem uses the completeness of Independent Set as an intermediate result. There have been many papers demonstrating naturally occurring W[t]-complete problems (like Dominating Set for W[2]); for an extensive list of already classified problems see the Downey and Fellows monograph.
Exponential Time Hypothesis (ETH)
ETH was first studied by Impagliazzo, Paturi and Zane (2001); it states that 3SAT $\notin$ DTIME$(2^{o(n)})$, where $n$ is the number of variables. They also proved that the hypothesis is equivalent to analogous hypotheses for other NP-complete problems like Independent Set. In fact, they showed that ETH is independent of the size measure: 3SAT is solvable in time $2^{o(n)}$ iff it is solvable in time $2^{o(m)}$, where $m$ is the size of the input. Note that this suggests that weighted 3SAT should also be intractable for `any parameter'.
Connections with Classical complexity:
The above paragraph suggests that ETH and parameterized complexity might be related. In fact they are: Chen and Grohe recently proved that ETH and parameterized complexity theory are isomorphic via an explicit reduction-preserving isomorphism. We do not show that here; instead we prove a simpler result that still provides a strong link to ETH (the proof is a version from Downey, Castro et al. 2003, uses parameterized miniaturization, and is different from the original proof by Abrahamson, Downey and Fellows):
Theorem: If FPT = W[1], then ETH fails i.e. 3SAT $\in$ DTIME$(2^{o(n)})$.
Proof: We use the equivalent definition of ETH based on simple Circuit SAT. The idea is to capture the ETH perfectly in a parameterized complexity class. The starting point is the Mini-Circuit SAT problem which is a parameterized miniaturization of simple Circuit SAT:
Input: Positive integers $k$ and $n$, and a Boolean circuit $C$ of total size at most $k\log n$.
Decision: Does there exist an $x$ for which $C(x) = 1$?
Note that the parameter here is $k$. Also, trying all possible inputs gives a brute-force $O(n^k)$ algorithm (a small sketch follows). After that we give a crucial lemma, due to Cai and Juedes 2001, which essentially characterizes ETH by a complexity class in the parameterized theory.
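Here is that brute-force algorithm as an illustrative Python sketch, assuming the circuit is given as a callable on its input bits (a representation chosen only for this example): since the circuit has total size at most $k\log n$, it has at most $k\log n$ inputs, so the loop runs for at most $2^{k\log n} = n^k$ iterations.

```python
from itertools import product

def mini_circuit_sat(circuit, n_inputs):
    """Brute-force Mini-Circuit SAT: try every assignment to the (at most
    k*log n) input bits and accept if some assignment satisfies the circuit."""
    return any(circuit(x) for x in product((0, 1), repeat=n_inputs))

# Tiny example: a 3-input "circuit" that accepts inputs with exactly one 1.
assert mini_circuit_sat(lambda x: sum(x) == 1, 3)
assert not mini_circuit_sat(lambda x: sum(x) > 3, 3)
```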
Lemma 1: Mini-Circuit SAT is in FPT iff ETH fails.
Proof: One direction follows from the brute-force algorithm, noting that $2^{o(k \log n)}$ is an FPT function.
Now suppose we are given a Boolean circuit $C$ of size $N$ and that Mini-Circuit SAT is solvable in FPT time $f(k)n^c$. Set $k = f^{-1}(N)$ and $n = 2^{N/k}$. In general $k = f^{-1}(N)$ will be some slowly growing function of $N$, so $N/k = o(N)$ and also $cN/k = o(N)$. Hence, using the FPT algorithm for Mini-Circuit SAT, we get a running time for Circuit SAT of $f(f^{-1}(N)) (2^{N/k})^c = N 2^{cN/k} = 2^{cN/k + \log N} = 2^{o(N)}$.
Thus ETH fails. Proved.
Now let's define the complexity class MINI[1] to be the set of languages that are FPT-reducible to Mini-Circuit SAT. It turns out that many $k\log n$ miniatures of familiar NP-complete problems are MINI[1]-complete (Downey, Fellows et al. 2002). This is easy to see: essentially all the usual NP-completeness reductions from Circuit SAT to these problems also work as FPT reductions, because they are linear-size reductions.
We concentrate on the Mini-Independent Set problem. Surprisingly, it can be reduced to the usual parameterized Independent Set problem.
Lemma 2: Independent Set parameterized by the size of independent set is MINI[1]-hard.
Proof: We give a Turing reduction. Let the graph $G = (V, E)$ be the miniature, for which we wish to determine whether $G$ has an independent set of size $r$, with $|V| \leq k\log n$. We can think of the vertices of $G$ as organized in $k$ blocks $V_1, V_2, \ldots, V_k$, each of size $\log n$. So for each possible way of writing $r$ as a sum of $k$ terms $r = r_1 + r_2 + \ldots + r_k$ with each $r_i \leq \log n$, we have a Turing reduction branch which represents a commitment to choose $r_i$ vertices from the corresponding block $V_i$ to be in the independent set. The total number of branches is at most $(\log n + 1)^k$, which is again an FPT function.
For each branch, we produce a graph $G'$ that has an independent set of size $k$ iff the miniature $G$ has an independent set of size $r$ distributed as indicated by the commitment made on that branch. The graph $G'$ consists of $k$ cliques with some cross edges. The $i$th clique consists of vertices in correspondence with the size-$r_i$ subsets of $V_i$ that are independent in $G$. An edge connects a vertex $x$ in the $i$th clique and a vertex $y$ in the $j$th clique iff there is a vertex $u$ in the subset $S_x \subseteq V_i$ represented by $x$ and a vertex $v$ in the subset $S_y \subseteq V_j$ represented by $y$ such that $(u, v) \in E$. A sketch of this construction for one branch is given below.
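Here is a quick illustrative Python sketch of the graph $G'$ built on a single branch, under the assumptions above (blocks $V_1,\ldots,V_k$ and a commitment $(r_1,\ldots,r_k)$); names and the set-based representation are made up for illustration.

```python
from itertools import combinations, product

def branch_graph(edges, blocks, commitment):
    """One branch of the Turing reduction: G is given by `edges` with its
    vertices split into `blocks` V_1..V_k; `commitment` is (r_1,...,r_k).
    Returns G', whose independent sets of size k correspond to independent
    sets of G of size r = sum(commitment) respecting the commitment."""
    E = {frozenset(e) for e in edges}
    def independent(S):
        return all(frozenset((u, v)) not in E for u, v in combinations(S, 2))
    # Vertices of the i-th clique: independent r_i-subsets of block V_i.
    groups = [[(i, frozenset(S))
               for S in combinations(block, r_i) if independent(S)]
              for i, (block, r_i) in enumerate(zip(blocks, commitment))]
    vertices = [v for grp in groups for v in grp]
    new_edges = set()
    for grp in groups:                       # each group forms a clique
        new_edges.update(frozenset({x, y}) for x, y in combinations(grp, 2))
    for grp_i, grp_j in combinations(groups, 2):
        for x, y in product(grp_i, grp_j):   # cross edge iff some G-edge crosses
            if any(frozenset((u, v)) in E for u in x[1] for v in y[1]):
                new_edges.add(frozenset({x, y}))
    return vertices, new_edges
```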
From Lemma 2 we can conclude that $FPT \subseteq MINI[1] \subseteq W[1]$. We just need to observe now that FPT = W[1] implies $MINI[1] \subseteq FPT$, and so, by Lemma 1, ETH fails.
Proved.
Discussion:
The above result hints that the FPT vs. W[1] question is analogous to the P vs. NP question of classical complexity theory. In fact it was also proved by Downey and Fellows that FPT $\neq$ W[1] implies P $\neq$ NP. But it is not known whether FPT $\neq$ W[1] implies anything about the rest of the W[t] hierarchy. Importantly, W[t]-hardness results can be used to show the practical intractability of problems in NP which are unlikely to be NP-complete. Parameterized complexity has deep connections to other complexity results as well: just recently (a week back) Galesi and Lauria showed a connection with proof complexity based on randomized algorithms for W[t]-hard problems. The field is very active and many papers and surveys are being published in it.
Lower bounds for locally decodable codes
In this post, we will look at lower bounds on the length (equivalently, upper bounds on the rate) of locally decodable codes and their relation to a problem called Private Information Retrieval. Some of the proofs presented here are unusual in that they relate the complexity of classical algorithms to that of quantum algorithms and prove lower bounds via quantum arguments!
Locally decodable codes are codes from which one can probabilistically recover individual bits of the message from a corrupted codeword by querying only a small number of positions, the Hadamard code being a straightforward example. While an ideal code would have high rate, be resilient to a large number of errors and be locally decodable, it has been shown that it is not possible to do extremely well on all these criteria simultaneously. Specifically, local decodability with a constant number of queries forces the rate to be $o(1)$.
The first paper in this direction ([Katz, Trevisan STOC'00]) showed that any locally decodable code with a constant number of queries has codewords of superlinear length. The main ideas behind these bounds are that every locally decodable code yields a smooth code (one whose decoder queries the codeword roughly uniformly) with comparable parameters, and that smooth codes need super-linear encoding length.
Def 1: Suppose that a code maps $C:\{0,1\}^n\rightarrow \Sigma^m$. For fixed $\delta, \epsilon$ and integer $q$, $C$ is a $(q,\delta,\epsilon)$-locally decodable code if there exists a probabilistic algorithm that recovers an arbitrary bit $x_i$ of the input $x$ with probability at least $1/2+\epsilon$ from a corrupted codeword $y$ within distance $d(y,C(x))<\delta m$ of $C(x)$, after querying no more than $q$ indices of $y$.
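As a concrete instance of Def 1, here is an illustrative Python sketch of the standard 2-query local decoder for the Hadamard code mentioned above (codeword length $m = 2^n$, so the rate is indeed terrible); the function names are made up for this example.

```python
import random

def bits(a, n):
    """The n bits of the integer a, least significant first."""
    return [(a >> t) & 1 for t in range(n)]

def hadamard_encode(x):
    """Hadamard code: one codeword position <a, x> mod 2 for every a in {0,1}^n."""
    n = len(x)
    return [sum(ai * xi for ai, xi in zip(bits(a, n), x)) % 2 for a in range(2 ** n)]

def decode_bit(y, n, i, rng=random):
    """2-query local decoder for x_i: pick a uniformly random a and output
    y[a] XOR y[a ^ e_i].  On an uncorrupted codeword this equals
    <a,x> + <a+e_i,x> = x_i (mod 2); each query alone is uniform over [m],
    so with at most a delta fraction of corruptions both queries hit clean
    positions with probability at least 1 - 2*delta."""
    a = rng.randrange(2 ** n)
    return (y[a] + y[a ^ (1 << i)]) % 2

x = [1, 0, 1]
y = hadamard_encode(x)
assert all(decode_bit(y, 3, i) == x[i] for i in range(3))   # always correct on a clean codeword
y[5] ^= 1                                                   # corrupt one position
votes = [decode_bit(y, 3, 0) for _ in range(99)]
print(sum(votes) / len(votes))   # close to 3/4: a trial errs only if it touches index 5
```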
Intuitively, smooth codes are those for which there exist probabilistic decoding algorithms that query the codeword (roughly) uniformly, i.e., are not heavily biased towards querying a few indices of the codeword. More formally:
Def 2: A code $C:\{0,1\}^n\rightarrow \Sigma^m$ is $(q,c,\epsilon)$-smooth for fixed $c,\epsilon$ and integer $q$ if there exists a local decoding algorithm $A$ that can recover an arbitrary bit $x_i$ of the input $x$ from the codeword $C(x)$ with probability at least $1/2 + \epsilon$, querying no more than $q$ indices of $C(x)$, with the added condition that $\Pr[A(C(x),i) ~\textrm{reads index}~ j] \leq c/m$ for every $j$. This definition does not require $A$ to recover bits from corrupted codewords.
Lemma 1: If $C$ is a $(q,\delta,\epsilon)$-locally decodable code, then it is also a $(q,q/\delta,\epsilon)$-smooth code.
Proof sketch: Take the decoder for $C$ and identify all the locations that are queried with probability greater than $q/(\delta m)$ while trying to decode $x_i$. Construct a new decoder that simply assumes the value $0$ for queries to all such locations. Since there are no more than $\delta m$ of these, the new decoder can still recover from errors. This new decoder has the smoothness property we want.
Let's start with the case $q=1$ and see why it is not possible to construct 1-query locally decodable codes that encode inputs of more than constant length. Suppose that the algorithm is trying to decode the $i$-th bit $x_i$ by reading just a single index of the (corrupted) codeword $y$. Then $\sum_{j\in[m]} Pr_x[A(C(x),i)=x_i | A(\cdot,i) ~\textrm{reads}~ j]\, Pr[A(\cdot,i) ~\textrm{reads}~ j]$
$ = Pr_x[A(C(x),i) = x_i] \geq 1/2 + \epsilon$,
because of which there must exist a good index $j_1$ such that $Pr_x[A(C(x),i) = x_i | A(\cdot,i) ~\textrm{reads}~ j_1] \geq 1/2 + \epsilon$. Suppose that the code got randomly corrupted at index $j_1$. Then $y_{j_1}$ has no correlation with $x_i$ and the algorithm $A$ cannot infer any information from index $j_1$. But since $A$ can recover from $\delta m$ errors, there must exist another index $j_2$ which is also good, in the sense that it can be queried to get information about $x_i$. Extending this reasoning, we can corrupt $y_{j_2}$ randomly along with $y_{j_1}$ and expect to find another index $j_3$ with reasonable correlation to $x_i$. Repeating the argument $\delta m$ times shows that there are at least $\delta m$ indices $j\in[m]$ such that $A$ can compute $x_i$ from $C(x)_{j}$ with probability at least $1/2 + \epsilon$. This is true for every bit $x_i$, which by the pigeonhole principle implies that there is at least one index $j'\in [m]$ from which at least $\delta n$ of the $x_i$'s can be extracted with probability $1/2 + \epsilon$ by querying $y_{j'}$. This means that the code must pack an enormous amount of information about the input into one index of the output. Consider the following quantitative lemma about the limits of information recovery from a function:
Lemma 2: If $C:\{0,1\}^n\rightarrow R$, and there exists an algorithm $A$ such that $Pr[A(C(x),i)=x_i] \geq 1/2 + \epsilon$ for every $i$,
then $\log{|R|} \geq (1-{\mathsf H}(1/2 + \epsilon))n$, where ${\mathsf H}$ is the binary entropy function.
Idea: if a decoder has any hope of recovering bits with reasonable probability, the compression cannot be too high.
Applying Lemma 2 with $|R| = |\Sigma|$ to the $\delta n$ bits recoverable from the single index $j'$, we get $\delta n (1-{\mathsf H}(1/2 + \epsilon)) \leq \log{|\Sigma|}$, i.e., $n\leq \frac{\log{|\Sigma|}}{\delta(1-{\mathsf H}(1/2 + \epsilon))}$ (Theorem 3).
Now, consider the case $q>1$. A generalization of this same argument can be used
to show superlinear bounds on the length of the codewords as follows:
Suppose that $A$ is a $(q,c,\epsilon)$-smooth decoder for code $C$.
Similar to the $q=1$ case, call a set $S$ ($S\subseteq[m], |S|\leq q$) $\epsilon$-good for $i$ if $Pr_x[A(C(x),i) = x_i \mid A ~\textrm{reads}~ S] \geq 1/2 + \epsilon$. Define the hypergraph $H_i$ with vertices labelled by $[m]$ and edge set $E_i$ consisting of the $\epsilon/2$-good sets for $i$. A matching of a hypergraph is a set of pairwise disjoint edges, and a vertex cover is a vertex set such that every edge contains at least one vertex from it.
Lemma 4: $H_i$ has a matching $M_i$ of size at least $\epsilon m/cq$.
Proof:
$1/2 + \epsilon$
$\leq Pr_x[A(C(x),i) = x_i | A(\cdot,i) ~\textrm{reads}~ E_i] Pr [A(\cdot,i) ~\textrm{reads}~ E_i]$
$+ Pr_x[A(C(x),i) = x_i | A(\cdot,i) ~\textrm{reads}~ E^c_i]\, Pr[A(\cdot,i) ~\textrm{reads}~ E^c_i]$
$< Pr[A(\cdot,i) ~\textrm{reads}~ E_i] + (1/2 + \epsilon/2)(1-Pr[A(\cdot,i) \textrm{reads} E_i])$,
which implies that $Pr[A(\cdot,i) ~\textrm{reads a set from}~ E_i] > \epsilon$. If $P_e$ denotes the probability that $A(\cdot,i)$ reads $e\in E_i$, then we have $\epsilon < \sum_{e\in E_i} P_e$. Also, for every $j\in [m]$, $\sum_{e\in E_i : j\in e} P_e \leq c/m$ by the smoothness condition. Now, if $V$ is a vertex cover of $H_i$, then $e\cap V \neq \emptyset$ for all $e\in E_i$. Putting this together with the earlier fact, we have $\epsilon < \sum_{e\in E_i} P_e \leq \sum_{j\in V}\sum_{e\in E_i : j\in e} P_e \leq |V|c/m$, which implies $|V| > \epsilon m/c$. Since the vertices of a maximal matching form a vertex cover of size at most $q$ times the matching size, $H_i$ has a matching $M_i$ of size at least $\epsilon m/(cq)$.
Say that a set $S$ hits the matching $M_i$ if there is some set $s\subseteq S$ such that $s\in M_i$. The following lemma bounds the number of vertices that need to be selected from $H_i$ so that the selected set hits a constant fraction of the $M_i$'s.
Lemma 5: Let $H$ be a hypergraph on $m$ vertices whose hyperedges have at most $q$ vertices each, and suppose $H$ has a matching of size $\gamma m$ ($\gamma < 1/q$). Then there exists $t=\Theta(\gamma^{-1/q}m^{(q-1)/q})$ such that a uniformly random set of $t$ vertices hits such a matching with probability at least $3/4$.
Theorem 6: If $C:\{0,1\}^n\rightarrow \Sigma^m $ is a $(q,\delta,\epsilon)$-locally decodable code, then:
$m\geq (\epsilon\delta/q^2)^{1/(q-1)}{\left(\frac{3n(1-{\mathsf H}(1/2 + \epsilon))}{4\log{|\Sigma|}}\right)}^{\frac{q}{q-1}}$.
Proof: Lemma 1 shows that $C$ is $(q,q/\delta, \epsilon)$-smooth. Lemma 4 shows that for every $i$ there exists a matching $M_i$, consisting of disjoint sets each of size at most $q$, such that each $e\in M_i$ is $\epsilon/2$-good for $i$ and $|M_i| \geq \epsilon\delta m/q^2$. Lemma 5 says that there exists a set of $t = \Theta((\epsilon\delta/q^2)^{-1/q}m^{(q-1)/q})$ indices from $[m]$ such that the values at these locations have enough information to decode at least $3/4$ of the input bits with advantage at least $\epsilon /2$. Applying Lemma 2, which gives a lower bound on $t$, proves the theorem.
While this is a start, it still leaves a large gap between existing locally decodable codes (which are exponentially long) and the lower bound. Goldreich, Karloff, Schulman and Trevisan improve the lower bound for the specific case of linear codes and decoding algorithms that query only 2 bits; they show that in this case $m = 2^{\Omega(n)}$. Using new techniques (a reduction to quantum queries), Kerenidis and de Wolf (arXiv: quant-ph/0208062v2) show that any 2-query LDC (not necessarily linear) is exponentially long. Their work is as follows.
Quantum queries
---------------
In an earlier post, Matt talked about the fact that QM systems exist in a linear superposition of several states. The basis vectors used to describe a state can of course be chosen according to convenience (usually the eigenstates of the measurement). A superposition of states does not mean a statistical mix of different states; rather, it means that the state of the particle itself is a complex vector. Such a state is called a pure state (it can be expressed as $|\phi\rangle = \sum_{i} \alpha_i |i\rangle$). The density matrix of a pure state (or qubit) is the outer product $M = |\phi\rangle\langle\phi|$, where $\langle\phi| = {|\phi\rangle}^{\dagger} = {{|\phi\rangle}^T}^{*}$ is the complex conjugate transpose of $|\phi\rangle$. Note that an arbitrary matrix $M$ need not in general be the density matrix of a pure state. Matrices of the form $\rho = \sum_{i}p_i|\phi_i\rangle\langle\phi_i|$ (with $p_i \geq 0$ and $\sum_i p_i = 1$) are called mixed states. In contrast to a pure state, such mixed states are statistical ensembles of different pure states, and in general a mixed state does not represent any pure state. However, the set of pure states of which $\rho$ is an ensemble is not uniquely determined by $\rho$ alone: $\rho$ represents both $\{(p_i,|v_i\rangle)\}_{i}$ and $\{(p'_i,|v'_i\rangle)\}_{i}$ as long as there is a unitary matrix $U$ such that $|v'_i\rangle=\sum_{j} \sqrt{p_j/p'_i}\,U_{ij}|v_j\rangle$. Another context where mixed states are useful for us is in describing a subsystem of an entangled state. For example, in the entangled state $(|00\rangle + |11\rangle)/\sqrt{2}$, the second qubit alone is not in a pure state; it can be viewed as the mixed state $(|0\rangle\langle 0| + |1\rangle\langle 1|)/2$.
Measurement of a qubit can be thought of as projecting the qubit into a subspace. Depending on the set of subspaces we are trying to project our qubit into, the qubit has different probabilities of collapsing into each of these subspaces. We can generalise this notion to arbitrary positive operators (not just orthonormal projectors) and to mixed states -- such a measurement system is called a positive operator valued measurement (POVM). A POVM is a set of positive operators $E_i = M_i^{\dagger} M_i$ such that $\sum_{i=1}^{k} E_i = I$. When such a measurement is made on a state $\rho$, the outcome is one of the states $M_i\rho M_i^{\dagger}/Tr(M_i\rho M_i^{\dagger})$, the probabilities of the respective outcomes being $p_i = Tr(M_i\rho M_i^{\dagger})$.
In what follows we usually deal with $k$ of the form $2^m$ and systems of $m$ qubits.
If $B=\{\ket{\psi_i}\}$ is an orthonormal basis for the system, measuring in the $B$-basis means using the POVM $E_i = \ket{\psi_i}\bra{\psi_i}$. Then the probability of outcome $i$ when measuring a pure state $\ket{\phi}$ is simply $|\braket{\psi_i}{\phi}|^2$, as we expect.
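A tiny numeric illustration (an assumed example, not from the papers cited): measuring the mixed state $\rho = (|0\rangle\langle 0| + |1\rangle\langle 1|)/2$ from the paragraph above in the Hadamard basis $\{|+\rangle, |-\rangle\}$, with outcome probabilities computed as $p_i = Tr(E_i\rho)$.

```python
import numpy as np

ket0, ket1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
plus, minus = (ket0 + ket1) / np.sqrt(2), (ket0 - ket1) / np.sqrt(2)

# Reduced state of one half of (|00> + |11>)/sqrt(2): the maximally mixed state.
rho = 0.5 * np.outer(ket0, ket0) + 0.5 * np.outer(ket1, ket1)

# Projective measurement in the Hadamard basis, E_i = |psi_i><psi_i|.
povm = [np.outer(plus, plus), np.outer(minus, minus)]
assert np.allclose(sum(povm), np.eye(2))          # completeness: sum_i E_i = I

probs = [float(np.trace(E @ rho)) for E in povm]  # p_i = Tr(E_i rho)
print(probs)                                      # [0.5, 0.5]
```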
Now, we are ready to define a quantum query:
A query to the $j$-th bit of a length-$m$ string $y$ is the unitary operation (quantum mechanics mandates that all state transformations be unitary):
$|c\rangle|j\rangle \mapsto (-1)^{c\cdot y_j}|c\rangle |j\rangle$.
Of course, the fact that we can apply this query transformation to superposition states to (indirectly) read off several values of the function is what makes quantum queries powerful (for that matter, this ability to manipulate superposition states is what gives extra power to quantum algorithms). The following lemma illustrates this power right away:
Lemma 7: Let $f:\{0,1\}^2\mapsto \{0,1\}$ and suppose that we have $a=a_1 a_2 \in \{0,1\}^2$ whose bits are to be queried in order to compute $f(a)$. There exists a quantum algorithm that uses just one query, and outputs $f(a)$ with probability exactly $11/14$, and outputs $1-f(a)$ otherwise.
Proof: Note that a classical algorithm must query both bits before it can compute $f(a)$ exactly (whenever $f$ depends on both bits). The quantum algorithm is as follows: query the superposition $(|01\rangle + |11\rangle + |12\rangle)/\sqrt{3}$, where $|cj\rangle$ abbreviates $|c\rangle|j\rangle$. The output of the query is
$|\phi_a\rangle = (|01\rangle + (-1)^{a_1}|11\rangle + (-1)^{a_2}|12\rangle)/\sqrt{3}$, the relative phases now holding information about $a_1, a_2$. To extract this information, we measure this state (i.e., cause it to collapse) in the orthonormal basis $\{|\psi_b\rangle\}$ ($b\in \{0,1\}^2$), where $|\psi_b\rangle = (|01\rangle + (-1)^{b_1}|11\rangle + (-1)^{b_2}|12\rangle + (-1)^{b_1+b_2}|02\rangle)/2$.
$|\phi_a\rangle$ gives outcome $a$ with probability $|\langle\phi_a|\psi_a\rangle|^2 = 3/4$, the other three outcomes being equally probable ($1/12$ each). Suppose that the measurement outcome is $m$. The following procedure does exactly what we want (a sanity check follows the list):
- $f$ is a constant. Output the constant with probability $11/14$.
- $|f^{-1}(1)|=1$. If $f(m) = 1$, then output $1$. If $f(m)=0$, then output $0$ with probability $6/7$ and $1$ with probability $1/7$. Now, if $f(a) = 1$, the probability of the algorithm outputting $1$ is $(3/4)\cdot 1 + (1/4)(1/7) = 11/14$. If $f(a) = 0$, the probability of the algorithm outputting $0$ is $(11/12)(6/7) = 11/14$.
- $|f^{-1}(1)|=2$. Output $f(m)$ with probability $13/14$ and $1-f(m)$ with probability $1/14$.
- $|f^{-1}(1)|=3$. Symmetric to the case $|f^{-1}(1)|=1$ (with the roles of $0$ and $1$ exchanged).
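Since the case analysis above is easy to get wrong, here is a short exact-arithmetic check (an illustrative Python sketch, assuming the $|c\rangle|j\rangle$ encoding and measurement basis described above, which give $\Pr[b \mid a] = (1 + (-1)^{a_1+b_1} + (-1)^{a_2+b_2})^2/12$): for each of the 16 functions $f$ and every input $a$, the post-processing outputs $f(a)$ with probability exactly $11/14$.

```python
from fractions import Fraction
from itertools import product

def outcome_dist(a):
    """Pr[measurement outcome b | input a] for the 1-query algorithm:
    |<psi_b|phi_a>|^2 = (1 + (-1)^(a1+b1) + (-1)^(a2+b2))^2 / 12."""
    a1, a2 = a
    dist = {}
    for b1, b2 in product((0, 1), repeat=2):
        amp = 1 + (-1) ** (a1 ^ b1) + (-1) ** (a2 ^ b2)
        dist[(b1, b2)] = Fraction(amp * amp, 12)
    return dist

def success_prob(f, a):
    """Exact Pr[algorithm outputs f(a)] after the post-processing cases above."""
    ones = sum(f[x] for x in product((0, 1), repeat=2))
    p_correct = Fraction(0)
    for m, pm in outcome_dist(a).items():
        if ones in (0, 4):                       # f constant
            p_out_fa = Fraction(11, 14)
        elif ones == 2:                          # |f^{-1}(1)| = 2
            p_out_fa = Fraction(13, 14) if f[m] == f[a] else Fraction(1, 14)
        else:                                    # |f^{-1}(1)| in {1, 3}
            rare = 1 if ones == 1 else 0         # the value f takes on one input
            if f[m] == rare:                     # output the rare value for sure
                p_out_fa = Fraction(1) if f[a] == rare else Fraction(0)
            else:                                # output non-rare value w.p. 6/7
                p_out_fa = Fraction(1, 7) if f[a] == rare else Fraction(6, 7)
        p_correct += pm * p_out_fa
    return p_correct

for values in product((0, 1), repeat=4):         # all 16 functions f:{0,1}^2 -> {0,1}
    f = dict(zip(product((0, 1), repeat=2), values))
    assert all(success_prob(f, a) == Fraction(11, 14)
               for a in product((0, 1), repeat=2))
print("every f is computed with probability exactly 11/14")
```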
Definition: A $(q,\delta, \epsilon)$-LQDC (locally quantum-decodable code) is the same as an LDC except that we replace the probabilistic decoder with a quantum decoder whose queries are quantum queries that may probe superpositions of positions. Now we can use the earlier lemma to show that:
Theorem 8: A $(2,\delta,\epsilon)$-LDC is a $(1,\delta,4\epsilon/7)$-LQDC.
Proof: The idea is to replace the two classical queries with a single quantum query, allowing for a bit of inaccuracy. The quantum decoder chooses two target bits the same way the classical decoder does, but instead of reading both bits and applying a function $f$, it makes one quantum query to these two bits and outputs the value of $f$ on them (with accuracy $11/14$, by Lemma 7). This suffices to get an advantage of $4\epsilon/7$.
Now we want to show lower bounds on 1-query LQDCs. The starting point as in the
case of 1-query LDCs is a theorem that gives a relation between the space
used to store information and fidelity of information reconstruction:
Theorem 9: If $f$ maps $n$-bit strings to $m$-qubit (possibly mixed) states with recovery probability at least $p$, then $m\geq (1 - {\mathsf H}(p))n$. This leads to:
Theorem 10: If $C:\{0,1\}^n \rightarrow \{0,1\}^m$ is a $(1,\delta,\epsilon)$-LQDC, then $m\geq 2^{cn} - 1$, for $c=1-{\mathsf H}(1/2 + \delta\epsilon/4)$.
Wehner and de Wolf (arXiv: quant-ph/0403140v2) consider the case where the LDC encodes over larger alphabets and the decoder uses only part of the information from each query. Briët and de Wolf (arXiv: quant-ph/0806.2101v1) further study the relation between LQDCs and LDCs and conclude that their powers are roughly the same for a small constant number of queries.
Despite all this work, the best lower bounds on the length of LDCs with 3 or more queries are only polynomial, while the known constructions are exponentially long. Closing this gap is a big open question.
LDCs and Private Information Retrieval (PIR)
--------------------------------------
The relation between LDCs and PIR is close and worth mentioning, since PIR systems yield some of the best LDCs. PIR is the problem of distributing a database among several servers so that a client can retrieve a particular bit by sending randomized queries to the servers, while each individual server learns almost nothing about which bit the client is accessing (information-theoretic security, not computational). Here is the definition of a 2-server PIR system.
Def [GKST CCC'02]: A one-round, database size $n$, $(1-\delta)$ secure, 2-server PIR with recovery probability $p$, query size $q$, answer size $a$ is the set of algorithms $(Q,(S_1,S_2),D)$, where:
- $Q$, on input $i$, uses randomness $r$ to generate two $q$-bit queries $(q_1,q_2)=Q(i,r)$. The queries $q_1$ and $q_2$ are sent to the two servers, which use algorithms $S_1$ and $S_2$ to respond with length-$a$ answers. The decoder algorithm $D$ uses $i$, $r$ and the servers' answers to decode $x_i$; when $r$ is drawn from the uniform distribution, $D$ must succeed with probability at least $p$. The secrecy condition is that, for each server, the distributions of the queries it receives for two different target bits are within statistical distance $\delta$ of each other. (A toy instantiation of this interface is sketched below.)
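For concreteness, here is a minimal Python sketch of the folklore XOR-based 2-server scheme (an illustration of the $(Q,(S_1,S_2),D)$ interface, not the construction from GKST): the client sends a random subset $S \subseteq [n]$ to one server and $S \triangle \{i\}$ to the other; each server replies with the parity of the database bits indexed by its set, and XORing the two one-bit answers yields $x_i$. Each server individually sees a uniformly random subset, so the scheme is perfectly secure; encoding each subset as its characteristic vector gives query size $n$ and answer size $1$.

import random

def Q(i, n, rng):
    # Client: random subset S of {0,...,n-1} to server 1, S symmetric-difference {i} to server 2.
    S = {j for j in range(n) if rng.random() < 0.5}
    return S, S ^ {i}

def server(x, query):
    # Server: parity (XOR) of the database bits indexed by the query set.
    ans = 0
    for j in query:
        ans ^= x[j]
    return ans

def D(a1, a2):
    # Client: XOR of the two answers is the XOR over the symmetric difference, i.e. x_i.
    return a1 ^ a2

rng = random.Random(0)
n, i = 16, 5
x = [rng.randint(0, 1) for _ in range(n)]
q1, q2 = Q(i, n, rng)
assert D(server(x, q1), server(x, q2)) == x[i]
print("recovered x_i correctly")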
The same authors show that if there is a $1$-round, $(1-\delta)$-secure, $2$-server PIR of database size $n$, query size $t$, answer size $a$ with recovery probability at least $1/2 + \epsilon$, then there exists a $(2,3,\epsilon-\delta)$-smooth code $C:\{0,1\}^n\rightarrow (\{0,1\}^a)^m$, where $m \leq 6\cdot 2^t$. This reduction can be used to show that query size $n^{\Omega(1)}$ is needed for such PIR schemes with constant answer size.
Kolmogorov Complexity and derandomization
Motivation
------------
The sets $R_{\mu}$ of random strings with high Kolmogorov complexity and bounded resources are good examples of sets with a lot of information content that is difficult to access. Many of these sets have been studied [BM97,Ko91,KC00] as possible examples of intractable sets that are not complete for any of the standard complexity classes. Based on [Al05], we now show that these sets can, in fact, be exploited by efficient reductions.
The completeness results are obtained via derandomization techniques, mostly relativizing hardness-versus-randomness tradeoffs used in the contrapositive: if there is a problem in a complexity class $C$ that is hard even given oracle access to $A$, then there is a pseudorandom generator computable within $C$ that is secure against $A$. We then argue that no pseudorandom generator computable in $C$ can be secure against $R_{\mu}$, and thus every problem in $C$ is easy given oracle access to $R_{\mu}$. In other words, $C$ reduces to $R_{\mu}$.
Definitions
-----------
The computational model that we use is the multi-tape Turing machine with random access to its input tape (the ideas work in any general model). We fix, without loss of generality, a single universal machine $U$.
Kolmogorov complexity $C^A(x)$ is usually defined as the length of the shortest description $d\in \{0,1\}^*$ such that $U^A(d) = x$, where $U$ is a universal Turing machine with oracle access to $A$. The definition we use is slightly different: instead of requiring $U^A(d)$ to produce $x$, it should recognize the correct value of $x_i$ for every $i$.
We now define several resource-bounded Kolmogorov complexity measures:
Definition 1 [Kt,KT,KS]
$Kt^A(x) = \min\{ |d| + \log t : U^A(d,i,b)$ accepts in $t$ steps iff $x_i=b \}$
$KT^A(x) = \min\{ |d| + t : U^A(d,i,b)$ accepts in $t$ steps iff $x_i=b \}$
The only difference is that in $KT$ the time bound is given exponentially more weight. Considering space-bounded notions, too, will yield complete problems for space-bounded
classes:
$KS^A(x) = \min\{ |d| + s : U^A(d,i,b)$ accepts in $s$ space iff $x_i=b \}$
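As a quick informal example of how these measures behave: for a highly regular string such as $x = 0^n$, a description $d$ consisting of $n$ written in binary plus $O(1)$ bits of program suffices, and on input $(i,b)$ the machine just checks that $i \leq n$ and $b = 0$, which takes time polynomial in $\log n$; so $KT(0^n)$ is only $\mathrm{poly}(\log n)$, and in particular $0^n$ is far from Kolmogorov-random under any of these measures. A random string, by contrast, needs $|d|$ close to $n$ no matter how much time or space is allowed.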
Both $Kt$ and $KS$ can be approximated in terms of $KT^A$, for appropriate oracle choices:
Theorem 2:
There exist a complete set $A$ for $E$, a complete set $B$ for $DSPACE(n)$, and a constant $c$ s.t.
$Kt(x)/c\leq KT^A(x)$
$KS(x)/c\leq KT^B(x)$
Proof sketch: Let $A \in E$ and let $x$ be given, s.t. $KT^A(x) = m$. Thus, there is a description $d_x$ of length $\leq m$, s.t. $U^A(d_x,i,b)$ accepts iff $x_i=b$ in time at most $m$. During computation, $U$ asks queries of length at most $m$. Since $A \in E$, each query can be answered in time $2^{O(m)}$.
Let $M$ denote the algorithm that simulates the computation of $U^A(d_x,i,b)$, answering the oracle queries to $A$ by direct computation. Then the description $d'_x= \langle M,d_x \rangle$ suffices for $U$ to compute $U(d'_x,i,b)$ in time $2^{O(m)}$. As $|d'_x|=m+O(1)$, we can conclude that $Kt(x) \leq m + O(1) + \log(2^{O(m)}) = O(m)$.
The proof for $KS$ is similar.
We focus on sets containing strings of high complexity, for various measures.
Definition 3:
For any Kolmogorov complexity measure $\mu$, define
$R_{\mu} = \{x : \mu(x) \geq |x|/2\}$.
The bound of $|x|/2$ is arbitrary. Essentially, all we need is that the set $R_{\mu}$ has polynomial density (it contains at least $2^n/n^k$ strings of each length $n$, for some $k$).
The following propositions are straightforward.
Proposition 4:
$R_{Kt} \in E$, $R_{KS} \in DSPACE(n)$, and $R_{KT} \in coNP$.
Proposition 5:
The sets $R_{Kt}$ and $R_{KS}$ both have polynomial density.
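Proposition 5 follows from a standard counting argument: there are fewer than $2^{n/2}$ descriptions $d$ with $|d| < n/2$, and each description determines at most one string of length $n$, so at most $2^{n/2}$ strings of length $n$ can have complexity below $n/2$ (note that $\mu(x) \geq |d|$ for each of these measures). Hence at least $2^n - 2^{n/2}$ strings of each length $n$ lie in $R_{\mu}$, far more than the $2^n/n^k$ required for polynomial density.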
Nonuniform Hardness Results
-------------------------------
We show that strings of high Kolmogorov complexity are very useful as oracles. We will argue that an appropriate set $R_{\mu}$ of Kolmogorov random strings can be used to distinguish the output of a pseudorandom generator $G_f$ from truly random strings. This in turn will enable us to efficiently reduce $f$ to $R_{\mu}$.
Recall that $A$ is PSPACE-robust if $PSPACE^A = P^A$ (machines are allowed to ask oracle
queries of only polynomial size). The complete sets for many large complexity classes (PSPACE, EXP, EXPSPACE) have this property, as well as the complete sets (under linear-time
reductions) for classes like $DSPACE(n)$ and $E$.
We will build a pseudorandom generator based on the Nisan-Wigderson paradigm [NW94]. [BFNW93] construct, for any $\epsilon > 0$, a variant $G^{BFNW}_f : \{0,1\}^{n^\epsilon}\rightarrow \{0,1\}^n$ such that for any $x$ of size $n^\epsilon$, the function is computable in space $O(n^\epsilon)$ given access to the Boolean function $f$ on inputs of size at most $n^\epsilon$. If $f$ is PSPACE-robust, there is a constant $c$ independent of $\epsilon$, such that each bit is computable in time $n^{\epsilon \cdot c}$ with oracle access to $f$. The following hardness versus randomness tradeoff holds.
Theorem 6 [BFNW93]
Let $f$ be a Boolean function, $\epsilon > 0$, and $G^{BFNW}_f$ be the pseudorandom generator described above. Let $T$ be a set and $p(n)$ a polynomial. If
$|P_{r\in U_n}[r \in T]-P_{x\in U_{n^\epsilon}}[G^{BFNW}_f(x)\in T]|\geq 1/p(n)$
for all large $n$, then there exists a polynomial size oracle circuit family $\{ C_n \}$ with oracle $T$ that computes $f$ and queries $T$ non-adaptively.
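To illustrate the Nisan-Wigderson paradigm itself, here is a toy Python sketch (emphatically not the $G^{BFNW}_f$ construction; the greedy design and the choice of "hard" function are for illustration only): the generator stretches a short seed by evaluating a supposedly hard function on several subsets of the seed bits, where the subsets have small pairwise intersections.

from itertools import combinations

def greedy_design(seed_len, set_size, max_overlap, num_sets):
    # Greedily collect subsets of {0,...,seed_len-1} of size set_size
    # whose pairwise intersections have size at most max_overlap.
    design = []
    for S in combinations(range(seed_len), set_size):
        if all(len(set(S) & set(T)) <= max_overlap for T in design):
            design.append(S)
            if len(design) == num_sets:
                break
    return design

def nw_generator(f, seed, design):
    # One output bit per design set: f evaluated on the seed restricted to that set.
    return [f(tuple(seed[j] for j in S)) for S in design]

# Toy usage; parity is of course not a hard function, it just stands in for f.
parity = lambda bits: sum(bits) % 2
design = greedy_design(seed_len=10, set_size=5, max_overlap=2, num_sets=8)
seed = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
print(len(design), nw_generator(parity, seed, design))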
We use the notation $A\leq ^{P/poly}_{tt} B$ to denote that there exists a truth-table (i.e., nonadaptive) reduction from $A$ to $B$ that is computable by a family of polynomial-size circuits.
(A truth-table reduction is a nonadaptive reduction: all oracle queries are written down before any answers are received. Truth-table reducibility is more restrictive than Turing reducibility, so it induces a finer equivalence classification of sets; as a tool it is correspondingly weaker, which makes completeness under it a stronger statement.)
Theorem 7:
Let $A$ be any PSPACE-robust set. Let $L$ be a set of polynomial density s.t. for every $x \in L$, $KT^A(x) > |x|^\gamma$ for some constant $\gamma > 0$. Then $A$ is reducible to $L$ via $\leq^{P/poly}_{tt}$ reductions.
Proof sketch: Let $f$ be the characteristic function of $A$, and consider $G^{BFNW}_f$ with $\epsilon$ chosen as follows. We know that every bit of $G^{BFNW}_f$ is computable in time $n^{\epsilon \cdot c}$, for some constant $c$ independent of $\epsilon$, given access to $A$. Set $\epsilon= \gamma/(2c)$ (w.l.o.g. $c > 1$).
Any string in the range of $G^{BFNW}_f$ has small $KT^A$ complexity: let $y = G^{BFNW}_f (x)$ for some $x$. Given $x$, every bit of $y$ is computable in time $n^{\gamma/2}$ with access to oracle $A$. Hence $KT^A(y) \leq |x|+O(n^{\gamma/2} \log n)+O(1) \leq n^{\gamma}$ for all large $n$, so no output of the generator lies in $L$. Since $L$ has polynomial density, $L$ distinguishes the output of $G^{BFNW}_f$ from random (the gap is spelled out below), and by Theorem 6, $f$ is $\leq^{P/poly}_{tt}$ reducible to $L$.
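Spelling out the distinguishing gap (a routine check using only the facts above): polynomial density gives $\Pr_{r\in U_n}[r \in L] \geq 1/n^k$ for some constant $k$, while every output $y$ of the generator satisfies $KT^A(y) \leq |y|^{\gamma}$ and hence $y \notin L$, i.e., $\Pr_{x\in U_{n^\epsilon}}[G^{BFNW}_f(x)\in L] = 0$. The difference is therefore at least $1/n^k$, so Theorem 6 applies with $T=L$ and $p(n)=n^k$.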
By Theorem 2 and Proposition 5, we can apply Theorem 7 to
$\langle $the set $A$ from Theorem 2, $R_{Kt}\rangle $, and $\langle $the set $B$ from Theorem 2, $R_{KS}\rangle $.
($A$ and $B$ are PSPACE-robust, being complete under linear-time reductions for $E$ and $DSPACE(n)$, respectively.) Combining this with Proposition 4, we get:
Corollary 8:
$R_{Kt}$ is complete for EXP under $\leq ^{P/poly}_{tt}$ reductions. $R_{KS}$ is complete for PSPACE under $\leq ^{P/poly}_{tt}$ reductions.
These results also provide natural examples witnessing the difference in power of various reducibilities: some of the sets shown here to be complete under truth-table reductions are provably not complete under polynomial-time many-one reductions.