## Friday, May 1, 2009

### Lower bounds for locally decodable codes

In this blog, we will look at upper bounds on the rates of locally decodable codes and their relation to a problem called Private Information Retrieval. Some of the proofs presented here are unique in that they relate the complexity of classical algorithms to quantum algorithms and prove lower bounds on quantum algorithms!

Locally decodable codes are codes that can probabilistically recover a bits from corrupted codewords by querying a small number of bits, Hadamard code being a straightforward example. While an ideal code would have have rate, be resilient to large number of errors and would be locally decodable, it as been shown that it is not possible to do extremely well on all the criteria. Specifically, locally decodable codes imply that the code rate is $o(1)$.

The first paper in this direction ([Katz, Trevisan STOC'00]) showed that any locally decodable code with constant number of query bits has codewords of superlinear length. The main ideas behind these bounds is that smooth codes (codes which are queried uniformly for local decoding) are not much worse than non-smooth ones and that such smooth codes need super-linear encoding lengths.

Def 1: Suppose that a code maps $C:\{0,1\}^n\rightarrow \Sigma^m$. For fixed $\delta, \epsilon$ and integer $q$, $C$ is $(q,\delta,\epsilon)$-locally decodable code if there exists a probabilistic algorithm that recover an arbitrary bit $x_i$ of the input $x$ with probability at least $1/2+\epsilon$ from a corrupted codeword $y$ which is with in a distance $d(y,C(x))<\delta m$ of $C(x)$ after querying no more than $q$ indices of $y$.

Intuitively, smooth code words are those for which there exist probabilistic decoding
algorithms that query the codeword (roughly) uniformly, i.e., are not heavily biased
towards querying few indices of codeword. More formally:

Def 2: A codeword $C:\{0,1\}^n\rightarrow \Sigma^m$ is $(q,c,\epsilon)$-smooth
for fixed $c,\epsilon$ and integer $q$ if there exists a local decoding algorithm $A$
that can recover an aribitrary bit of input $x$ from codeword $C(x)$ with probability at least $1/2 + \epsilon$ using no more than $q$ indices of $y$ with the added condition that $\prob[A(x,i) \textrm{reads index} j] \leq c/m$. This definition does not require $A$ to recover bits from corrupted codewords.

Lemma 1: If $C$ is a $(q,\delta,\epsilon)$-locally decodable code, then it is also a $(q,q/\delta,\epsilon)$-smooth.
Proof sketch: Take the decoder for $C$ and identify all the locations that are queried with probability greater than $q/\delta m$ while trying to decode $m$. Construct a new decoder that just assumes the value $0$ for queries to all such locations. Since there no more than $\delta m$ of these, the new decodder can still recover from errors. This new decoder has the smoothness properties we want.

Lets start with case $q=1$, and see why it is not possible to construct locally decodable codes that encode inputs of length greater than a constant. Suppose that the algorithm is trying to decode the $i$-th bit $x_i$ by reading just a single index of the (corrupt) codeword $y$. Then, $\sum_{j\in[m]} Pr_x[A(C(x),i)=x_i | A(\cdot,i) ~\textrm{reads}~ j] Pr[A(\cdot,i) ~\textrm{reads}~ j]$
$= Pr_x[A(C(x),i) = x_i] \geq 1/2 + \epsilon$,

beacuse of which there must exist a good index $j_1$ such that $Pr_x[A(C(x),i) = x_i | A(\cdot,i) ~\textrm{reads}~ j_1] \geq 1/2 + \epsilon$. Suppose that the code got randomly corrupt at index $j_1$. Then $y_{j_1}$ has no correlation with $x_i$ and the algorithm $A$ can not infer any information from index $j_1$. But since $A$ can recover from $\delta m$ errors, there must exist another index $j_2$ which also \textit{good} in the sense that it can be queried to get information about $x_i$. By extending the above reasoning, we can see that we can also corrupt $y_{j_2}$ randomly along with $y_{j_1}$ and can expect to find another index $j_3$ with reasonable correlation to $x_i$. We can extend the same argument $\delta m$ times to reason that there are at least $\delta m$ indices $j\in[m]$ such that $A$ can compute $x_i$ from $C(x)_{j}$ with probability at least $1/2 + \epsilon$. This is true of all inputs $x$, which by the pigeon hole principle implies that there is at least one index $j'\in [m]$ such that at least $\delta n$ of $x_i$s can extracted with $1/2 + \epsilon$ probability by querying $y_{j'}$. Now this means that the code should put an enormous amount of information about the input in to one index of the output. Consider the following quantitive
lemma about limit of information recovery from a function:

Lemma 2: If $C:\{0,1\}^n\rightarrow R$, and there exists a $A$ such that $Pr[A(C(x),i)=x_i] \geq 1/2 + \epsilon$,
then $\log{|R|} \geq (1-{\mathsf H}(1/2 + \epsilon))n$, where ${\mathsf H}$ is the binary entropy function.
Idea: if a decoder has an any hope of recovering bits with reasonable probability,
the compression should not e too high.

Setting $|R| = |\Sigma|$, we have $n\leq \frac{\log{|\Sigma|}}{\delta(1-{\mathsf H}(1/2 + \epsilon))}$ (Therorem 3).

Now, consider the case $q>1$. A generalization of this same argument can be used
to show superlinear bounds on the length of the codewords as follows:

Suppose that $A$ is a $(q,c,\epsilon)$-smooth decoder for code $C$.
Similar to the $q=1$ case, call a set $S$ ($S\subseteq[m], |S|\leq q$) $\epsilon$-good for $i$ if $Pr[A(C(x),i) = x_i | A\textrm{reads} s] \geq 1/2 + \epsilon$. Define hypergraph $H_i$ with vertices labelled $[m]$ and egde set $E_i$ defined by the $\epsilon/2$-good sets for $i$. A matching of a hypergraph is an edge set with no common vertex and a vertex cover is a vertex set such that every edge contains at least one vertex from the vertex set.

Lemma 4: $H_i$ has a matching $M_i$ of size at least $\epsilon m/cq$.
Proof:
$1/2 + \epsilon$
$\leq Pr_x[A(C(x),i) = x_i | A(\cdot,i) ~\textrm{reads}~ E_i] Pr [A(\cdot,i) ~\textrm{reads}~ E_i]$
$+ Pr_x[A(C(x),i) = x_i | A(\cdot,i) ~\textrm{reads}~ E^c_i] Pr {A(\cdot,i) ~\textrm{reads}~ E^c_i]$
$< Pr[A(\cdot,i) ~\textrm{reads}~ E_i] + (1/2 + \epsilon/2)(1-Pr[A(\cdot,i) \textrm{reads} E_i])$,
which implies that $Pr[A(\cdot,i) \textrm{reads from} E_i] > \epsilon$. If $P_e$ denotes the probability that $A(\cdot,i)$ reads $e\in E_i$, then we have $\epsilon < \sum_{e\in E_i} P_e$. Also for every $j\in [m]$, $\sum_{e\in E_i | j\in e} \leq c/m$
by smoothness condition. Now, if $V$ is vertex cover of $H_i$, $e\cap V \neq \emptyset$ for all $e\in E_i$. Putting this together with the earlier fact, we have $\epsilon < \sum_{s\in E_i | e\cap V\neq \emptyset} P_e \leq \sum_{j\in V}\sum_{e\in E_i | j\in e} P_e \leq |V|c/m$ which implies $|V| > \epsilon m/c$, and therefore, $H_i$ has a matching $M_i$ of size at least $\epsilon m/cq$.

Say that a set $S$ hits matching $M_i$ if there is some set $s\subseteq S$ such that $s\in M_i$. The following lemma gives a bound on the number of vertices to be selected from $H_i$ so that selected set hits a constant fraction of $M_i$s.

Lemma 5: If $H$ is a hypergraph on $m$ vertices containing hyperedges of at most than $q$ vertices. Suppose $H$ has a matching of size $\gamma m$($\gamma < 1/q$). There exists $t=\Theta(\gamma^{-1/q}m^{(q-1)/q})$ so that for a randomly chosen (from $H$) set of $t$ elements, such that the probability of this set hitting an arbitrary matching is $3/4$.

Theorem 6: If $C:\{0,1\}^n\rightarrow \Sigma^m$ is a $(q,c,\epsilon)$-locally decodable codes, then:
$m\geq (\epsilon\delta/q^2)^{1/(q-1)}{\left(\frac{3n(1-{\mathsf H}(1/2 + \epsilon))}{4\log{|\Sigma|}}\right)}^{(\frac{q}{q-1})$.
Proof: Lemma 1 shows that $C$ is $(q,q/\delta, \epsilon)$-smooth. Lemma 4 shows that for every $i$, there exists a set $M_i$ consisting of disjoint set of size at most $q$ such, each $m\in M_i$ is $\epsilon/2$-good for $i$ and $|M_i| \geq \epsilon\delta m/q^2$. Lemma 5 says that there exists a set of $t = \Theta((\epsilon\delta/q^2)^{-1/q}m^{(q-1)/q}))$ indices from $[n]$ so that the values at these locations have enough information to help decode at least $3/4$ of the inputs bits with advantage at least $\epsilon /2$. Applying Lemma 2 which gives us a lower bound on $t$ proves the theorem.

While this is some start, this still leaves a large gap between existing locally decodable codes (which are exponential long) and the lower bound. Goldreich, Karloff, Schulman and Trevisan improve the lower bound for the specific case of linear codes and decoding algorithms that query only 2 bits. They show that in such a case, $m = 2^{\Omega(n)}$. Using new techniques (reduction to quantum queries), Kerenidis and Wolf (arXiv: quant-ph/0208062v2) show that any 2-query LDC (not necessarily linear) is exponentially long. Their work is as follows.

Quantum queries
---------------

In an earlier post, Matt talked about the fact that QM systems exist in a linear superposition of several states. The choice of basis vectors used to describe can of course be chosen according to convenience (the choice is usually the eigenstates of the measurement). A superposition of states does not mean a statistical mix of different states, rather it means that the state of the particle itself is a complex vector. Such a state is called a pure state (can be expressed as $|\phi\rangle = \sum_{i} \alpha_i |i\rangle$). The density matrix of a pure state (or qubit) is the outer product $M = |\phi\rangle\langle\phi|$, where $\langle\phi| = {|\phi\rangle}^{\dagger} = {{|\phi\rangle}^T}^{*}$ is the complex conjugate transpose of $|\phi\rangle$. Note that an arbitrary matrix $M$ need not in general be the density matrix of a pure state. Matrices of the form $\rho = \sum_{i}p_i|\phi_i\rangle\langle\phi_i|$ are called mixed states. As against a pure system, such mixed states are statistical ensembles of different pure states. Obviously, such a mixed state does not represent any pure state. However, the set of pure states of which $\rho$ is an ensemble is not uniquely determined by $\rho$ alone. $\rho$ represents both $\{(p_i,|v_i\rangle\}_{i}$ and $\{(p'_i,|v'_i\rangle}\}_{i}$ as long there is a unitary matrix $U$ such that $|v_i\rangle=\sum_{j} \sqrt{p_j/p'_i}U_{ij}|v_j\rangle$. Another context where mixed systems are useful for us is to describe a subsystem of an entangled state. For example, in the entangled state $(|00\rangle + |11\rangle)/\sqrt{2}$, the second qubit alone is not a pure system. It can be viewed as the mixed state $(|0\rangle\langle 0| + |1\rangle\langle 1|)/2$.

Measurement of a qubit can be thought of as projecting the qubit in to a subspace. Based on the set of subspaces we are trying to project our qubit in to, the qubit has different probability of collapsing in to these subspaces. We can generalise this notion to arbitrary positive operators (not just orthonormal projectors) and mixed states -- such a measurement system is called positive operator valued measurement (POVM). A POVM is a set of positive operators $E_i = M_i^{\dagger} M_i$ such that $\sum_{i=1}^{k} E_i = I$. When such a measurement is made on state $\rho$, the outcome is any one of the states $M_i\rho M_i^{\dagger}/Tr(M_i\rho M_i^{\dagger})$, the probabilities of the respective outcomes being $p_i = Tr(M_i\rho M_i^{\dagger})$.

In what follows we usually deal with $k$ of the form $2^m$ and systems of $m$ qubits.
If $B=\{\ket{\psi_i}\}$ is an orthonormal basis for the system, measuring in $B$-basis means using the POVM: $E_i = \ket{\psi_i}\bra{\psi_i}$. Then, the outcome of measuring a pure state $\ket{\phi}$ is simply $|\braket{\phi}{\psi}|^2$ as we expect.

Now, we are ready to define a quantum query:
A query to $j$-th bit of a length $m$ string $y$ is the unitary operation (quantum mechanics mandates that all state transformations are unitary):
$|c\rangle|j\rangle \mapsto (-1)^{c\cdot y_j}|c\rangle |j\rangle$.
Of course, the fact that we can apply this query transformation to superposition states to (indirectly) read off several values of the function is what makes quantum queries powerful (for that matter, this ability to manipulate superposition states is what gives extra power to quantum algorithms). The following lemma illustrates this power right away:

Lemma 7: Let $f:\{0,1\}^2\mapsto \{0,1\}$ and suppose that we have $a=a_1 a_2 \in \{0,1\}^2$ whose bits are to be queried in order to compute $f(a)$. There exists a quantum algorithm that uses just one query, and outputs $f(a)$ with probability exactly $11/14$, and outputs $1-f(a)$ otherwise.
Proof: Note that a classical algorithm has to query both bits before it can compute $f(a)$ with any accuracy. The quantum algorithm is as follows: query $(|01\rangle + |11\rangle + |12\rangle)/\sqrt{3}$. The output of the query would be:
$|\phi_a\rangle = (|01\rangle + (-1)^{a_1}|11\rangle + (-1)^{-a_2}|12\rangle)/\sqrt{3}$, the mutual phase shifts now holding information about $a_1, a_2$. To extract this information, we measure this state (i.e., cause it to collapse to basis) with the basis $\{|\psi_b\rangle\} (b\in \{0,1\}^2)$, where $|\psi_b\rangle = (|01\rangle + (-1)^{b_1}|01\rangle + (-1)^{b_2}|10\rangle + (-1)^{b_1+b_2}|11\rangle)/2$.

$|\phi_a\rangle$ gives outcome $a$ with probability $|\langle\phi_a|\psi_a\rangle|^2 = 3/4$, the other three outcomes being equally probable ($1/12$). Suppose that the measurement outcome is $m$. The following procedure does exactly what we want:

• $f$ is a constant. Output the constant with probability $11/14$.

• $|f(1)^{-1}|=1$. If $f(m) = 1$, then output $1$. If $f(m)=0$, then output $0$ with probability $6/7$ and $1$ with probability $1/7$. Now, if $f(a) = 1$, then probability of the algorithm outputting $1$ is $(3/4).1 + (1/4)(1/7) = 11/14$. If $f(a) = 0$, then probability of algorithm outputting $0$ is $(11/12)(6/7) = 11/14$.

• $|f(1)^{-1}|=2$. Output $f(m)$ with probability $13/14$ and $1-f(m)$ with probability $1/14$.

• $|f(1)^{-1}|=3$. Similar to case $|f(1)^{-1}|=1$.

Definition: A $(q,\delta, \epsilon)$-LQDC (locally quantum-decoable code) is the same as a LDC except that we replace the probabilistic decoder with a quantum decoder and queries are quantum-queries that probe superpositions. Now we can use the earlier lemma to show that:

Theorem 8: A $(2,\delta,\epsilon)$-LDC is a $(1,\delta,4\epsilon/7)$-LQDC.
Proof: The idea is to replace the two classical queries with one quantum query (allowing for a bit of inaccuracy) based algorithm. The quantum decoder will choose two target bits the same way the classical decoder works, and instead of reading both bits and applying a function $f$, the LQDC does a quantum query on this two bits and outputs the value of $f$ on this bits (with $11/14$ accuracy). This suffices to get a $4\epsilon/7$ predictor.

Now we want to show lower bounds on 1-query LQDCs. The starting point as in the
case of 1-query LDCs is a theorem that gives a relation between the space
used to store information and fidelity of information reconstruction:

Theeorem 9: If $f$ maps $n$ bit strings to $m$-qubit states (mixed) states with
recovery probability at least $p$, then $m\geq (1 - {\mathsf H}(p))n$. This leads to:

Theeorem 10: If $C:\{0,1\}^n \rightarrow \{0,1\}^m$ is a $(1,\delta,\epsilon)$-LQDC, then $m\geq 2^{cn} - 1$, for $c=1-H(1/2 + \delta\epsilon/4)$.

Wehner and Wolf (arXiv: quant-ph/0403140v2) consider the case where the LDC encode
over larger alphabets and when the decoder uses only a part of the information from each query. Briet and Wolf (arXiv: quant-ph/0806.2101v1) further study the relation between LQDC and LDC and conclude that their powers are roughly the same for small constant number of queries.

Despite all this work, the best lower bounds on the overheads for >3 query LDC is only polynomial while known >3-query LDCs are eponentially long. This gap leaves a big open question.

LDCs and Private Information Retrieval (PIR)
--------------------------------------
The relation between LDCs and PIRs is close and worth mentioning since PIR systems yield some of the best LDCs. PIR is the problem of distribution a database among different servers so that a client can query a particular bit (by sending randomized queries to different servers) so that individual servers have almost no clue about the particular query bit that the client is accessing (information theoretic security, not computational). Here is the definition of a 2-server PIR system.

Def [GKST CCC'02]: A one-round, database size $n$, $(1-\delta)$ secure, 2-server PIR with recovery probability $p$, query size $q$, answer size $a$ is the set of algorithms $(Q,(S_1,S_2),D)$, where:
- $Q$ on input $i$ uses a randomness ($r$) to generate two $q$ bit long queries $(q_1,q_2)=Q(i,r)$. $q_1$ and $q_2$ are sent off to the two servers which use algorithms $S_1$ and $S_2$ to respond with length $a$-answers to these queries. The decoder algorithm $D$ uses $i,r$ and the answers of the servers to decode $x_i$. Given that the random string for input to $Q$ is selected from uniform distribution, the decoder $D$ should succeed with probability at least $p$. The secrecy condition is that the distribution of queries send to servers should be different for different target bits by more than a distance of $\delta$.

The same authors show that if there is a $1$-round, $(1-\delta)$-secure $2$-server PIR of database size $n$, query size $t$, answer size $a$ with recovery probability at least $1/2 + \epsilon$, then there exists a $(2,3,\epsilon-\delta)$-smooth code $C:\{0,1\}^n\rightarrow (\{0,1\}^a)^m$, where $m \leq 6.2^t$. This reduction can be used to show that a query sizes $\Theta(n^c) (c>0)$ are needed for PIR with constant answer sizes.