PhD thesis of Troy Lee, University of Amsterdam, January 2006

[ pdf ] (1096 KB)

In the early 1960's, independently and within a few years of each other, Solomonoff, Kolmogorov, and Chaitin all developed an elegant way to capture when a sequence of events (viewed as a string with letters 0 and 1) is random. They proposed the following definition: the complexity of a string is the length of a shortest description of that string. Here a description can be thought of as a computer program. We will refer to this measure as the Kolmogorov complexity of the string, and write $C(x\pipe y)$ for the length of a shortest program for $x$ which is allowed to make use of the advice string $y$. While the string $01010101010101010101010101010101010101010101010101$ has 50 symbols, it has a much shorter description saying ``write 25 times the string 01''. On the other hand, a random string will have no structure which would allow a description of length shorter than the length of the string itself. Thus we call a string random if its Kolmogorov complexity is at least as large as its length.

Far beyond its initial purpose, Kolmogorov complexity has become an important tool in theoretical computer science, witnessing many diverse applications. Almost all of the applications use at least one of the following four fundamental theorems, which we call the `four pillars' of Kolmogorov complexity:

- Incompressibility: There is a random string of length $n$, for each $n$.
- Language compression: Any element from a computable set $A$ can be given a description of size about $\log \card{A}$.
- Source compression: Any element $x$ in the support of a computable probability distribution $P$ can be given a description of size about $-\log P(x)$.
- Symmetry of information: The information which a string $x$ contains about a string $y$ is about the same as that which $y$ contains about $x$. In symbols: $C(x) - C(x\pipe y) = C(y)- C(y\pipe x)$.

The first pillar holds unchanged in the resource bounded setting, as adding restrictions to a program can only make it harder to describe a string.

Things get much more interesting beginning with the second pillar. The most natural resource bounded analog of the second pillar is the statement: every element $x$ in a set $A$ decidable in polynomial time has a description of length $\log \card{A}$ from which $x$ can be generated in polynomial time. This statement is unlikely to hold, however, as it implies the polynomial hierarchy collapses. We show that going up one step higher in the polynomial hierarchy, however, we can find an analog of the second pillar. That is, we are able to show every element $x$ in a set $A$ which can be decided in nondeterministic polynomial time can be given a description of length about $\log \card{A}$ from which $x$ can be generated in nondeterministic polynomial time.

The most natural analogue of the third pillar would be: any element in the support of a polynomial time computable probability distribution $P$ can be given a description of length about $-\log P(x)$. We show that such a statement implies that randomized algorithms (those which can make decisions based on the outcome of a coin flip) can be simulated by deterministic algorithms (which cannot flip coins). Precisely, we show this implies $\BPP \ne \EXP$. Recently, Antunes and Fortnow were able to prove a weak converse to this statement: under a derandomization assumption the polynomial time version of the third pillar holds. With these results we have a fairly complete picture of the third pillar in the resource bounded setting.

Finally, we turn to symmetry of information which is perhaps the most tricky of the four pillars in the resource bounded domain. To prove symmetry of information seems to require the ability to do both language compression and source compression. We are only able to prove a weaker version of symmetry of information which says that the polynomial time complexity of the pair $(x,y)$ is larger than the randomized nondeterministic complexity of $x$ plus the randomized nondeterministic complexity of $y$ given $x$. This result is tight with respect to relativizing techniques---we give an oracle where even the nondeterministic complexity of $(x,y)$ is nearly twice as large as the nondeterministic complexity of $x$ plus the nondeterministic complexity of $y$ given $x$.

We give a new algebraic approach to proving formula size lower bounds. We are not able to improve on the best $n^3$ lower bound; we are, however, able to simultaneously generalize several techniques from the literature and show them as part of an overarching theory. We also give some concrete examples of functions for which our new method is able to provably do better than previous methods. Perhaps the most interesting consequence of our results, however, is that it gives evidence for a connection between the formula size of a function and its complexity in a very different model, namely its quantum query complexity. In fact, our results can be taken as evidence for the provocative conjecture that the square of the quantum query complexity of a function lower bounds its formula size.