Kolmogorov Complexity

Kolmogorov complexity measures the complexity of describing an object, such as a binary string. The description is considered as a program which can be run on a computer. By Church’s thesis, all non-trivial computational models are equivalent; thus, one can consider any universal computer (which can simulate other computers).

Let \(x\) be a binary string and \(U\) be a universal computer. Denote by \(l(x)\) the length of \(x\). The Kolmogorov complexity of \(x\) with respect to \(U\) is defined as
K(x; U) = \min_{p\colon U(p) = x} l(p).
This is the length of the shortest program which prints \(x\) and halts.

The Kolmogorov complexity of a string with respect to two universal computers differs only by a constant which is independent of the string. This is by the universality of computation; that is, a universal computer can interpret instructions of another computer by a constant-size simulation program. Thus, we will simply write \(K(x)\) for the Kolmogorov complexity of \(x\).

We define the conditional Kolmogorov complexity of \(x\) given the length of \(x\) as \(K(x\mid l(x))\). The following are simple facts.

  • \(K(x\mid l(x)) \leq l(x) + O(1)\).
  • \(K(x) \leq K(x\mid l(x)) + 2 \log l(x) + O(1)\).

The second statement follows from the first by encoding the length into the description.

Consider a binary string of length \(n\) with \(k\) ones. One can compress the string into almost \(n H(k/n)\) bits, where \(H(p) = -p \log p -(1-p)\log(1-p)\). The final description could be the compressed string plus a constant-size decompression program. The total length is upper bounded by \(n H(k/n) + O(\log n)\). Another way of description is using the following program: generate strings with \(k\) ones lexicographically, and output the i-th string.

The total number of strings with complexity \(K(x) < k\) is less than \(2^k\). Following this, the fraction of binary strings of length \(n\) with Kolmogorov complexity less than \(n-k\) is less than \(2^{-k}\).

For any computer \(U\), \(\sum_{p\colon U(p)~halts} 2^{-l(p)} \leq 1\), by Kraft inequality.

Kolmogorov complexity and entropy

Consider a sequence of i.i.d. binary variables \(X^n =\{X_i\}\). Let \(p(x^n) = \prod_{i=1}^n p(x_i)\) be the probability distribution. Consider \(E[ K(X^n \mid n) ] = \sum_{x^n} K(x^n \mid n)\). The following holds.
n H(X) \leq E[ K(X^n \mid n) ] \leq n H(X) + O(\log n).

The lower bound is from the theory of source coding, where the expected codeword length is greater than the entropy. The upper bound is by the compressed encoding of each string, and also by Jensen’s inequality that
E[ H(\sum_{i=1}^n X_i/n) ] \leq H(E [\sum_{i=1}^n X_i/n] ) = H (E[X]).

The above result holds also with high probability.
K(X^n \mid n) / n \to H(X) \hbox{ in probability }.

A infinite string \(x\) is incompressible if
\lim_{n\to \infty} \frac{K (x^n\mid n)}{ n } = 1.
If a string is incompressible, then \(\sum x_i /n \to 1/2\). The upper bound of Kolmogrov complexity gives a lower bound for then entropy \(H(k/n)\).

Kolmogorov complexity is uncomputable; this follows from that the halting problem is uncomputable.


Kolmogorov complexity is different from computational complexity in many senses. In general, a program computes a function from inputs to outputs, where the input ranges from strings of arbitrary length, and the function is an “infinite” object. In computational complexity, we wish to find the most efficient (which must be finite to be interesting) program computing the function/problem. In Kolmogorov complexity, one is given a string, and wishes to find the shortest program generating the string. The program does not take any input; it will only produce an output.



Tagged with: ,
Posted in Probability, Theory