Pairwise Independence
A collection of random variables is pairwise independent if every pair of distinct variables in the collection is independent.
Given \(k\) independent uniform bits, we define \(n = 2^k-1\) random variables, each of which is the parity of a distinct non-empty subset of the \(k\) bits. These new random bits are pairwise independent: each one is uniform, and the XOR of any two of them is the parity of another non-empty subset, so it is uniform as well. This gives a construction of \(n=2^k-1\) pairwise independent bits from just \(k=\log(n+1)\) bits. The sample space of the distribution of these \(n\) bits has size only \(2^k = n+1\), while the domain has size \(2^n\).
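The parity construction is small enough to check exhaustively. The following is a minimal sketch; the function names `parity_bits` and `is_pairwise_independent` are made up for illustration. It expands a \(k\)-bit seed into \(2^k-1\) parities and verifies that every pair of output bits takes each of the four values in \(\{0,1\}^2\) equally often over all seeds.

```python
from itertools import product

def parity_bits(seed):
    """Expand k seed bits into 2^k - 1 bits, one parity per non-empty subset."""
    k = len(seed)
    out = []
    for mask in range(1, 2 ** k):      # each non-zero mask encodes a non-empty subset
        bit = 0
        for i in range(k):
            if (mask >> i) & 1:        # seed bit i belongs to this subset
                bit ^= seed[i]
        out.append(bit)
    return out

def is_pairwise_independent(k):
    """Exact check over all 2^k seeds: every pair of output bits is uniform on {0,1}^2."""
    samples = [parity_bits(seed) for seed in product([0, 1], repeat=k)]
    n = 2 ** k - 1
    for i in range(n):
        for j in range(i + 1, n):
            counts = {}
            for s in samples:
                key = (s[i], s[j])
                counts[key] = counts.get(key, 0) + 1
            # Each of the four outcomes must occur for exactly a quarter of the seeds.
            if sorted(counts.values()) != [2 ** k // 4] * 4:
                return False
    return True
```

The bits are not fully independent: for example, the parities of \(\{1\}\), \(\{2\}\) and \(\{1,2\}\) always XOR to 0.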
Consider the MAXCUT problem: given a graph, find a partition of the vertices into two parts such that the number of edges between the two parts is maximized. A simple randomized algorithm finds a cut whose expected size is at least half of the number of edges: assign each vertex a random bit independently, and take the cut between the vertices assigned 0 and those assigned 1. Each edge is in the cut with probability 1/2, so by linearity of expectation the cut contains half of the edges in expectation.
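This algorithm also works if we assign pairwise independent bits to the vertices; the probability that an edge is in the cut is still 1/2, because this depends only on the joint distribution of the two bits at its endpoints. Using the construction above, we can use \(k=\log(n+1)\) truly random bits to generate \(n\) pairwise independent bits. Since there are only \(2^k = n+1\) possible values of the \(k\) seed bits, we can enumerate all of them and keep the best cut, which has at least half of the edges. This derandomizes the algorithm: it tries \(O(n)\) seeds and needs only \(O(\log n)\) space for the seed.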
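The enumeration over seeds can be sketched as follows. This is an illustrative implementation under stated assumptions, not a tuned one: vertices are numbered \(0,\ldots,n-1\), edges are pairs of distinct vertices, the name `derandomized_max_cut` is hypothetical, and vertex \(v\) is assigned the parity of the seed bits selected by the mask \(v+1\). Each seed costs one pass over the edges, so the total running time of this sketch is \(O(n(n+m))\) for \(m\) edges.

```python
def derandomized_max_cut(num_vertices, edges):
    """Try every seed of the parity construction and return the best cut found.

    Returns (side, size), where side[v] is the bit assigned to vertex v.
    """
    k = num_vertices.bit_length()          # smallest k with 2^k - 1 >= num_vertices
    best_side, best_size = None, -1
    for seed in range(2 ** k):             # at most 2 * num_vertices seeds
        # Vertex v gets the parity of the seed bits selected by the non-zero mask v + 1.
        side = [bin(seed & (v + 1)).count("1") % 2 for v in range(num_vertices)]
        size = sum(1 for u, v in edges if side[u] != side[v])
        if size > best_size:
            best_side, best_size = side, size
    return best_side, best_size
```

Because the average cut size over all seeds is exactly half the number of edges, the best seed yields a cut with at least that many edges.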
Pairwise Independent Hashing Functions
A family of functions \(H = \{h: N \to M\}\) is pairwise independent if, when \(h\) is chosen uniformly at random from \(H\), the following two conditions hold:
- For any \(x \in N\), the variable \(h(x)\) is uniformly distributed in \(M\).
- For any \(x\neq y\), the variables \(h(x)\) and \(h(y)\) are independent.
Equivalently, for any \(x\neq y\), and \(a, b\in M\),
\[
\Pr[ h(x) = a, ~ h(y) = b ] = \Pr[ h(x) = a] \cdot \Pr[ h(y) = b ] = \frac{1}{|M|^2}.
\]
By randomly choosing \(h \in H\), and enumerating over all \(x\in N\), we get a sequence of \(|N|\) pairwise independent variables, each of which is uniformly distributed in \(M\).
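Let \(F\) be a finite field. Let \(H = \{ h_{a,b} \colon F\to F \}\) where \(h_{a,b}(x) = ax+b\) and \(a,b\in F\). Then this family of functions is pairwise independent: for any \(x\neq y\) and any targets \(u,v\in F\), the linear system \(ax+b=u\), \(ay+b=v\) has exactly one solution \((a,b)\), so exactly one of the \(|F|^2\) functions maps \(x\) to \(u\) and \(y\) to \(v\).
Suppose \(|F| =n\). Then the family has \(n^2\) functions; each function is specified with \(2\log n\) bits. A truly random function \(f\colon F\to F\) would require \(n\log n\) bits to describe. The sample space of the family has size \(n^2\), while the space of all functions from \(F\) to \(F\) has size \(n^n\).
If we take \(F = \mathrm{GF}(2^k)\), whose elements can be identified with \(\{0,1\}^k\), then each function \(h\) maps \(k\) bits to \(k\) bits.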
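For a small prime field the claim can be verified by brute force. The sketch below assumes \(F = \mathbb{Z}_p\) with \(p\) prime (arithmetic modulo \(p\)); the names `affine_hash` and `check_pairwise_independent` are hypothetical. It checks that for every pair \(x\neq y\), each pair of output values is produced by exactly one function of the family.

```python
from itertools import product

def affine_hash(a, b, x, p):
    """h_{a,b}(x) = a*x + b, computed modulo p."""
    return (a * x + b) % p

def check_pairwise_independent(p):
    """Exact check that {h_{a,b}} is pairwise independent over the integers mod p."""
    for x, y in product(range(p), repeat=2):
        if x == y:
            continue
        counts = {}
        for a, b in product(range(p), repeat=2):
            key = (affine_hash(a, b, x, p), affine_hash(a, b, y, p))
            counts[key] = counts.get(key, 0) + 1
        # p^2 functions and p^2 output pairs: each pair must occur exactly once.
        if len(counts) != p * p or set(counts.values()) != {1}:
            return False
    return True
```

The field assumption matters: calling `check_pairwise_independent(4)` returns `False`, because the integers modulo 4 are not a field.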
Suppose a random function \(h\in H\) is used to map elements of a larger set \(N\) into a smaller set \(M\) of buckets. If \(H\) is pairwise independent, then the probability that two distinct elements of \(N\) are mapped to the same bucket is \(1/|M|\): each of the \(|M|\) buckets receives both elements with probability \(1/|M|^2\). This is the same collision probability as if each element were placed into a bucket independently and uniformly at random. However, a pairwise independent family is much easier to describe; this is applied in hash tables in practice.
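The collision probability of a concrete family can be computed exactly by enumeration. The sketch below assumes the affine family \(h_{a,b}(x) = ax+b\) over the integers modulo a prime \(p\), so that \(|M| = p\); the function name `collision_probability` is made up for illustration.

```python
from fractions import Fraction

def collision_probability(p, x, y):
    """Exact probability, over a uniform choice of (a, b), that h_{a,b}(x) == h_{a,b}(y)."""
    collisions = sum(
        1
        for a in range(p)
        for b in range(p)
        if (a * x + b) % p == (a * y + b) % p
    )
    return Fraction(collisions, p * p)
```

For distinct \(x\) and \(y\), a collision requires \(a(x-y)=0\), which forces \(a=0\); this leaves \(p\) of the \(p^2\) functions, a probability of \(1/p\).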
Pairwise independence can be generalized to \(k\)-wise independence. A family of functions \(H = \{h: N \to M\}\) is \(k\)-wise independent if, when \(h\) is chosen uniformly at random from \(H\), for any \(k\) distinct elements \(x_1, \ldots, x_k\) of \(N\), the random variables \(h(x_1), \ldots, h(x_k)\) are uniform and mutually independent.
Let \(F\) be a finite field with \(|F| \ge k\). Let \(H = \{ h_{a} \colon F\to F, ~ a=(a_1, \ldots, a_k) \in F^k \}\) where \(h_{a}(x) = a_1 + a_2 x + \cdots + a_k x^{k-1}\). Then this family of functions is \(k\)-wise independent, because a polynomial of degree less than \(k\) is uniquely determined by its values at \(k\) distinct points.
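The polynomial family can be checked the same way for small parameters. This is a brute-force sketch assuming \(F = \mathbb{Z}_p\) with \(p\) prime and \(k \le p\); the names `poly_hash` and `check_kwise_independent` are hypothetical. Since there are \(p^k\) coefficient vectors and \(p^k\) possible output tuples, the outputs at \(k\) distinct points are uniform and independent exactly when every tuple appears.

```python
from itertools import combinations, product

def poly_hash(coeffs, x, p):
    """Evaluate a_1 + a_2*x + ... + a_k*x^(k-1) modulo p by Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc

def check_kwise_independent(p, k):
    """Exact check that degree < k polynomials over the integers mod p are k-wise independent."""
    for points in combinations(range(p), k):       # every set of k distinct inputs
        seen = set()
        for coeffs in product(range(p), repeat=k):
            seen.add(tuple(poly_hash(coeffs, x, p) for x in points))
        # p^k functions hitting p^k distinct tuples means each tuple occurs exactly once.
        if len(seen) != p ** k:
            return False
    return True
```

The running time grows as \(p^k\) per set of points, so this is only practical for very small \(p\) and \(k\); it is meant as a sanity check, not as a way to use the family.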