Pairwise Independence

A collection of random variables is pairwise independent if every pair of distinct variables in the collection is independent.

Given \(k\) independent uniform bits, we define \(n = 2^k-1\) random variables, each of which is the parity of a distinct non-empty subset of the \(k\) bits. These new random bits are pairwise independent. This gives a construction of \(n=2^k-1\) pairwise independent bits from just \(k=\log(n+1)\) truly random bits. The sample space of the distribution of these \(n\) bits has size only \(2^k = n+1\), while the full domain \(\{0,1\}^n\) has size \(2^n\).
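The parity construction above can be sketched in Python as follows. This is a minimal illustration, not a tuned implementation; the function names `pairwise_independent_bits` and `is_pairwise_independent` are made up for this example, and the check simply enumerates all \(2^k\) seeds, so it is only practical for small \(k\).

```python
from itertools import product

def pairwise_independent_bits(seed):
    """Expand k seed bits into 2^k - 1 bits, one parity per non-empty subset."""
    k = len(seed)
    out = []
    for mask in range(1, 2 ** k):  # each non-zero mask encodes a non-empty subset
        parity = 0
        for i in range(k):
            if (mask >> i) & 1:
                parity ^= seed[i]
        out.append(parity)
    return out

def is_pairwise_independent(k):
    """Exhaustively check that every pair of output bits is uniform on {0,1}^2."""
    n = 2 ** k - 1
    samples = [pairwise_independent_bits(s) for s in product((0, 1), repeat=k)]
    for i in range(n):
        for j in range(i + 1, n):
            counts = {}
            for s in samples:
                key = (s[i], s[j])
                counts[key] = counts.get(key, 0) + 1
            # Each of the four value pairs must occur in exactly 2^k / 4 samples.
            if sorted(counts.values()) != [2 ** k // 4] * 4:
                return False
    return True
```

For example, `is_pairwise_independent(3)` checks all 21 pairs among the 7 output bits over the 8 possible seeds.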

Consider the MAXCUT problem: given a graph, find a partition of the vertices into two parts such that the number of edges between the two parts is maximized. A simple randomized algorithm finds a cut whose expected size is at least half the number of edges: assign each vertex a random bit independently, and let the cut separate the vertices assigned 0 from those assigned 1. Each edge is in the cut with probability 1/2, so by linearity of expectation the cut contains half of the edges in expectation.

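This algorithm also works if we assign pairwise independent bits to the vertices, because the analysis only looks at the two endpoints of one edge at a time; the probability that an edge is in the cut is still 1/2. Using the construction above, we can use \(k=\log(n+1)\) truly random bits to generate \(n\) pairwise independent bits. This gives a deterministic algorithm: enumerate all \(2^k = n+1\) settings of the \(k\) bits and keep the best cut. Since the average cut size over these settings is half the number of edges, at least one setting yields a cut of at least that size. There are only \(O(n)\) settings to try, and each setting takes \(O(\log n)\) bits to store.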
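The enumeration over seeds can be sketched as below. This is an illustrative sketch under a few assumptions: vertices are numbered \(0, \ldots, n-1\), edges are pairs of distinct vertices (no self-loops), and vertex \(v\) is given the parity of the seed bits selected by the binary representation of \(v+1\). The function name `derandomized_max_cut` is hypothetical.

```python
def derandomized_max_cut(num_vertices, edges):
    """Try every seed of the parity construction and return the best cut found.

    Returns (side, size), where side[v] is the bit assigned to vertex v.
    """
    # Smallest k with 2^k - 1 >= num_vertices, so every vertex gets its own subset.
    k = max(1, num_vertices.bit_length())
    best_side, best_size = None, -1
    for seed in range(2 ** k):
        # Vertex v uses the non-empty subset encoded by v + 1; its bit is the
        # parity of the seed bits inside that subset.
        side = [bin((v + 1) & seed).count("1") % 2 for v in range(num_vertices)]
        size = sum(1 for u, v in edges if side[u] != side[v])
        if size > best_size:
            best_side, best_size = side, size
    return best_side, best_size
```

Because the average cut size over all seeds equals half the number of edges, the returned `size` is at least `len(edges) / 2`. It is not guaranteed to be the maximum cut.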

Pairwise Independent Hashing Functions

A family of functions \(H = \{h: N \to M\}\) is pairwise independent if, when we choose \(h\) uniformly at random from \(H\), the following two conditions hold:

For any \(x \in N\), the variable \(h(x)\) is uniformly distributed in \(M\).
For any \(x\neq y\), the variables \(h(x)\) and \(h(y)\) are independent.

Equivalently, for any \(x\neq y\), and \(a, b\in M\),
\[
\Pr[ h(x) = a, ~ h(y) = b ] = \Pr[ h(x) = a] \cdot \Pr[ h(y) = b ] = \frac{1}{|M|^2}.
\]

By randomly choosing \(h \in H\), and enumerating over all \(x\in N\), we get a sequence of \(|N|\) pairwise independent variables, each of which is uniformly distributed in \(M\).

Let \(F\) be a finite field. Let \(H = \{ h_{a,b} \colon F\to F \}\) where \(h_{a,b}(x) = ax+b\) and \(a,b\in F\). Then this family of functions is pairwise independent.

Suppose \(|F| = n\). Then the family has \(n^2\) functions, and each function is specified with \(2\log n\) bits. A truly random function \(f\colon F\to F\) would require \(n\log n\) bits to describe. The sample space of the family has size \(n^2\), while the set of all functions from \(F\) to \(F\) has size \(n^n\).
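As a small sanity check of the affine family, the sketch below takes \(F\) to be the integers modulo a prime \(p\) (an assumption made for this example; any finite field would do) and verifies pairwise independence by brute force. The function names are hypothetical. Since there are \(p^2\) functions and \(p^2\) possible value pairs, the family is pairwise independent exactly when each value pair \((h(x), h(y))\) is produced by exactly one function.

```python
def affine_hash(a, b, x, p):
    """Evaluate h_{a,b}(x) = a*x + b in the integers modulo p."""
    return (a * x + b) % p

def check_pairwise_independent(p):
    """Brute-force check over all pairs x != y and all p^2 functions."""
    family = [(a, b) for a in range(p) for b in range(p)]
    for x in range(p):
        for y in range(p):
            if x == y:
                continue
            counts = {}
            for a, b in family:
                key = (affine_hash(a, b, x, p), affine_hash(a, b, y, p))
                counts[key] = counts.get(key, 0) + 1
            # Every one of the p^2 value pairs must be hit exactly once.
            if len(counts) != p * p or any(c != 1 for c in counts.values()):
                return False
    return True
```

The check fails when the modulus is not prime, for instance with modulus 4, because the integers modulo 4 do not form a field.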

If we take \(F\) to be the finite field with \(2^k\) elements, whose elements can be identified with strings in \(\{0,1\}^k\), then each function \(h\) maps \(k\) bits to \(k\) bits.

Suppose a random function \(h\in H\) is used to map elements of a larger set \(N\) into a smaller set \(M\) of buckets. If \(H\) is pairwise independent, then two distinct elements of \(N\) land in any particular bucket together with probability \(1/|M|^2\); summing over the \(|M|\) buckets, the probability that they are mapped to the same bucket is \(1/|M|\). This is the same collision probability as if each element were placed into a uniformly random bucket independently. However, a pairwise independent family is much easier to describe, which is why such families are used for hash tables in practice.

Pairwise independence can be generalized to k-wise independence. A family of functions \(H = \{h: N \to M\}\) is k-wise independent if, when we choose \(h\) uniformly at random from \(H\), for any k distinct elements \(x_1, \ldots, x_k\) of \(N\), the random variables \(h(x_1), \ldots, h(x_k)\) are uniform on \(M\) and mutually independent.

Let \(F\) be a finite field with \(|F| \ge k\). Let \(H = \{ h_{a} \colon F\to F, ~ a=(a_1, \ldots, a_k) \}\) where \(h_{a}(x) = a_1 + a_2 x + \cdots + a_k x^{k-1}\) and each \(a_i \in F\). Then this family of functions is k-wise independent.
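The polynomial family can be checked the same way on a small example. The sketch below again assumes \(F\) is the integers modulo a prime \(p\) with \(k \le p\), and uses hypothetical function names. With \(p^k\) coefficient vectors and \(p^k\) possible value tuples, k-wise independence is equivalent to every tuple of values at \(k\) distinct points being produced by exactly one polynomial, so it is enough to count distinct tuples.

```python
from itertools import combinations, product

def poly_hash(coeffs, x, p):
    """Evaluate a_1 + a_2*x + ... + a_k*x^(k-1) modulo p using Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc

def check_k_wise_independent(k, p):
    """Brute-force check over every set of k distinct points in Z_p."""
    for points in combinations(range(p), k):
        seen = set()
        for coeffs in product(range(p), repeat=k):
            seen.add(tuple(poly_hash(coeffs, x, p) for x in points))
        # p^k polynomials must produce p^k distinct value tuples.
        if len(seen) != p ** k:
            return False
    return True
```

The running time grows like \(p^k\) per point set, so this is only meant for tiny parameters such as `check_k_wise_independent(3, 5)`.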
