k-wise Independence and Linear Code

A collection of random variables is k-wise independent if any k variables are mutually independent. Consider a distribution \(D\) on \(n\) bits. It is k-wise independent and uniform if for any \(S\subseteq [n]\) of size k, for any \(a\in\{0,1\}^k\),
Pr[ x_S = a] = 1 / 2^k.
Here we will only consider k-wise independent and uniform distributions over boolean domains.

A distribution \(f\) is \(k\)-wise independent if and only if all Fourier coefficients of degree at most k are 0, that is, for all \(\alpha\in \{0,1\}^n\) with \(|\alpha|\leq k\), we have \(\widehat{f}(\alpha) = 0\).

[Proof.] The Fourier coefficient \(\widehat{f}(\alpha) = \sum_{x} f(x) (-1)^{\alpha x}\). Given \(\alpha\), we consider the marginal distribution of \(f\) on all \(x_i\) such that \(\alpha_i =1\). We have \(\widehat{f}(\alpha)\) is the same as the Fourier coefficient of the marginal distribution on vector 1. For the reverse direction, we can argue inductively from 2 to k.

We wish to construct a k-wise independent distribution with a space as small as possible and also efficiently enumerable. Linear code provides a way to construct k-wise independent distribution.

Consider an \([n, l, d]\) linear code \(C\) with block length \(n\), dimension \(l\) and minimum distance \(d\). Let \(G\) be its generator matrix of dimension \(l\times n\); it transforms a message $late m$ of length \(l\) to a codeword \(w\) of length \(n\). That is, \(c = m \cdot G\). Let \(H\) be the parity-checking matrix of dimension \( n-l\times n\). For every codeword \(c\), it holds that \(c \cdot H’ = 0\).

The space of all codewords \(C\) is a linear subspace of \(F_2^n\), with dimension \(l\) and size \(2^l\). The rows of the generator matrix is a basis of the space. The orthogonal complement of \(C\), denoted by \(C^\bot\) contains all vectors \(x\) such that \(\langle x, c\rangle =0\) for all \(c\in C\). This forms an \([n, n-l]\) code, which is called the dual code. The following hold:

  • The dual code of \(C^\bot\) is \(C\).
  • The generator matrix of $late C$ is the parity-check matrix of \(C^\bot\).

We will argue that, a uniform distribution over the dual code \(C^\bot\) is \((d-1)\)-wise independent distribution. In other words, if we uniformly choose a message \(m\) of length \(n-l\), then the codeword \(m \cdot H\) has a \((d-1)\)-wise independent distribution.

The code \(C\) has minimum distance \(d\) means that the minimum Hamming weight of its codeword is at least \(d\), and any \(d-1\) columns of the parity-check matrix are linearly independent.

Let \(f\) be a uniform distribution over \(C^\bot\). Let \(g\) be a uniform distribution over \(\{0,1\}^{n-l}\). Then
\widehat{f}(\alpha) = \sum_{x} f(x)\cdot (-1)^{x \alpha’} = \sum_{u} g(u) (-1)^{u H \alpha’}
Here \(u\) is over the space \(\{0,1\}^{n-l}\). For \(\alpha\) such that \(\)|\alpha| < d[/latex], \[ 2^{n-l}\cdot \widehat{f}(\alpha) = \sum_{u} (-1)^{uH \alpha'} = \sum_{u} (-1)^{u\beta} = 0 \] Here [latex]H \alpha'[/latex] is a non-zero vector since any [latex]d-1[/latex] columns of [latex]H[/latex] are linearly independent. For any non-zero vector [latex]\beta[/latex], the final summation is 0. Therefore, using an [latex][n, l, d][/latex] code, we can construct a [latex](d-1)[/latex]-wise independent distribution with space size [latex]2^{n-l}[/latex].