# Azuma’s Inequality

The Chernoff bound gives concentration for sums of *independent* random variables. This concentration result can be generalized to martingales, where independence is not required.

### Martingales and Azuma’s Inequality

A sequence of random variables \(X_0, X_1, \ldots, X_n\) is called a *martingale* if, for all \(i \geq 1\),

\[

{\bf E}[X_i\mid X_{i-1}, \ldots, X_{0}] = X_{i-1}.

\]

Let \(Y_i = X_i - X_{i-1}\); then \({\bf E}[Y_i\mid X_{i-1},\ldots, X_{0}] = 0\).

**Theorem (Azuma’s Inequality)** Let \(X_0, X_1,\ldots, X_n\) be a martingale, and \(Y_i = X_i - X_{i-1}\). If there exist constants \(c_i > 0\) such that \(|Y_i| \leq c_i\) for all \(i\), then for any \(\lambda > 0\),

\[

{\bf Pr}[X_n - X_0 \geq \lambda] \leq \exp\left( -\frac{\lambda^2}{2\sum_{i=1}^n c_i^2 } \right),

\]

and

\[

{\bf Pr}[X_n - X_0 \leq -\lambda] \leq \exp\left( -\frac{\lambda^2}{2\sum_{i=1}^n c_i^2 } \right).

\]

First, we need the following lemma.

**Lemma** Let \(Y\) be a random variable such that \({\bf E}[Y] = 0\) and \(|Y| \leq c\) for some constant \(c > 0\). Then for any \(t > 0\), it holds that \({\bf E}[e^{tY}] \leq e^{t^2c^2/2}\).

The proof follows from the convexity of \(f(x) = e^{tx}\): for any \(|x| \leq c\), the point \((x, e^{tx})\) lies below the chord joining \((-c, e^{-tc})\) and \((c, e^{tc})\), so

\[

e^{tx} \leq \frac{1}{2c} (e^{tc} - e^{-tc})\, x + \frac{1}{2}(e^{tc} + e^{-tc}).

\]

Taking expectations on both sides and using \({\bf E}[Y] = 0\), so that the linear term vanishes, we have

\[

{\bf E}[e^{tY}] \leq \frac{1}{2}(e^{tc} + e^{-tc}) \leq e^{t^2c^2/2}.

\]

To see that \(\frac{1}{2}(e^x + e^{-x}) \leq e^{x^2/2}\), use the Taylor series \(e^{x} = \sum^{\infty}_{n=0} \frac{x^n}{n!}\) together with the bound \((2n)! \geq 2^n n!\):

\[

\frac{1}{2}(e^x + e^{-x}) = \sum_{n=0}^{\infty} \frac{x^{2n}}{(2n)!}

\leq \sum_{n=0}^{\infty} \frac{x^{2n}}{2^n n!} = e^{x^2/2}.

\]
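The lemma can be sanity-checked numerically. The two-point variable \(Y = \pm c\) with equal probability attains the chord bound with equality, so it is the extremal case; the values of `t` and `c` below are arbitrary illustrative choices:

```python
import math

t, c = 0.7, 1.5  # arbitrary illustrative parameters

# For Y = +/-c with equal probability, E[e^{tY}] = (e^{tc} + e^{-tc}) / 2.
mgf = 0.5 * (math.exp(t * c) + math.exp(-t * c))
bound = math.exp(t**2 * c**2 / 2)

print(f"E[e^(tY)] = {mgf:.4f}, bound e^(t^2 c^2 / 2) = {bound:.4f}")
assert mgf <= bound  # the lemma's bound holds
```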

Now we are ready to prove Azuma’s inequality. First, since \({\bf E}[Y_i \mid X_{i-1},\ldots, X_0] = 0\) and \(|Y_i| \leq c_i\), applying the lemma conditionally gives

\[

{\bf E}[e^{tY_i} \mid X_{i-1},\ldots, X_0] \leq e^{t^2c_i^2/2}.

\]

This bound is a constant, independent of the conditioning. By induction on \(i\), using the tower property of conditional expectation,

\[

{\bf E}[e^{tX_i}] = {\bf E}[e^{t(Y_i + X_{i-1})}] =

{\bf E}\left[e^{t X_{i-1}} \cdot {\bf E}[e^{t Y_i}\mid X_{i-1},\ldots, X_0]\right] \\

\leq {\bf E}[e^{t X_{i-1}} ] \cdot e^{t^2c_i^2/2} \leq

e^{t^2 \sum_{j=1}^{i}c_j^2/2}\, {\bf E}[e^{tX_0}].

\]

Therefore, by Markov’s inequality, we have

\[

{\bf Pr} [ X_n - X_0 \geq \lambda ]

= {\bf Pr} [ e^{t(X_n - X_0)} \geq e^{\lambda t} ] \\

\leq e^{-\lambda t}\, {\bf E}[e^{t(X_n - X_0)}]

\leq e^{-\lambda t + t^2 \sum_{j=1}^{n}c_j^2/2},

\]

where the last step applies the bound above to the martingale \(X_i - X_0\), which starts at \(0\) and has the same differences \(Y_i\).

The above holds for any \(t > 0\). Taking \(t = \lambda / \sum_{j=1}^{n}c_j^2\), which minimizes the exponent, yields the first bound; the second bound follows by applying the same argument to the martingale \(-X_0, -X_1, \ldots, -X_n\).
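Substituting \(t = \lambda / \sum_{j=1}^{n}c_j^2\) into the exponent makes the final bound explicit:

\[

-\lambda t + \frac{t^2}{2}\sum_{j=1}^{n}c_j^2 = -\frac{\lambda^2}{\sum_{j=1}^{n}c_j^2} + \frac{\lambda^2}{2\sum_{j=1}^{n}c_j^2} = -\frac{\lambda^2}{2\sum_{j=1}^{n}c_j^2},

\]

which is exactly the exponent in the statement of the theorem.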

### Application: Coin Tossing

Let \(Z_1,\ldots, Z_n\) be a sequence of independent random variables, each taking the values \(-1\) and \(1\) with equal probability (fair coin tosses). Let \(X_0 = 0\) and \(X_i = \sum_{j=1}^i Z_j\) for \(i = 1, \ldots, n\). Then \(X_0,\ldots, X_n\) is a martingale, since \({\bf E}[X_i \mid X_{i-1}, \ldots, X_0] = X_{i-1} + {\bf E}[Z_i] = X_{i-1}\).

By Azuma’s inequality with \(c_i = 1\) (here \(|Y_i| = |Z_i| = 1\)), for any \(\lambda > 0\),

\[

{\bf Pr}[X_n \geq \lambda] \leq \exp\left(-\frac{\lambda^2}{2n}\right).

\]

Note that, in this case, \(X_n\) is a sum of independent random variables, so a similar bound can also be derived from the Chernoff bound; Azuma’s inequality, however, applies even without independence.
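As a sanity check, a short simulation can compare the empirical tail probability against the Azuma bound (a minimal sketch; the values of `n`, `lam`, and `trials` are arbitrary illustrative choices):

```python
import math
import random

def empirical_vs_azuma(n=100, lam=20, trials=20000, seed=0):
    """Estimate Pr[X_n >= lam] for a +/-1 random walk and compare
    with the Azuma bound exp(-lam^2 / (2n))."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # X_n = Z_1 + ... + Z_n with Z_i uniform on {-1, +1}
        x = sum(rng.choice((-1, 1)) for _ in range(n))
        if x >= lam:
            hits += 1
    empirical = hits / trials
    bound = math.exp(-lam**2 / (2 * n))
    return empirical, bound

emp, bound = empirical_vs_azuma()
print(f"empirical = {emp:.4f}, Azuma bound = {bound:.4f}")
assert emp <= bound  # the bound should hold (here it is quite loose)
```

With these parameters the Azuma bound is \(e^{-2} \approx 0.135\), while the empirical frequency is considerably smaller, illustrating that for independent sums the bound is valid but not tight.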