# Decision Tree Complexity

Decision trees are a simple computation model. One can compute a boolean function using a decision tree by examining the input bits in specific orders. The complexity is characterized by the number of input bits examined, or simply, the depth of the decision tree.

### Decision Trees

A decision tree for a boolean function \(f\colon\{0,1\}^n \to \{0,1\}\) is a labeled tree where

- each internal node is labeled with a variable \(x_i\), and the two outgoing edges are labeled with the two assignments \(x_i=0\) and \(x_i=1\),
- each leaf is labeled with an output 0 or 1.

Given an input, we examine its bits by following a path from the root of the tree to a leaf, which gives the output.

The depth of tree is the longest path from the root to a leaf. It is the maximum number of bits examined on an input. The decision tree complexity of a boolean function is the smallest depth of any decision tree computing the function.

Let \(dep(T)\) be the depth of the tree \(T\), and \(dep(T, x)\) be the length of the path examining the bits of an input \(x\). Then \(dep(T) = \max_{x} dep(T,x)\). The decision tree complexity of \(f\) is

\[

D(f) = \min_{T} dep(T) = \min_{T} \max_{x} dep(T,x),

\]

where \(T\) ranges over all decisions tree computing \(f\).

For example, the OR function on \(n\) inputs requires a decision tree of depth \(n\); otherwise, we can construct two inputs following the same path but may have different outputs, which is a contradiction.

As another example, the address function on \(n = k + 2^k\) inputs is defined as follows: use the value of the first \(k\) bits as an index, and output the bit within the rest \(2^k\) bits at the indexed position. This function has a decision tree of depth \(k+1 \leq \log n +1\).

### Randomized Decision Trees

A randomized decision tree is a distribution over a set of (deterministic) decision trees. We assume all decision trees in the set compute the function correctly on every input. The hope is that a randomized decision tree may examine fewer bits in expectation.

The randomized decision tree complexity for \(f\) is defined as

\[

R(f) = \min_{\cal T} max_x {\bf E}_{T\in {\cal T}} dep(T, x),

\]

where \(\cal T\) ranges over all possible randomized decision trees (all possible distributions over trees), and \(T\) is a randomly chosen decision tree according to the distribution \(\cal T\).

Another way of defining a randomized decision tree is to examine a randomly chosen input bit at each internal node of the tree.

For example, the majority function on 3 inputs can be computed by a decision tree of depth 3. By randomly permuting the inputs, we may construct 6 different decision trees. If we uniformly choose one of such decision trees, then for an input, say 001, we will examine \(2\cdot 2/6 + 3 \cdot 4/6 = 8/3\) bits on average. Since this is the worst input, we have \(R(f) \leq 8/3\).

Let \(\cal A\) be a set of decision trees. Let \(\cal T\) be a distribution over \(\cal A\), that is a randomized decision tree. Let \(\cal D\) be a distribution over the inputs \(\{0,1\}^n\). Then Yao’s Min-Max Lemma says that

\[

\min_{\cal T} max_{x} {\bf E}_{T\sim {\cal T}} dep(T, x) =

\max_{\cal D} min_{T\in {\cal A}} {\bf E}_{x\sim {\cal D}} dep(T, x) .

\]

On the LHS, \(\cal T\) ranges over all distributions over decision trees. On the RHS, \(\cal D\) ranges over all distributions on the inputs.

For the example of the majority function on 3 inputs, we already have the LHS is at most 8/3, and now we show the RHS is at least 8/3, which means \(R(f) = 8/3\). We choose \({\cal D}\) to be a uniform distribution over all inputs except 000 and 111. Then, for any decision tree, the average number of bits examined is 8/3, and this is true for any distribution over decision trees.