# Decision Tree Complexity

Decision trees are a simple model of computation. A decision tree computes a boolean function by examining input bits one at a time, where the choice of the next bit may depend on the values seen so far. Complexity is measured by the number of input bits examined in the worst case, that is, the depth of the decision tree.

### Decision Trees

A decision tree for a boolean function $$f\colon\{0,1\}^n \to \{0,1\}$$ is a labeled tree where

• each internal node is labeled with a variable $$x_i$$, and the two outgoing edges are labeled with the two assignments $$x_i=0$$ and $$x_i=1$$,
• each leaf is labeled with an output 0 or 1.

Given an input, we follow a path from the root to a leaf: at each internal node we examine the labeled bit and take the edge matching its value, and the label of the leaf reached is the output.

The depth of a tree is the length of the longest path from the root to a leaf; it is the maximum number of bits examined on any input. The decision tree complexity of a boolean function is the smallest depth of any decision tree computing the function.
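
To make these definitions concrete, here is a minimal Python sketch, assuming a nested-tuple encoding of trees (a leaf is an output bit; an internal node is a triple of a variable index and two subtrees); the function names are illustrative, not from any library.

```python
from itertools import product

# A leaf is an int (0 or 1); an internal node is a triple (i, t0, t1):
# examine bit x_i, then recurse into t0 if x_i = 0 and into t1 if x_i = 1.

def evaluate(tree, x):
    """Follow the path for input x from the root to a leaf; return its label."""
    while not isinstance(tree, int):
        i, t0, t1 = tree
        tree = t1 if x[i] else t0
    return tree

def dep_on_input(tree, x):
    """dep(T, x): the number of bits examined on input x."""
    count = 0
    while not isinstance(tree, int):
        i, t0, t1 = tree
        tree = t1 if x[i] else t0
        count += 1
    return count

def dep(tree, n):
    """dep(T) = max_x dep(T, x), by brute force over all 2^n inputs."""
    return max(dep_on_input(tree, x) for x in product([0, 1], repeat=n))

# Example: a depth-2 tree computing AND(x_0, x_1).
and_tree = (0, 0, (1, 0, 1))
assert evaluate(and_tree, (1, 1)) == 1
assert dep(and_tree, 2) == 2
```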

Let $$dep(T)$$ denote the depth of the tree $$T$$, and $$dep(T, x)$$ the length of the path followed on input $$x$$. Then $$dep(T) = \max_{x} dep(T,x)$$. The decision tree complexity of $$f$$ is
$D(f) = \min_{T} dep(T) = \min_{T} \max_{x} dep(T,x),$
where $$T$$ ranges over all decision trees computing $$f$$.
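
To illustrate the definition of $$D(f)$$, the following Python sketch computes it by brute force, using the standard recursion: if $$f$$ is constant on all inputs consistent with the bits fixed so far, no further queries are needed; otherwise query the best remaining bit and take the worse of the two answers. This is only feasible for tiny $$n$$, and the helper names are made up for this note.

```python
from itertools import product

def D(f, n):
    """D(f) by the standard recursion: a function constant on the inputs
    consistent with the fixed bits needs no more queries; otherwise
    D = min over free bits i of 1 + max over both possible answers."""
    def rec(fixed):  # fixed: dict of already-queried bits and their values
        values = {f(x) for x in product([0, 1], repeat=n)
                  if all(x[i] == b for i, b in fixed.items())}
        if len(values) == 1:
            return 0
        return min(1 + max(rec({**fixed, i: 0}), rec({**fixed, i: 1}))
                   for i in range(n) if i not in fixed)
    return rec({})

# OR on 3 bits needs depth 3; the address function with k = 1
# (one index bit, two data bits, n = 3) needs depth k + 1 = 2.
assert D(lambda x: int(any(x)), 3) == 3
assert D(lambda x: x[1 + x[0]], 3) == 2
```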

For example, the OR function on $$n$$ inputs requires a decision tree of depth $$n$$. Suppose some tree of depth less than $$n$$ computes OR, and consider the path followed by the all-zeros input; some variable $$x_i$$ is never examined along it. The input obtained by flipping $$x_i$$ to 1 follows the same path to the same leaf, yet the two inputs have different OR values, a contradiction.

As another example, the address function on $$n = k + 2^k$$ inputs is defined as follows: interpret the first $$k$$ bits as an index, and output the bit at the indexed position among the remaining $$2^k$$ bits. This function has a decision tree of depth $$k+1 \leq \log n + 1$$: examine the $$k$$ index bits, then the single data bit they point to.
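
For concreteness, here is a short Python sketch of this depth-$$(k+1)$$ strategy; the function name is hypothetical, and we assume the first $$k$$ bits are the index and the remaining $$2^k$$ bits are the data.

```python
def address_eval(x, k):
    """Evaluate the address function by examining exactly k + 1 bits:
    the k index bits, then the single data bit they point to."""
    index = 0
    for i in range(k):            # read the k index bits
        index = 2 * index + x[i]
    return x[k + index]           # read exactly one of the 2^k data bits

# k = 2, n = 2 + 4 = 6: index bits (1, 0) encode 2, selecting data bit x_4.
assert address_eval((1, 0, 0, 0, 1, 0), 2) == 1
```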

### Randomized Decision Trees

A randomized decision tree is a distribution over a set of (deterministic) decision trees. We assume all decision trees in the set compute the function correctly on every input. The hope is that a randomized decision tree may examine fewer bits in expectation.

The randomized decision tree complexity for $$f$$ is defined as
$R(f) = \min_{\cal T} \max_{x} {\bf E}_{T\sim {\cal T}} dep(T, x),$
where $$\cal T$$ ranges over all possible randomized decision trees (all possible distributions over trees), and $$T$$ is a randomly chosen decision tree according to the distribution $$\cal T$$.

Equivalently, a randomized decision tree can be described as a single tree that flips coins: at each internal node, the bit to examine next is chosen at random, possibly depending on the bits seen so far.

For example, the majority function on 3 inputs can be computed by a decision tree of depth 3: examine two bits, and if they agree, output their common value; otherwise examine the third bit. By permuting the order in which the bits are examined, we obtain 6 such decision trees. If we choose one of them uniformly at random, then on an input such as 001, the first two examined bits agree with probability $$2/6$$ (and we stop after 2 queries) and disagree with probability $$4/6$$ (and we need all 3 queries), so we examine $$2\cdot 2/6 + 3 \cdot 4/6 = 8/3$$ bits on average. Since this is a worst-case input, we have $$R(f) \leq 8/3$$.
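
The $$8/3$$ computation can be checked by direct enumeration. The following Python sketch (illustrative names, exact rational arithmetic) averages the number of queries over the 6 permutation trees for every input.

```python
from itertools import permutations, product
from fractions import Fraction

def queries(perm, x):
    """Queries made by the tree that examines bits in order perm:
    stop after two agreeing bits, otherwise examine the third."""
    i, j, _ = perm
    return 2 if x[i] == x[j] else 3

# Average the number of queries over the 6 permutation trees, per input.
for x in product([0, 1], repeat=3):
    avg = Fraction(sum(queries(p, x) for p in permutations(range(3))), 6)
    print(x, avg)   # 000 and 111 give 2; every other input gives 8/3
```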

Let $$\cal A$$ be a set of decision trees. Let $$\cal T$$ be a distribution over $$\cal A$$, that is, a randomized decision tree, and let $$\cal D$$ be a distribution over the inputs $$\{0,1\}^n$$. Then Yao's Min-Max Lemma says that
$\min_{\cal T} \max_{x} {\bf E}_{T\sim {\cal T}} dep(T, x) = \max_{\cal D} \min_{T\in {\cal A}} {\bf E}_{x\sim {\cal D}} dep(T, x) .$

On the LHS, $$\cal T$$ ranges over all distributions over decision trees. On the RHS, $$\cal D$$ ranges over all distributions on the inputs.

For the majority function on 3 inputs, we have already shown that the LHS is at most 8/3; we now show that the RHS is at least 8/3, which gives $$R(f) = 8/3$$. Choose $${\cal D}$$ to be the uniform distribution over the six inputs other than 000 and 111. Under $${\cal D}$$, any two distinct bits agree with probability exactly $$1/3$$, even conditioned on the value of the first; so for any deterministic decision tree, the first two examined bits agree with probability $$1/3$$, and when they disagree a third bit must be examined. The expected number of bits examined is therefore at least $$2 \cdot 1/3 + 3 \cdot 2/3 = 8/3$$. Since every deterministic tree examines at least $$8/3$$ bits in expectation under $${\cal D}$$, so does every distribution over trees, and the lower bound follows.
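
As a sanity check, the Python sketch below (with illustrative names) averages the number of queries over $${\cal D}$$ for each of the six permutation trees; by the argument above, no deterministic tree can do better than these.

```python
from itertools import permutations, product
from fractions import Fraction

# The hard distribution D: uniform over the six inputs that are
# neither 000 nor 111.
mixed = [x for x in product([0, 1], repeat=3) if 0 < sum(x) < 3]

def queries(perm, x):
    i, j, _ = perm                 # stop early iff the first two bits agree
    return 2 if x[i] == x[j] else 3

for p in permutations(range(3)):
    avg = Fraction(sum(queries(p, x) for x in mixed), len(mixed))
    print(p, avg)                  # every permutation tree averages 8/3
```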