Model selection in ensemble learning

Next: Connection to coding Up: Ensemble learning Previous: Ensemble learning

Model selection in ensemble learning

$\begin{displaymath}p(H \vert x) \approx \frac{e^{-C_y(x \vert H)} P(H)} {\sum_{H'} e^{-C_y(x \vert H')} P(H')} \, . \end{displaymath}$

(7)

In fact, we can show that the above equation gives the best approximation for p(H | x) in terms of C_{y, H}(x), the Kullback-Leibler divergence between q(y, H | x) and p(y, H | x), which means that the model selection can be done using the same principle of approximating the posterior distribution as learning parameters.

Without losing any generality from q(y, H | x), we can write

$\begin{displaymath}q(y, H \vert x) = Q(H \vert x) q(y \vert x, H) \, . \end{displaymath}$

(8)

Now the cost function can be written as

C_{y, H}(x)	=	$\displaystyle \sum_H \int q(y, H \vert x) \ln \frac{q(y, H \vert x)} {p(x,y, H)} dy$
	=	$\displaystyle \sum_H Q(H \vert x) \int q(y \vert x, H) \ln \frac{Q(H \vert x) q(y \vert x, H)} {P(H) p(x, y \vert H)} dy$
	=	$\displaystyle \sum_H Q(H \vert x) \left[\ln \frac{Q(H \vert x)} {P(H)} + C_y(x \vert H) \right] \, .$	(9)

Minimising C_{y, H}(x) with respect to Q(H | x) under the constraint

$\begin{displaymath}\sum_H Q(H \vert x) = 1 \end{displaymath}$

(10)

yields

$\begin{displaymath}Q(H \vert x) = \frac{e^{-C_y(x \vert H)} P(H)} {\sum_{H'} e^{-C_y(x \vert H')} P(H')} \, . \end{displaymath}$

(11)

Substituting this into equation 9 yields the minimum value for C_{y, H}(x) which is

$\begin{displaymath}C_{y, H}(x) = -\ln \sum_{H} e^{-C_y(x \vert H)} P(H) \end{displaymath}$

(12)

If we wish to use only a part of different model structures H, we can try to find those H which would minimise C_{y, H}(x). It is easy to see that this is accomplished by choosing the models corresponding to C_y(x | H). A special case is to use only one Hcorresponding to the smallest C_y(x | H).

Next: Connection to coding Up: Ensemble learning Previous: Ensemble learning

Harri Lappalainen
2000-03-03