next up previous
Next: Connection to coding Up: Ensemble learning Previous: Ensemble learning

Model selection in ensemble learning

Recall that the cost function Cy(x | H) can be translated into lower bound for p(x | H). Since p(H | x) = p(x | H) p(H) / p(x), it is natural that Cy(x | H) can be used for model selection also by equating

\begin{displaymath}p(H \vert x) \approx \frac{e^{-C_y(x \vert H)} P(H)} {\sum_{H'} e^{-C_y(x \vert
H')} P(H')} \, .
\end{displaymath} (7)

In fact, we can show that the above equation gives the best approximation for p(H | x) in terms of Cy, H(x), the Kullback-Leibler divergence between q(y, H | x) and p(y, H | x), which means that the model selection can be done using the same principle of approximating the posterior distribution as learning parameters.

Without losing any generality from q(y, H | x), we can write

\begin{displaymath}q(y, H \vert x) = Q(H \vert x) q(y \vert x, H) \, .
\end{displaymath} (8)

Now the cost function can be written as
 
Cy, H(x) = $\displaystyle \sum_H \int q(y, H \vert x) \ln \frac{q(y, H \vert x)} {p(x,y, H)} dy$  
  = $\displaystyle \sum_H Q(H \vert x) \int q(y \vert x, H) \ln \frac{Q(H \vert x) q(y \vert x, H)} {P(H) p(x, y \vert H)} dy$  
  = $\displaystyle \sum_H Q(H \vert x) \left[\ln \frac{Q(H \vert x)} {P(H)} + C_y(x \vert H) \right] \, .$ (9)

Minimising Cy, H(x) with respect to Q(H | x) under the constraint

\begin{displaymath}\sum_H Q(H \vert x) = 1
\end{displaymath} (10)

yields

\begin{displaymath}Q(H \vert x) = \frac{e^{-C_y(x \vert H)} P(H)} {\sum_{H'} e^{-C_y(x \vert H')}
P(H')} \, .
\end{displaymath} (11)

Substituting this into equation 9 yields the minimum value for Cy, H(x) which is

\begin{displaymath}C_{y, H}(x) = -\ln \sum_{H} e^{-C_y(x \vert H)} P(H)
\end{displaymath} (12)

If we wish to use only a part of different model structures H, we can try to find those H which would minimise Cy, H(x). It is easy to see that this is accomplished by choosing the models corresponding to Cy(x | H). A special case is to use only one Hcorresponding to the smallest Cy(x | H).


next up previous
Next: Connection to coding Up: Ensemble learning Previous: Ensemble learning
Harri Lappalainen
2000-03-03