Model Selection

Ensemble learning offers another important benefit: the comparison of different models is straightforward. Bayes' rule can be applied again to obtain the probability of a model given the data

\begin{displaymath}
p(\mathcal{H}\mid \boldsymbol{X}) = \frac{p(\boldsymbol{X}\mid \mathcal{H})\,p(\mathcal{H})}{p(\boldsymbol{X})},
\end{displaymath} (3.13)

where $p(\mathcal{H})$ is the prior probability of the model and $p(\boldsymbol{X})$ is the probability of the data, which is a constant independent of the model. A lower bound on the evidence term $p(\boldsymbol{X}\mid \mathcal{H})$ is obtained from ([*])

\begin{displaymath}
p(\boldsymbol{X}\mid \mathcal{H}) = \exp(C_{\mathrm{KL}} - C) \geq \exp(-C).
\end{displaymath} (3.14)

The inequality holds because $C_{\mathrm{KL}}$ is non-negative.
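As a sketch of where (3.14) comes from, using the standard ensemble-learning decomposition of the cost function (the parameter notation $\boldsymbol{\theta}$ and the exact form of the referenced equation are assumptions here, not taken from this section), the cost $C$ splits into the misfit $C_{\mathrm{KL}}$ and the negative log-evidence:

\begin{displaymath}
C = \mathrm{E}_{q(\boldsymbol{\theta})}\!\left[ \ln \frac{q(\boldsymbol{\theta})}{p(\boldsymbol{X}, \boldsymbol{\theta} \mid \mathcal{H})} \right]
  = \underbrace{D\bigl( q(\boldsymbol{\theta}) \,\Vert\, p(\boldsymbol{\theta} \mid \boldsymbol{X}, \mathcal{H}) \bigr)}_{C_{\mathrm{KL}} \,\geq\, 0}
  \;-\; \ln p(\boldsymbol{X} \mid \mathcal{H}).
\end{displaymath}

Solving for the evidence and exponentiating gives $p(\boldsymbol{X}\mid \mathcal{H}) = \exp(C_{\mathrm{KL}} - C)$, and dropping the non-negative $C_{\mathrm{KL}}$ yields the bound $\exp(-C)$.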

Multiple models can be combined as a mixture-of-experts model [24]. The experts can be weighted by their probabilities given in equation ([*]). If the models have equal prior probabilities and the parameter approximations are equally good, i.e. the $C_{\mathrm{KL}}$ terms are equal, the weights become proportional to $\exp(-C)$. In practice, the costs tend to differ by hundreds or thousands, which makes the model with the lowest cost $C$ dominant. It is therefore reasonable to concentrate on model selection rather than weighting.
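The following is a minimal numerical sketch, not part of the thesis, of how the normalised weights $\exp(-C_i)$ behave when the costs differ by a few hundred; the cost values are invented purely for illustration.

\begin{verbatim}
import numpy as np

def model_weights(costs):
    """Normalised mixture weights proportional to exp(-C_i).

    Computed in the log domain so that cost differences of hundreds
    or thousands do not underflow before normalisation.
    """
    costs = np.asarray(costs, dtype=float)
    log_w = -costs + costs.min()   # shift so the best model has log-weight 0
    w = np.exp(log_w)
    return w / w.sum()

# Invented costs for three candidate models, differing by a few hundred.
print(model_weights([10450.0, 10700.0, 11200.0]))
# -> roughly [1.0, 2.7e-109, 0.0]: the lowest-cost model receives essentially
#    all of the weight, so plain model selection is practically equivalent
#    to the full mixture.
\end{verbatim}

Shifting by the minimum cost is the usual log-sum-exp trick; without it, $\exp(-10450)$ would underflow to zero for every model before normalisation.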


