
Model selection

VB learning offers another important benefit: the comparison of different models is straightforward. Bayes' rule can be applied again to obtain the probability of a model given the data

$\displaystyle p(\mathcal{H}_i \mid \boldsymbol{X}) = \frac{p(\boldsymbol{X}\mid \mathcal{H}_i)p(\mathcal{H}_i)}{p(\boldsymbol{X})},$ (4.4)

where $ p(\mathcal{H}_i)$ is the prior probability of the model $ \mathcal{H}_i$ and $ p(\boldsymbol{X})$ is a constant that can be ignored. A lower bound on the evidence term $ p(\boldsymbol{X}\mid \mathcal{H}_i)$ follows from Equation (4.3):

$\displaystyle p(\boldsymbol{X}\mid \mathcal{H}_i) = \exp(\mathcal{C}_{KL} - \mathcal{C}) \geq \exp (-\mathcal{C}),$ (4.5)

where the inequality holds because the Kullback-Leibler divergence term $ \mathcal{C}_{KL}$ is non-negative.

Multiple models can be combined into a mixture-of-experts model (Haykin, 1999). The experts can be weighted by their probabilities $ p(\mathcal{H}_i \mid \boldsymbol{X})$ given in Equation (4.4). Lappalainen and Miskin (2000) show that the weights that are optimal in the sense of the variational Bayesian approximation are in fact proportional to $ p(\mathcal{H}_i)\exp (-\mathcal{C})$. If the models have equal prior probabilities $ p(\mathcal{H}_i)$, the weights simplify further to $ \exp(-\mathcal{C})$. In practice, the costs $ \mathcal{C}$ tend to differ on the order of hundreds or thousands, which makes the model with the lowest cost $ \mathcal{C}$ dominant. Therefore it is reasonable to concentrate on model selection rather than model weighting.
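As a rough numerical sketch (with hypothetical cost values and an illustrative helper function model_weights, not taken from the text), the following Python snippet computes the normalised weights $ p(\mathcal{H}_i)\exp(-\mathcal{C}_i)$ of Equation (4.4) in the log domain, and shows how a cost difference of a few hundred makes the lowest-cost model dominate.

import numpy as np

def model_weights(costs, log_priors=None):
    # Normalised weights p(H_i | X) proportional to p(H_i) * exp(-C_i),
    # computed in the log domain to avoid numerical underflow.
    costs = np.asarray(costs, dtype=float)
    if log_priors is None:
        log_priors = np.zeros_like(costs)  # equal prior probabilities p(H_i)
    log_w = log_priors - costs             # unnormalised log weights
    log_w -= log_w.max()                   # shift before exponentiation
    w = np.exp(log_w)
    return w / w.sum()

# Hypothetical costs differing by a few hundred: the lowest-cost model
# receives essentially all of the weight.
print(model_weights([12345.0, 12620.0, 13050.0]))  # approximately [1., 0., 0.]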

