
Model selection

VB learning offers another important benefit: the comparison of different models is straightforward. Bayes' rule can be applied again to obtain the probability of a model given the data

$\displaystyle p(\mathcal{H}_i \mid \boldsymbol{X}) = \frac{p(\boldsymbol{X}\mid \mathcal{H}_i)p(\mathcal{H}_i)}{p(\boldsymbol{X})},$ (4.4)

where $ p(\mathcal{H}_i)$ is the prior probability of the model $ \mathcal{H}_i$ and $ p(\boldsymbol{X})$ is a constant that can be ignored. A lower bound on the evidence term $ p(\boldsymbol{X}\mid \mathcal{H}_i)$ follows from Equation (4.3):

$\displaystyle p(\boldsymbol{X}\mid \mathcal{H}_i) = \exp(\mathcal{C}_{KL} - \mathcal{C}) \geq \exp (-\mathcal{C}),$ (4.5)

where the inequality holds because the Kullback-Leibler divergence term $ \mathcal{C}_{KL}$ is non-negative.

Multiple models can be combined into a mixture-of-experts model (Haykin, 1999). The experts can be weighted by their probabilities $ p(\mathcal{H}_i \mid \boldsymbol{X})$ given in Equation (4.4). Lappalainen and Miskin (2000) show that the weights that are optimal in the sense of the variational Bayesian approximation are in fact proportional to $ p(\mathcal{H}_i)\exp (-\mathcal{C})$. If the models have equal prior probabilities $ p(\mathcal{H}_i)$, the weights simplify further to $ \exp(-\mathcal{C})$. In practice, the costs $ \mathcal{C}$ tend to differ on the order of hundreds or thousands, which makes the model with the lowest cost $ \mathcal{C}$ dominant. Therefore it is reasonable to concentrate on model selection rather than model weighting.
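As a rough numerical sketch (with hypothetical cost values and an illustrative helper function model_weights, not taken from the text), the following Python snippet computes the normalised weights $ p(\mathcal{H}_i)\exp(-\mathcal{C}_i)$ of Equation (4.4) in the log domain, and shows how a cost difference of a few hundred makes the lowest-cost model dominate.

import numpy as np

def model_weights(costs, log_priors=None):
    # Normalised weights p(H_i | X) proportional to p(H_i) * exp(-C_i),
    # computed in the log domain to avoid numerical underflow.
    costs = np.asarray(costs, dtype=float)
    if log_priors is None:
        log_priors = np.zeros_like(costs)  # equal prior probabilities p(H_i)
    log_w = log_priors - costs             # unnormalised log weights
    log_w -= log_w.max()                   # shift before exponentiation
    w = np.exp(log_w)
    return w / w.sum()

# Hypothetical costs differing by a few hundred: the lowest-cost model
# receives essentially all of the weight.
print(model_weights([12345.0, 12620.0, 13050.0]))  # approximately [1., 0., 0.]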

