VB learning offers another important benefit: the comparison of different models is straightforward. Bayes' rule can be applied again to obtain the probability of a model $\mathcal{H}_i$ given the data $\boldsymbol{X}$:

$$p(\mathcal{H}_i \mid \boldsymbol{X}) = \frac{p(\boldsymbol{X} \mid \mathcal{H}_i)\, p(\mathcal{H}_i)}{p(\boldsymbol{X})}. \qquad (4.4)$$
Multiple models can be combined into a mixture-of-experts model
(Haykin, 1999). The experts can be weighted by their posterior
probabilities given in equation (4.4). Lappalainen and Miskin (2000)
show that the optimal weights in the sense of the variational Bayesian
approximation are in fact $w_i \propto p(\mathcal{H}_i)\, e^{-C_i}$,
where $C_i$ is the cost function of model $\mathcal{H}_i$.
If the models have equal prior
probabilities $p(\mathcal{H}_i)$, the weights simplify further to
$w_i = e^{-C_i} / \sum_j e^{-C_j}$. In practice, the costs $C_i$
tend to differ by hundreds or thousands, and since the weights depend
exponentially on the costs, the model with the lowest cost dominates
the mixture. Therefore it is reasonable to concentrate on model
selection rather than weighting.
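As a concrete illustration, the following sketch (Python/NumPy, with invented cost values) evaluates the equal-prior weights $w_i = e^{-C_i} / \sum_j e^{-C_j}$. Subtracting the minimum cost before exponentiating is only a numerical-stability device; it leaves the weights unchanged.

```python
import numpy as np

# Hypothetical VB costs C_i for three candidate models; each C_i
# upper-bounds -log p(X | H_i), so exp(-C_i) approximates the evidence.
costs = np.array([1540.0, 1565.0, 2210.0])

# Equal-prior mixture weights: w_i = exp(-C_i) / sum_j exp(-C_j).
# Shifting by the minimum cost avoids underflow without changing the ratios.
shifted = costs - costs.min()
weights = np.exp(-shifted) / np.exp(-shifted).sum()

print(weights)
# -> approximately [1, 1.4e-11, 1e-291]: a cost gap of only 25
#    already makes the lowest-cost model completely dominant.
```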