Ensemble Learning

Ensemble learning [25,3,44,52] is one type of variational learning. The basic idea in ensemble learning is to minimise the misfit between the exact posterior pdf $p(\boldsymbol{\theta} \mid \boldsymbol{X}, \mathcal{H})$ of a model $\mathcal{H}$ and its parametric approximation $q(\boldsymbol{\theta})$. The misfit is measured with the Kullback-Leibler (KL) divergence

 
\begin{displaymath}
C_{\mathrm{KL}} = D(q(\boldsymbol{\theta}) \parallel p(\boldsymbol{\theta} \mid \boldsymbol{X}, \mathcal{H})) = \left< \ln \frac{q(\boldsymbol{\theta})}{p(\boldsymbol{\theta} \mid \boldsymbol{X}, \mathcal{H})} \right> = \int q(\boldsymbol{\theta}) \ln \frac{q(\boldsymbol{\theta})}{p(\boldsymbol{\theta} \mid \boldsymbol{X}, \mathcal{H})} \, d\boldsymbol{\theta} \geq 0,
\end{displaymath} (3.7)

where the operator $\left< \cdot \right>$ denotes an expectation over the distribution $q(\boldsymbol{\theta})$. The KL divergence is hard to evaluate directly, because the posterior $p(\boldsymbol{\theta} \mid \boldsymbol{X}, \mathcal{H})$ contains the evidence $p(\boldsymbol{X} \mid \mathcal{H})$ as its normalising constant, and the evidence is typically intractable. Therefore the cost function $C$ that is actually used is

\begin{displaymath}
C = \left< \ln \frac{q(\boldsymbol{\theta})}{p(\boldsymbol{X}, \boldsymbol{\theta} \mid \mathcal{H})} \right> = C_{\mathrm{KL}} - \ln p(\boldsymbol{X} \mid \mathcal{H}).
\end{displaymath} (3.8)

The terms $C$ and $C_{\mathrm{KL}}$ differ by the term $\ln p(\boldsymbol{X} \mid \mathcal{H})$, the logarithm of the evidence. While optimising the distribution $q(\boldsymbol{\theta})$ for a single model $\mathcal{H}$, the evidence is constant and can be ignored.
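To make relation (3.8) concrete, the following sketch (not from the text; the toy model, variable names and sample sizes are illustrative choices) uses a conjugate Gaussian model where both the exact posterior and the evidence are available in closed form. It computes $C_{\mathrm{KL}}$ analytically and $C$ by Monte Carlo over $q(\boldsymbol{\theta})$, and checks that the two indeed differ by $\ln p(\boldsymbol{X} \mid \mathcal{H})$:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model (an illustrative choice, not from the text):
# x_i ~ N(theta, 1) with prior theta ~ N(0, 1), so the exact posterior
# and the evidence p(X | H) are Gaussian and known in closed form.
X = rng.normal(1.5, 1.0, size=20)
n = len(X)

# Exact posterior p(theta | X, H) = N(m_post, v_post).
v_post = 1.0 / (1.0 + n)
m_post = v_post * X.sum()

# Log evidence ln p(X | H): marginally X ~ N(0, I + 1 1^T).
Sigma = np.eye(n) + np.ones((n, n))
log_evidence = (-0.5 * n * np.log(2.0 * np.pi)
                - 0.5 * np.linalg.slogdet(Sigma)[1]
                - 0.5 * X @ np.linalg.solve(Sigma, X))

# A deliberately imperfect Gaussian approximation q(theta).
m_q, v_q = 1.0, 0.2

# C_KL of eq. (3.7): KL divergence between two Gaussians, closed form.
C_KL = 0.5 * (np.log(v_post / v_q)
              + (v_q + (m_q - m_post) ** 2) / v_post - 1.0)

# C of eq. (3.8): Monte Carlo estimate of
# < ln q(theta) - ln p(X, theta | H) > with theta drawn from q.
theta = rng.normal(m_q, np.sqrt(v_q), size=100_000)
ln_q = -0.5 * np.log(2.0 * np.pi * v_q) - 0.5 * (theta - m_q) ** 2 / v_q
ln_prior = -0.5 * np.log(2.0 * np.pi) - 0.5 * theta ** 2
ln_lik = (-0.5 * n * np.log(2.0 * np.pi)
          - 0.5 * ((X[None, :] - theta[:, None]) ** 2).sum(axis=1))
C = np.mean(ln_q - ln_prior - ln_lik)

print(C, C_KL - log_evidence)  # equal up to Monte Carlo noise
\end{verbatim}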

The cost function can be split into the sum $C = C_q + C_p$, where

  
\begin{displaymath}
C_q = \left< \ln q(\boldsymbol{\theta}) \right>
\end{displaymath} (3.9)
\begin{displaymath}
C_p = -\left< \ln p(\boldsymbol{X}, \boldsymbol{\theta} \mid \mathcal{H}) \right>.
\end{displaymath} (3.10)
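The split separates a term that is often available in closed form from a model-dependent one: for a Gaussian $q(\boldsymbol{\theta})$, $C_q$ is simply minus the entropy of $q$. A minimal sketch, reusing the hypothetical toy model above (all names illustrative):
\begin{verbatim}
import numpy as np

def C_q(v_q):
    # C_q = < ln q(theta) >, eq. (3.9): minus the entropy of
    # N(m_q, v_q), which is independent of the mean m_q.
    return -0.5 * np.log(2.0 * np.pi * np.e * v_q)

def C_p(X, m_q, v_q, n_samples=100_000, seed=0):
    # C_p = -< ln p(X, theta | H) >, eq. (3.10), estimated by Monte
    # Carlo for the toy model x_i ~ N(theta, 1), theta ~ N(0, 1).
    X = np.asarray(X)
    rng = np.random.default_rng(seed)
    theta = rng.normal(m_q, np.sqrt(v_q), size=n_samples)
    ln_prior = -0.5 * np.log(2.0 * np.pi) - 0.5 * theta ** 2
    ln_lik = (-0.5 * len(X) * np.log(2.0 * np.pi)
              - 0.5 * ((X[None, :] - theta[:, None]) ** 2).sum(axis=1))
    return -np.mean(ln_prior + ln_lik)

# C_q(v_q) + C_p(X, m_q, v_q) reproduces the cost C of the sketch above.
\end{verbatim}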

Density estimates of continuous-valued latent variables offer a great advantage over point estimates: they are robust against over-fitting and provide a cost function suitable for learning model structures. With ensemble learning, the density estimates can be almost as efficient as point estimates. Roberts [58] compared the Laplace approximation, sampling-based methods and ensemble learning on an ICA problem with music data. The sources were well recovered using ensemble learning, and the approach was considerably faster than the other methods.



 