

Ensemble Learning

This section gives a brief overview of ensemble learning with emphasis on solutions yielding linear computational complexity. Thorough introductions to ensemble learning can be found for instance in [13,14].

Ensemble learning is a method for approximating the posterior probability distribution. It allows the posterior approximation to be chosen freely, ranging from point estimates to the exact posterior. The misfit of the approximation is measured by the Kullback-Leibler divergence between the true posterior and its approximation. Let us denote the observed variables by $ \mathbf{X}$, the latent variables (parameters) of the model by $ \boldsymbol{\theta}$ and the approximation of the true posterior $ p(\boldsymbol{\theta}\mid \mathbf{X})$ by $ q(\boldsymbol{\theta})$. The cost function $ C$ used in ensemble learning is

$\displaystyle C = \left< \ln \frac{q(\boldsymbol{\theta})}{p(\mathbf{X}, \boldsymbol{\theta})} \right> = \left< \ln \frac{q(\boldsymbol{\theta})}{p(\boldsymbol{\theta}\mid \mathbf{X})} \right> - \ln p(\mathbf{X}) \, ,$ (1)

where the operator $ \left< \cdot \right>$ denotes expectation over the distribution $ q(\boldsymbol{\theta})$. Note that the cost function equals the Kullback-Leibler divergence between $ q(\boldsymbol{\theta})$ and the true posterior only up to the constant $ -\ln p(\mathbf{X})$; this form is used in practice because the evidence $ p(\mathbf{X})$ cannot be computed directly. Since the Kullback-Leibler divergence is non-negative, $ C \geq -\ln p(\mathbf{X})$, and hence $ e^{-C}$ is a lower bound on the model evidence $ p(\mathbf{X})$ which can then be used for learning the model structure.
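
As a concrete illustration of (1), the following sketch evaluates the cost for an assumed toy model that is not part of the paper: a Gaussian prior $ p(\theta) = \mathcal{N}(0, 1)$, a Gaussian likelihood $ p(x \mid \theta) = \mathcal{N}(\theta, 1)$ for a single observation $ x$, and a Gaussian approximation $ q(\theta) = \mathcal{N}(m, v)$. All expectations are then available in closed form, and the two sides of (1) can be checked numerically.

import math

def cost(x, m, v):
    # C = <ln q(theta)> - <ln p(x, theta)>, expectations taken over q = N(m, v).
    neg_entropy = -0.5 * math.log(2 * math.pi * v) - 0.5              # <ln q(theta)>
    e_ln_prior = -0.5 * math.log(2 * math.pi) - 0.5 * (m**2 + v)      # <ln p(theta)>
    e_ln_lik = -0.5 * math.log(2 * math.pi) - 0.5 * ((x - m)**2 + v)  # <ln p(x | theta)>
    return neg_entropy - e_ln_prior - e_ln_lik

def kl_minus_log_evidence(x, m, v):
    # Right-hand side of (1): KL(q || p(theta | x)) - ln p(x) for this toy model.
    post_mean, post_var = x / 2.0, 0.5                                # exact posterior N(x/2, 1/2)
    kl = 0.5 * (math.log(post_var / v) + (v + (m - post_mean)**2) / post_var - 1.0)
    log_evidence = -0.5 * math.log(2 * math.pi * 2.0) - x**2 / 4.0    # p(x) = N(0, 2)
    return kl - log_evidence

x, m, v = 1.3, 0.4, 0.8
print(cost(x, m, v))                    # the two expressions in (1) agree
print(kl_minus_log_evidence(x, m, v))

Because the Kullback-Leibler term is non-negative, $ e^{-C}$ computed this way never exceeds the evidence $ p(x)$; minimizing $ C$ with respect to $ m$ and $ v$ tightens the bound.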


