

Ensemble Learning

This section gives a brief overview of ensemble learning with emphasis on solutions yielding linear computational complexity. Thorough introductions to ensemble learning can be found for instance in [13,14].

Ensemble learning is a method for approximating the posterior probability distribution. It allows the posterior approximation to be chosen freely, ranging from point estimates to the exact posterior. The misfit of the approximation is measured by the Kullback-Leibler divergence between the true posterior and its approximation. Let us denote the observed variables by $ \mathbf{X}$, the latent variables (parameters) of the model by $ \boldsymbol{\theta}$ and the approximation of the true posterior $ p(\boldsymbol{\theta}\mid \mathbf{X})$ by $ q(\boldsymbol{\theta})$. The cost function $ C$ used in ensemble learning is

$\displaystyle C = \left< \ln \frac{q(\boldsymbol{\theta})}{p(\mathbf{X}, \boldsymbol{\theta})} \right> = \left< \ln \frac{q(\boldsymbol{\theta})}{p(\boldsymbol{\theta}\mid \mathbf{X})} \right> - \ln p(\mathbf{X}) \, ,$ (1)

where the operator $ \left< \cdot \right>$ denotes expectation over the distribution $ q(\boldsymbol{\theta})$. Note that the cost function equals the Kullback-Leibler divergence between $ q(\boldsymbol{\theta})$ and the true posterior only up to the constant $ -\ln p(\mathbf{X})$; this form is used in practice because the evidence $ p(\mathbf{X})$ cannot be computed directly. Since the Kullback-Leibler divergence is non-negative, $ C \geq -\ln p(\mathbf{X})$, and hence $ e^{-C}$ is a lower bound on the model evidence $ p(\mathbf{X})$ which can then be used for learning the model structure.
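
As a concrete illustration of (1), the following sketch evaluates the cost for an assumed toy model that is not part of the paper: a Gaussian prior $ p(\theta) = \mathcal{N}(0, 1)$, a Gaussian likelihood $ p(x \mid \theta) = \mathcal{N}(\theta, 1)$ for a single observation $ x$, and a Gaussian approximation $ q(\theta) = \mathcal{N}(m, v)$. All expectations are then available in closed form, and the two sides of (1) can be checked numerically.

import math

def cost(x, m, v):
    # C = <ln q(theta)> - <ln p(x, theta)>, expectations taken over q = N(m, v).
    neg_entropy = -0.5 * math.log(2 * math.pi * v) - 0.5              # <ln q(theta)>
    e_ln_prior = -0.5 * math.log(2 * math.pi) - 0.5 * (m**2 + v)      # <ln p(theta)>
    e_ln_lik = -0.5 * math.log(2 * math.pi) - 0.5 * ((x - m)**2 + v)  # <ln p(x | theta)>
    return neg_entropy - e_ln_prior - e_ln_lik

def kl_minus_log_evidence(x, m, v):
    # Right-hand side of (1): KL(q || p(theta | x)) - ln p(x) for this toy model.
    post_mean, post_var = x / 2.0, 0.5                                # exact posterior N(x/2, 1/2)
    kl = 0.5 * (math.log(post_var / v) + (v + (m - post_mean)**2) / post_var - 1.0)
    log_evidence = -0.5 * math.log(2 * math.pi * 2.0) - x**2 / 4.0    # p(x) = N(0, 2)
    return kl - log_evidence

x, m, v = 1.3, 0.4, 0.8
print(cost(x, m, v))                    # the two expressions in (1) agree
print(kl_minus_log_evidence(x, m, v))

Because the Kullback-Leibler term is non-negative, $ e^{-C}$ computed this way never exceeds the evidence $ p(x)$; minimizing $ C$ with respect to $ m$ and $ v$ tightens the bound.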


