Usually the models contain unknown real-valued parameters, and therefore the posterior probability is expressed as a posterior probability density function (pdf). Unfortunately, the posterior pdf is typically a complex, high-dimensional function whose exact treatment is difficult. In practice it has to be approximated in one way or another.

In ensemble learning, a parametric, computationally tractable
approximation - an ensemble - is chosen for the posterior pdf. Let
*P* denote the exact posterior pdf and *Q* the ensemble. The misfit
between *P* and *Q* is measured by the Kullback-Leibler divergence
between *Q* and *P*, that is, the expectation of log(*Q*/*P*) taken
over *Q*.
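The divergence above can be illustrated with a minimal sketch. The example below assumes a one-dimensional Gaussian ensemble *Q* and a Gaussian stand-in for the posterior *P* (in real applications *P* is not Gaussian; that is exactly why the approximation is needed). The function and parameter names are hypothetical, chosen only for this illustration. It compares the closed-form Kullback-Leibler divergence between two Gaussians with a Monte Carlo estimate of the expectation of log(Q/P) under Q.

```python
import math
import random

def kl_gaussians(mu_q, sig_q, mu_p, sig_p):
    # Closed-form D(Q || P) for two 1-D Gaussians
    # Q = N(mu_q, sig_q^2), P = N(mu_p, sig_p^2).
    return (math.log(sig_p / sig_q)
            + (sig_q ** 2 + (mu_q - mu_p) ** 2) / (2 * sig_p ** 2)
            - 0.5)

def kl_monte_carlo(mu_q, sig_q, mu_p, sig_p, n=200000, seed=0):
    # Monte Carlo estimate: average log(Q(x)/P(x)) over samples x ~ Q.
    rng = random.Random(seed)

    def log_pdf(x, mu, sig):
        return (-0.5 * math.log(2 * math.pi * sig ** 2)
                - (x - mu) ** 2 / (2 * sig ** 2))

    total = 0.0
    for _ in range(n):
        x = rng.gauss(mu_q, sig_q)
        total += log_pdf(x, mu_q, sig_q) - log_pdf(x, mu_p, sig_p)
    return total / n

exact = kl_gaussians(0.0, 1.0, 1.0, 2.0)
estimate = kl_monte_carlo(0.0, 1.0, 1.0, 2.0)
```

In ensemble learning this divergence serves as the cost function: the parameters of *Q* (here, its mean and standard deviation) are adjusted to minimize it. The divergence is zero only when *Q* equals *P*, and it is asymmetric, which is why the direction D(Q || P) rather than D(P || Q) is specified.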

Ensemble learning was first used in [1], where it was applied to a multi-layer perceptron with one hidden layer. Since then it has been used, e.g., in [3-9].