Usually the models contain unknown real-valued parameters, and therefore the posterior probability is expressed as a posterior probability density function (pdf). Unfortunately, the posterior pdf is typically a complex, high-dimensional function whose exact treatment is difficult. In practice it has to be approximated in one way or another.

In ensemble learning, a parametric, computationally tractable
approximation - an ensemble - is chosen for the posterior pdf. Let
*P* denote the exact posterior pdf and *Q* the ensemble. The misfit
between *P* and *Q* is measured by the Kullback-Leibler divergence
between *Q* and *P*, that is, the expectation of log(*Q*/*P*) taken
over *Q*.
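The divergence above can be illustrated with a minimal sketch. The example below assumes a one-dimensional Gaussian ensemble *Q* and a Gaussian stand-in for the posterior *P* (in real applications *P* is not Gaussian; that is exactly why the approximation is needed). The function and parameter names are hypothetical, chosen only for this illustration. It compares the closed-form Kullback-Leibler divergence between two Gaussians with a Monte Carlo estimate of the expectation of log(Q/P) under Q.

```python
import math
import random

def kl_gaussians(mu_q, sig_q, mu_p, sig_p):
    # Closed-form D(Q || P) for two 1-D Gaussians
    # Q = N(mu_q, sig_q^2), P = N(mu_p, sig_p^2).
    return (math.log(sig_p / sig_q)
            + (sig_q ** 2 + (mu_q - mu_p) ** 2) / (2 * sig_p ** 2)
            - 0.5)

def kl_monte_carlo(mu_q, sig_q, mu_p, sig_p, n=200000, seed=0):
    # Monte Carlo estimate: average log(Q(x)/P(x)) over samples x ~ Q.
    rng = random.Random(seed)

    def log_pdf(x, mu, sig):
        return (-0.5 * math.log(2 * math.pi * sig ** 2)
                - (x - mu) ** 2 / (2 * sig ** 2))

    total = 0.0
    for _ in range(n):
        x = rng.gauss(mu_q, sig_q)
        total += log_pdf(x, mu_q, sig_q) - log_pdf(x, mu_p, sig_p)
    return total / n

exact = kl_gaussians(0.0, 1.0, 1.0, 2.0)
estimate = kl_monte_carlo(0.0, 1.0, 1.0, 2.0)
```

In ensemble learning this divergence serves as the cost function: the parameters of *Q* (here, its mean and standard deviation) are adjusted to minimize it. The divergence is zero only when *Q* equals *P*, and it is asymmetric, which is why the direction D(Q || P) rather than D(P || Q) is specified.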

Ensemble learning was first used in [1], where it was applied to a multi-layer perceptron with one hidden layer. Since then it has been used, e.g., in [3-9].