Usually the models include unknown real-valued parameters, and therefore the posterior probability is expressed as a posterior probability density function (pdf). Unfortunately, the posterior pdf is typically a complex, high-dimensional function whose exact treatment is difficult. In practice, it has to be approximated in one way or another.
In ensemble learning, a parametric, computationally tractable approximation, called the ensemble, is chosen for the posterior pdf. Let P denote the exact posterior pdf and Q the ensemble. The misfit between P and Q is measured by the Kullback-Leibler divergence between Q and P.
The parameters of the ensemble are optimised to fit the posterior by minimising the Kullback-Leibler divergence $D(Q \,\|\, P) = \int Q(\theta) \ln \frac{Q(\theta)}{P(\theta)} \, d\theta$.
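As an illustration of this minimisation (a hypothetical sketch, not part of the original text), consider the simplest case of a one-dimensional Gaussian ensemble Q fitted to a Gaussian "posterior" P. Here KL(Q‖P) has a closed form, so its gradients can be written down analytically and minimised by plain gradient descent; all names and numerical values below are illustrative assumptions.

```python
import math

def kl_gaussians(mu_q, var_q, mu_p, var_p):
    # Closed-form KL(Q || P) for one-dimensional Gaussians.
    return 0.5 * (math.log(var_p / var_q)
                  + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)

# Toy "posterior": P = N(1.0, 0.5). Fit the ensemble Q = N(mu, var)
# by gradient descent on KL(Q || P). The log-variance is optimised
# so that the variance stays positive.
mu_p, var_p = 1.0, 0.5
mu, log_var = 0.0, 0.0
lr = 0.1
for _ in range(500):
    var = math.exp(log_var)
    # Analytic gradients of KL(Q || P) w.r.t. mu and log(var):
    d_mu = (mu - mu_p) / var_p
    d_logvar = 0.5 * (var / var_p - 1.0)
    mu -= lr * d_mu
    log_var -= lr * d_logvar

print(round(mu, 3), round(math.exp(log_var), 3))  # approaches (1.0, 0.5)
```

Because P is itself Gaussian here, the minimising Q matches it exactly; in realistic models the posterior is not of the ensemble's parametric form, and the minimisation instead finds the closest tractable approximation.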
Ensemble learning was first used in , where it was applied to a multi-layer perceptron with one hidden layer. Since then it has been used, e.g., in [3-9].