Publication IV discusses ensemble learning at
length, but this section briefly describes the cost function used in
ensemble learning. Let us denote the vector of all the unknown
variables of the model by $\boldsymbol{\theta}$
and the vector of observations by
$\mathbf{x}$, and
suppose that the probabilities $p(\mathbf{x} \mid \boldsymbol{\theta})$
and $p(\boldsymbol{\theta})$
are defined. According to Bayes' rule, the
posterior probability $p(\boldsymbol{\theta} \mid \mathbf{x})$
of the unknown variables is

$$p(\boldsymbol{\theta} \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta})}{p(\mathbf{x})} \tag{15}$$

and the Kullback-Leibler information between the true posterior and its approximation $q(\boldsymbol{\theta})$ is thus

$$C_{\mathrm{KL}} = \int q(\boldsymbol{\theta}) \ln \frac{q(\boldsymbol{\theta})}{p(\boldsymbol{\theta} \mid \mathbf{x})}\, d\boldsymbol{\theta} \tag{16}$$

The normalising constant $p(\mathbf{x})$ is usually difficult to compute because it requires marginalising the joint density $p(\mathbf{x}, \boldsymbol{\theta})$ over $\boldsymbol{\theta}$. The cost function which is actually used is

$$C = C_{\mathrm{KL}} - \ln p(\mathbf{x}) = \int q(\boldsymbol{\theta}) \ln \frac{q(\boldsymbol{\theta})}{p(\mathbf{x}, \boldsymbol{\theta})}\, d\boldsymbol{\theta} \tag{17}$$

The approximation which minimises (17) also minimises (16) because the term $\ln p(\mathbf{x})$ is constant with respect to the approximation $q(\boldsymbol{\theta})$; substituting $p(\boldsymbol{\theta} \mid \mathbf{x}) = p(\mathbf{x}, \boldsymbol{\theta}) / p(\mathbf{x})$ into (16) yields (17) directly.
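As a numerical sanity check of this relation, the sketch below evaluates both the KL cost (16) and the practical cost (17) in closed form for a toy conjugate-Gaussian model. The model choice (prior $\theta \sim N(0,1)$, likelihood $x \mid \theta \sim N(\theta, 1)$, Gaussian approximation $q(\theta) = N(m, v)$) and all function names are illustrative assumptions, not taken from the publication.

```python
import math

def cost_C(m, v, x):
    """Eq. (17) in closed form: E_q[ln q(theta) - ln p(x, theta)]
    for the illustrative model theta ~ N(0,1), x|theta ~ N(theta,1),
    with q(theta) = N(m, v)."""
    e_ln_q = -0.5 * math.log(2 * math.pi * v) - 0.5             # E_q[ln q]
    e_ln_lik = -0.5 * math.log(2 * math.pi) - 0.5 * ((x - m) ** 2 + v)
    e_ln_prior = -0.5 * math.log(2 * math.pi) - 0.5 * (m ** 2 + v)
    return e_ln_q - e_ln_lik - e_ln_prior

def kl_C(m, v, x):
    """Eq. (16): KL divergence from q(theta) = N(m, v) to the exact
    posterior, which is N(x/2, 1/2) for this conjugate model."""
    mu_p, v_p = x / 2.0, 0.5
    return 0.5 * (math.log(v_p / v) + (v + (m - mu_p) ** 2) / v_p - 1.0)

x = 1.0
# ln p(x) = ln N(x; 0, 2): the marginal of the joint Gaussian model.
ln_evidence = -0.5 * math.log(2 * math.pi * 2.0) - x ** 2 / 4.0

# C = C_KL - ln p(x) holds for any choice of the approximation (m, v).
for m, v in [(0.3, 0.4), (0.5, 0.5), (-1.0, 2.0)]:
    assert abs(cost_C(m, v, x) - (kl_C(m, v, x) - ln_evidence)) < 1e-9
```

Because $\ln p(\mathbf{x})$ enters only as an additive constant, both costs share the same minimiser, here the exact posterior mean and variance $(m, v) = (x/2, 1/2)$.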

In order for ensemble learning to be computationally efficient, the approximation $q(\boldsymbol{\theta})$ should have a simple factorial form, $q(\boldsymbol{\theta}) = \prod_i q_i(\theta_i)$. The cost function then splits into a sum of simple terms which can be computed efficiently.
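The splitting can be illustrated with a small sketch, again under purely illustrative assumptions: each component $\theta_i$ has prior $N(0,1)$ and generates one observation $x_i \sim N(\theta_i, 1)$, and the factorial approximation uses one Gaussian factor $q_i(\theta_i) = N(m_i, v_i)$ per component. With both $q$ and the model factorising, the total cost is a sum of independent one-dimensional terms, so each factor can be tuned on its own.

```python
import math

def per_dim_cost(m, v, x):
    """One term of the split cost: E_{q_i}[ln q_i - ln p(x_i, theta_i)]
    for the illustrative model theta_i ~ N(0,1), x_i|theta_i ~ N(theta_i,1),
    with the factor q_i(theta_i) = N(m, v)."""
    e_ln_q = -0.5 * math.log(2 * math.pi * v) - 0.5           # E[ln q_i]
    e_ln_p = (-math.log(2 * math.pi)
              - 0.5 * ((x - m) ** 2 + v)                      # E[ln p(x_i | theta_i)]
              - 0.5 * (m ** 2 + v))                           # E[ln p(theta_i)]
    return e_ln_q - e_ln_p

# The full cost is just the sum of the per-component terms.
ms, vs, xs = [0.2, -0.4], [0.3, 1.1], [1.0, -0.5]
total = sum(per_dim_cost(m, v, x) for m, v, x in zip(ms, vs, xs))

# Each term is minimised independently, here at the exact posterior
# moments m_i = x_i / 2, v_i = 1/2 of the conjugate Gaussian update.
best = per_dim_cost(xs[0] / 2.0, 0.5, xs[0])
assert best <= per_dim_cost(ms[0], vs[0], xs[0])
```

This independence of the terms is what makes the factorial form computationally attractive: the cost and its gradients decompose over the factors $q_i$.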