The expectation maximisation (EM) algorithm can be seen as a special case of ensemble learning. The set-up in EM is the following. Suppose we have a probability model $p(x, y \mid \theta)$. We observe $x$, but $y$ remains hidden. We would like to estimate $\theta$ by maximum likelihood, i.e., maximise $p(x \mid \theta)$ w.r.t. $\theta$, but suppose the structure of the model is such that integration over $y$ is difficult, i.e., it is difficult to evaluate $p(x \mid \theta) = \int p(x, y \mid \theta) \, dy$.
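What we do is take the cost function $C_y(x \mid \theta) = \int q(y) \ln \frac{q(y)}{p(x, y \mid \theta)} \, dy$ and minimise it alternately with respect to $\theta$ and $q(y)$. The ordinary EM algorithm results when $q(y)$ has a free form, in which case $q(y)$ is updated to be $p(y \mid x, \hat{\theta})$, where $\hat{\theta}$ is the current estimate of $\theta$. The method is useful if the expectation of $\ln p(x, y \mid \theta)$ under $q(y)$ is easy to evaluate, which is often the case. This interpretation of EM was given by .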
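The alternating minimisation described above can be sketched for a concrete model. The example below is only an illustration under stated assumptions: the model is a two-component Gaussian mixture with unit variances (not a model taken from the text), the hidden variable y is the component label of each sample, and all function names are hypothetical. The cost is the expectation under q(y) of ln q(y) - ln p(x, y | theta); the E-step sets q(y) to the posterior of y given x and the current estimate of theta, and the M-step minimises the same cost over theta with q(y) held fixed.

```python
import numpy as np

def log_joint(x, means, weights):
    """ln p(x_n, y_n = k | theta) for unit-variance components, shape (N, K)."""
    diff = x[:, None] - means[None, :]
    return np.log(weights)[None, :] - 0.5 * np.log(2 * np.pi) - 0.5 * diff**2

def cost(q, x, means, weights):
    """C = sum_n sum_k q_nk * (ln q_nk - ln p(x_n, y_n = k | theta))."""
    safe_q = np.clip(q, 1e-300, None)  # avoid log(0); terms with q = 0 contribute 0
    return float(np.sum(q * (np.log(safe_q) - log_joint(x, means, weights))))

def e_step(x, means, weights):
    """Free-form minimisation over q: q(y) = p(y | x, theta_hat)."""
    lj = log_joint(x, means, weights)
    lj = lj - lj.max(axis=1, keepdims=True)  # stabilise the exponentials
    q = np.exp(lj)
    return q / q.sum(axis=1, keepdims=True)

def m_step(q, x):
    """Minimisation over theta = (means, weights) with q held fixed."""
    nk = q.sum(axis=0)
    means = (q * x[:, None]).sum(axis=0) / nk
    weights = nk / len(x)
    return means, weights

def neg_log_likelihood(x, means, weights):
    """-ln p(x | theta), computed with a log-sum-exp over the components."""
    lj = log_joint(x, means, weights)
    m = lj.max(axis=1, keepdims=True)
    return float(-np.sum(m[:, 0] + np.log(np.exp(lj - m).sum(axis=1))))

def run_em(x, means, weights, n_iter=20):
    """Alternate the two minimisations and record the cost after each half-step."""
    costs = []
    for _ in range(n_iter):
        q = e_step(x, means, weights)
        costs.append(cost(q, x, means, weights))  # equals -ln p(x | theta_hat) here
        means, weights = m_step(q, x)
        costs.append(cost(q, x, means, weights))
    return means, weights, costs

# Synthetic data: an assumption made purely for the demonstration.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.0, 300)])
means, weights, costs = run_em(x, np.array([-1.0, 1.0]), np.array([0.5, 0.5]))
print(means, weights, costs[0], costs[-1])
```

Because each half-step minimises the same cost over one of its arguments, the recorded sequence `costs` should never increase, and right after an E-step the cost coincides with the negative log-likelihood, which is why minimising it corresponds to maximum likelihood estimation.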
The EM algorithm can suffer from overfitting because only point estimates of the parameters $\theta$ are used. Even worse is to use the maximum a posteriori (MAP) estimator, where one finds the $\theta$ and $y$ which maximise $p(y, \theta \mid x)$. Unlike maximum likelihood estimation, MAP estimation is not invariant under reparametrisations of the model. This is because MAP estimation is sensitive to probability density, which changes nonuniformly if the parameter space is transformed nonlinearly.
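The lack of invariance can be checked numerically on a small example. The sketch below assumes a hypothetical one-dimensional posterior, a Beta(3, 5) density over a parameter theta in (0, 1), and the nonlinear reparametrisation phi = ln(theta / (1 - theta)); neither is taken from the text. The density over phi picks up the Jacobian factor theta (1 - theta), which moves the mode.

```python
import numpy as np

a, b = 3.0, 5.0  # hypothetical Beta(a, b) posterior over theta in (0, 1)

theta = np.linspace(1e-4, 1 - 1e-4, 200001)

# Unnormalised log-density in the theta parametrisation.
log_p_theta = (a - 1) * np.log(theta) + (b - 1) * np.log(1 - theta)

# Reparametrise with phi = ln(theta / (1 - theta)), so d theta / d phi = theta (1 - theta).
# The density transforms as p(phi) = p(theta) |d theta / d phi|, which adds the log-Jacobian.
log_p_phi = log_p_theta + np.log(theta) + np.log(1 - theta)

# The map theta -> phi is monotone, so the mode over phi can be located on the theta grid.
map_theta = theta[np.argmax(log_p_theta)]   # analytically (a - 1) / (a + b - 2)
map_via_phi = theta[np.argmax(log_p_phi)]   # analytically a / (a + b)

print(map_theta, map_via_phi)
```

The two estimates refer to the same parameter but disagree, purely because of the choice of parametrisation. A likelihood, being a function of the parameter rather than a density over it, acquires no Jacobian factor, so its maximiser maps consistently between parametrisations.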
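MAP estimation can be interpreted in the ensemble learning framework as minimising $C_{y,\theta}(x) = \int q(y, \theta) \ln \frac{q(y, \theta)}{p(x, y, \theta)} \, dy \, d\theta$ and using a delta distribution as $q(y, \theta)$. This makes the integral $\int q(y, \theta) \ln q(y, \theta) \, dy \, d\theta$ infinite. It can be neglected when estimating $\theta$ and $y$ because it is constant with respect to $\hat{y}$ and $\hat{\theta}$, but the infinity of the cost function shows that a delta distribution, i.e. a point estimate, is a bad approximation to a posterior density.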
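One way to see the divergence is to approach the delta distribution through a family of narrowing densities. The worked equation below assumes, purely for illustration, a one-dimensional Gaussian centred on the point estimate; the family and the symbol $q_\sigma$ are not from the text.

```latex
% Assumption for illustration: q is a one-dimensional Gaussian centred on the point estimate.
q_\sigma(\theta) = \mathcal{N}(\theta;\, \hat{\theta}, \sigma^2)
\quad\Longrightarrow\quad
\int q_\sigma(\theta) \ln q_\sigma(\theta) \, d\theta
  = -\tfrac{1}{2} \ln\!\left(2 \pi e \sigma^2\right)
  \;\longrightarrow\; +\infty
\quad \text{as } \sigma \to 0 .
```

The value of the integral depends on the width $\sigma$ but not on the location $\hat{\theta}$, which is why the term can be dropped when searching for the point estimate even though it diverges.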