The expectation maximisation (EM) algorithm can be seen as a special case
of ensemble learning. The set-up in EM is the following:
Suppose we have a probability model $p(x, y \mid \theta)$. We observe
$x$ but $y$ remains hidden. We would like to estimate $\theta$ with
maximum likelihood, i.e., maximise $p(x \mid \theta)$ w.r.t. $\theta$,
but suppose the structure of the model is such that integration over
$y$ is difficult, i.e., it is difficult to evaluate
\[ p(x \mid \theta) = \int p(x, y \mid \theta) \, dy . \]
What we do is take the cost function
\[ C(q(y); \theta) = \int q(y) \ln \frac{q(y)}{p(x, y \mid \theta)} \, dy \]
and minimise it alternately with respect to $q(y)$ and $\theta$. The
ordinary EM algorithm will result when $q(y)$ has a free
form, in which case $q(y)$ will be updated to be $p(y \mid x, \hat{\theta})$,
where $\hat{\theta}$ is the current estimate of $\theta$.
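Why the free-form update takes this form follows from a standard rewriting of the cost (stated here for completeness; the notation $D_{\mathrm{KL}}$ for the Kullback--Leibler divergence is ours):
\[ C(q(y); \theta) = \int q(y) \ln \frac{q(y)}{p(y \mid x, \theta)\, p(x \mid \theta)} \, dy
   = D_{\mathrm{KL}}\bigl(q(y) \,\|\, p(y \mid x, \theta)\bigr) - \ln p(x \mid \theta) . \]
For fixed $\theta$, the cost is therefore minimised by $q(y) = p(y \mid x, \theta)$, and at that point $C = -\ln p(x \mid \theta)$, so the subsequent minimisation over $\theta$ is exactly a maximum likelihood step.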
The method is useful if integration over $y$ is easy, which is often
the case. This interpretation of EM was given by [4].
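As a concrete illustration of the alternating minimisation, the sketch below runs EM for a two-component one-dimensional Gaussian mixture (an example of ours, not from the text; the model, the function name em_gmm and all variable names are assumptions). The E-step is the free-form update $q(y) = p(y \mid x, \hat{\theta})$ computed as posterior responsibilities, and the M-step re-estimates $\theta$ with $q(y)$ fixed.

```python
import numpy as np

def em_gmm(x, n_iter=50):
    """EM for a two-component 1-D Gaussian mixture.

    Hidden variable y: the component label of each data point.
    Parameters theta: mixing weights w, means mu, variances var.
    """
    # Crude initialisation of theta.
    w = np.array([0.5, 0.5])
    mu = np.array([x.min(), x.max()])
    var = np.array([x.var(), x.var()])

    for _ in range(n_iter):
        # E-step: free-form update q(y) = p(y | x, theta), i.e. the
        # responsibilities r[n, k] = p(y_n = k | x_n, theta).
        log_joint = (np.log(w)
                     - 0.5 * np.log(2.0 * np.pi * var)
                     - 0.5 * (x[:, None] - mu) ** 2 / var)
        log_joint -= log_joint.max(axis=1, keepdims=True)  # numerical stability
        r = np.exp(log_joint)
        r /= r.sum(axis=1, keepdims=True)

        # M-step: minimise the cost w.r.t. theta for fixed q(y); here the
        # "integration" over y is just a sum over the two components.
        nk = r.sum(axis=0)
        w = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk

    return w, mu, var

# Example usage:
# x = np.concatenate([np.random.randn(200) - 2.0, np.random.randn(200) + 2.0])
# print(em_gmm(x))
```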
The EM algorithm can suffer from overfitting because only point estimates
for the parameters $\theta$ are used. Even worse is to use the maximum a
posteriori (MAP) estimator, where one finds the $\theta$ and $y$ which
maximise $p(y, \theta \mid x)$.
Unlike maximum likelihood estimation,
MAP estimation is not invariant under reparametrisations of the model.
This is because MAP estimation is sensitive to the probability density,
which changes nonuniformly if the parameter space is transformed
nonlinearly.
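A small one-dimensional example (ours, not from the text) makes this concrete. Take a posterior density $p(\theta \mid x) = e^{-\theta}$ on $\theta > 0$; its mode is at $\theta = 0$. Under the reparametrisation $\phi = \ln \theta$ the same posterior has density
\[ p(\phi \mid x) = e^{-e^{\phi}}\, e^{\phi} , \]
whose mode is at $\phi = 0$, i.e. at $\theta = 1$. The Jacobian factor $e^{\phi}$ moves the mode, so the MAP estimate depends on the parametrisation, whereas the likelihood $p(x \mid \theta)$ carries no such factor and its maximiser transforms consistently.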
MAP estimation can be interpreted in the ensemble learning framework as
minimising the cost
\[ C(q(y, \theta)) = \int q(y, \theta) \ln \frac{q(y, \theta)}{p(x, y, \theta)} \, dy \, d\theta \]
and using a delta distribution as $q(y, \theta)$. This makes the integral
$\int q(y, \theta) \ln q(y, \theta) \, dy \, d\theta$
infinite. It can be neglected when estimating $\theta$ and $y$ because
it is constant with respect to $\theta$ and $y$, but the infinity of the
cost function shows that a delta distribution, i.e. a point estimator,
is a bad approximation for a posterior density.
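The infinity can be made explicit by approximating the delta distribution with a Gaussian that is allowed to narrow (a sketch of ours, written for a one-dimensional $\theta$ with $y$ suppressed). With $q(\theta) = N(\theta; \hat{\theta}, \epsilon^2)$,
\[ \int q(\theta) \ln q(\theta) \, d\theta = -\tfrac{1}{2} \ln\bigl(2 \pi e \epsilon^2\bigr) \;\to\; +\infty \quad \text{as } \epsilon \to 0 , \]
while $\int q(\theta) \ln p(x, \theta) \, d\theta \to \ln p(x, \hat{\theta})$ remains finite for a continuous density, so the total cost diverges even though its $\hat{\theta}$-dependent part is exactly the MAP objective.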