In practice, exact treatment of the posterior probability density of the parameters is infeasible except for very simple models. Therefore, some suitable approximation method must be used. There are at least three options: 1) a point estimate; 2) sampling; 3) parametric approximation.
The result of inferring the parameter values is called the solution. The Bayesian solution is the whole posterior pdf. In point estimates the solution is a single point: the maximum likelihood (ML) solution is the point at which the likelihood is highest, and the maximum a posteriori (MAP) solution is the point at which the posterior pdf is highest.
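In notation assumed here for illustration (observed data $X$, parameters $\theta$), the two point estimates can be written as

\[
\hat{\theta}_{\mathrm{ML}} = \arg\max_{\theta}\, p(X \mid \theta),
\qquad
\hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta}\, p(\theta \mid X)
= \arg\max_{\theta}\, p(X \mid \theta)\, p(\theta),
\]

where the last equality holds because the evidence $p(X)$ does not depend on $\theta$.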
Point estimates are the easiest to compute, but they fail in some situations, as will be seen in a later section.
It is possible [18] to construct a Markov chain that draws points from the posterior distribution.
Instead of integrating over the posterior distribution, one can then sum over the drawn sequence.
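As a sketch of this substitution (with $f$ an arbitrary function of the parameters and $\theta^{(1)}, \ldots, \theta^{(N)}$ the points drawn by the chain, notation assumed here), the posterior expectation is approximated by a sample average:

\[
\int f(\theta)\, p(\theta \mid X)\, d\theta
\;\approx\;
\frac{1}{N} \sum_{i=1}^{N} f\bigl(\theta^{(i)}\bigr).
\]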
This sampling approach originated from the Metropolis algorithm in statistical physics and has since developed into, for example, Gibbs sampling and the hybrid Monte Carlo method. Collectively, these are known as Markov chain Monte Carlo (MCMC) methods. When the problem is hard and has a large number of parameters, the sequence required for an adequate approximation can become too long to be usable in practice.
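As a minimal sketch of the Metropolis algorithm mentioned above (not the method used in this thesis; the function name metropolis and the Gaussian random-walk proposal are illustrative assumptions), a sampler that only needs an unnormalized log-posterior could look like this:

```python
import numpy as np

def metropolis(log_post, theta0, n_samples, step=0.5, rng=None):
    """Random-walk Metropolis: draw n_samples points from the
    distribution whose unnormalized log-density is log_post."""
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta0, dtype=float)
    samples = np.empty((n_samples, theta.size))
    lp = log_post(theta)
    for i in range(n_samples):
        # Propose a symmetric Gaussian step around the current point.
        proposal = theta + step * rng.standard_normal(theta.size)
        lp_new = log_post(proposal)
        # Accept with probability min(1, p(proposal) / p(theta)),
        # computed in log space for numerical stability.
        if np.log(rng.random()) < lp_new - lp:
            theta, lp = proposal, lp_new
        samples[i] = theta
    return samples

# Example: sample from a standard 2-D Gaussian "posterior".
draws = metropolis(lambda t: -0.5 * np.sum(t**2), np.zeros(2), 5000)
print(draws.mean(axis=0))  # chain average approximates the posterior mean
```

Because each proposal depends only on the current point, the accepted points form a Markov chain; the chain average over the drawn sequence then plays the role of the integral above.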
Ensemble learning is a compromise between the Bayesian solution and the point estimates. It is used in this thesis, and an entire section is therefore dedicated to it.