
Approximations

In practice, exact treatment of the posterior probability density of the parameters is infeasible except for very simple models, so a suitable approximation method must be used. There are at least three options: 1) a point estimate; 2) sampling; 3) a parametric approximation.

The result of inferring the parameter values is called the solution. The Bayesian solution is the whole posterior pdf. In point estimation the solution is a single point. The maximum likelihood (ML) solution for the parameters $\boldsymbol{\theta}$ is the point at which the likelihood $p(\boldsymbol{X}\mid \boldsymbol{\theta}, \mathcal{H})$ is highest. The maximum a posteriori (MAP) solution is the point with the highest posterior pdf $p(\boldsymbol{\theta}\mid \boldsymbol{X}, \mathcal{H})$. Point estimates are the easiest to compute, but they fail in some situations, as will be seen in Section [*].
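Written compactly, these two point estimates are the maximisers of the likelihood and of the posterior, respectively. Since the evidence $p(\boldsymbol{X}\mid \mathcal{H})$ does not depend on $\boldsymbol{\theta}$, the MAP estimate can equivalently be found by maximising the product of the likelihood and the prior:
\begin{align*}
\hat{\boldsymbol{\theta}}_{\mathrm{ML}} &= \arg\max_{\boldsymbol{\theta}}\; p(\boldsymbol{X}\mid \boldsymbol{\theta}, \mathcal{H}), \\
\hat{\boldsymbol{\theta}}_{\mathrm{MAP}} &= \arg\max_{\boldsymbol{\theta}}\; p(\boldsymbol{\theta}\mid \boldsymbol{X}, \mathcal{H})
 = \arg\max_{\boldsymbol{\theta}}\; p(\boldsymbol{X}\mid \boldsymbol{\theta}, \mathcal{H})\, p(\boldsymbol{\theta}\mid \mathcal{H}).
\end{align*}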

It is possible [18] to construct a Markov chain that draws points $\boldsymbol{\theta}_k$ from the posterior distribution $p(\boldsymbol{\theta}\mid \boldsymbol{X}, \mathcal{H})$. Instead of integrating over the posterior distribution in ([*]), one can then sum over the sequence $\boldsymbol{\theta}_k$. This sampling approach originated from the Metropolis algorithm in statistical physics and has since developed into, for example, Gibbs sampling and the hybrid Monte Carlo method. Collectively, these are called Markov chain Monte Carlo (MCMC) methods. When the problem is hard and has a large number of parameters, the length of the sequence required for an adequate approximation can become too long to be usable in practice.
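As an illustration only, the following is a minimal Python sketch of the Metropolis algorithm with a Gaussian random-walk proposal, not the exact samplers cited above; log_post is a hypothetical user-supplied function returning the unnormalised log posterior $\log p(\boldsymbol{\theta}\mid \boldsymbol{X}, \mathcal{H})$ up to a constant. The posterior expectation is then approximated by an average over the chain.

import numpy as np

def metropolis(log_post, theta0, n_samples=10000, step=0.1, rng=None):
    """Draw a chain of samples whose stationary distribution is exp(log_post)."""
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta0, dtype=float)
    log_p = log_post(theta)
    samples = []
    for _ in range(n_samples):
        # Propose a symmetric random-walk step and accept it with
        # probability min(1, p(proposal) / p(current)).
        proposal = theta + step * rng.standard_normal(theta.shape)
        log_p_prop = log_post(proposal)
        if np.log(rng.random()) < log_p_prop - log_p:
            theta, log_p = proposal, log_p_prop
        samples.append(theta.copy())
    return np.array(samples)

if __name__ == "__main__":
    # Toy unnormalised log posterior: a 2-D standard Gaussian.
    log_post = lambda t: -0.5 * np.sum(t ** 2)
    chain = metropolis(log_post, theta0=np.zeros(2))
    # Replace the posterior integral by an average over the chain,
    # discarding the first samples as burn-in.
    print("posterior mean estimate:", chain[1000:].mean(axis=0))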

Ensemble learning is a compromise between the full Bayesian solution and point estimates. It is used in this thesis, and therefore the whole of Section [*] is dedicated to it.


Tapani Raiko
2001-12-10