
## The Laplace approximation

Compared to point estimates, a more accurate way to approximate the integrals in Equations
(2.2), (2.3), and (2.4) is to
use the Laplace approximation (see MacKay, 2003; Bishop, 1995). The basic
idea is to find the maximum of the function to be integrated and apply
a second-order Taylor series approximation to the logarithm of that
function. When computing an expectation over the posterior
distribution, the maximum is the MAP solution, and the second-order
Taylor series corresponds to a Gaussian distribution whose
integrals can be computed analytically.
The Laplace approximation can be used to select the best solution when
several local maxima have been found, since a broad peak is
preferred over a high but narrow one. Unfortunately, the Laplace
approximation does not help in situations where a good representative
of the probability mass is not a local maximum, as in Figure 2.3.
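The idea can be illustrated with a minimal sketch (not from the text): approximating a one-dimensional integral by evaluating the integrand at its maximum and replacing the log-integrand with its second-order Taylor expansion there, which yields a Gaussian integral. The function names and the finite-difference step are illustrative choices; as a check, applying the sketch to the Gamma-function integral reproduces Stirling's approximation.

```python
import math

def laplace_integral(log_f, x0, h=1e-4):
    """Laplace approximation of the integral of exp(log_f) over the real line.

    x0 is the maximum of log_f; the curvature is estimated by a
    central finite difference with step h.
    """
    # Second derivative of log f at the mode (negative at a maximum)
    d2 = (log_f(x0 + h) - 2.0 * log_f(x0) + log_f(x0 - h)) / h**2
    # Integral of the fitted Gaussian: f(x0) * sqrt(2*pi / |d2|)
    return math.exp(log_f(x0)) * math.sqrt(2.0 * math.pi / -d2)

# Demo: Gamma(n+1) = integral of x^n e^{-x} dx, whose log-integrand
# n*log(x) - x has its maximum at x = n.
n = 10
approx = laplace_integral(lambda x: n * math.log(x) - x, x0=float(n))
exact = math.factorial(n)  # Gamma(11) = 10!
# approx recovers Stirling's formula n^n e^{-n} sqrt(2*pi*n),
# within about 1% of the exact value for n = 10
```
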

The Laplace approximation can also be used successfully to compare
different model structures.
It can be further simplified by
retaining only the terms that grow with the number of data samples;
this is known as the Bayesian information criterion (BIC)
(Schwarz, 1978).
Publication IX uses BIC in the structural learning of logical hidden Markov models.
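As a hedged illustration of model comparison with BIC (the data and models here are invented for the example, not taken from the text), the sketch below scores a constant-mean model against a linear model under Gaussian noise, using the common convention BIC = k ln n - 2 ln L, where lower is better:

```python
import math
import random

def bic(log_likelihood, k, n):
    # Schwarz's criterion in the "lower is better" sign convention:
    # k parameters, n data samples
    return k * math.log(n) - 2.0 * log_likelihood

def gaussian_loglik(residuals):
    # Log-likelihood of i.i.d. Gaussian residuals with ML noise variance
    n = len(residuals)
    var = sum(r * r for r in residuals) / n
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)

random.seed(0)
n = 200
xs = [i / n for i in range(n)]
ys = [2.0 * x + random.gauss(0.0, 0.1) for x in xs]  # truly linear data

# Model A: constant mean (mean + noise variance = 2 parameters)
mean = sum(ys) / n
bic_const = bic(gaussian_loglik([y - mean for y in ys]), k=2, n=n)

# Model B: least-squares line (slope, intercept + noise variance = 3)
sx, sy = sum(xs), sum(ys)
sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, ys))
slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
inter = (sy - slope * sx) / n
resid = [y - (inter + slope * x) for x, y in zip(xs, ys)]
bic_lin = bic(gaussian_loglik(resid), k=3, n=n)

# BIC prefers the linear model despite its extra parameter,
# since the log n penalty is outweighed by the better fit
```
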

Tapani Raiko
2006-11-21