Parametric approximations

Parametric approximations lie between point estimates and stochastic sampling in terms of both computational complexity and accuracy of the approximation. The key idea is to replace the complex posterior density with a simpler, mathematically tractable approximation.

A standard procedure for approximating the posterior density with a parametric density is Laplace's method [71], in which the logarithm of the posterior density is approximated by its Taylor series expansion around the maximum point, i.e., the MAP estimate. The most common choice is the second-order expansion, which amounts to approximating the posterior density by a Gaussian distribution. This choice is made primarily for mathematical tractability, although it has been shown that, under very general conditions, the posterior density approaches a Gaussian distribution as the number of measurements grows. For a textbook account of Laplace's method, the asymptotic normality of the posterior density and statistics in general, see e.g. [115].
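The following is a minimal one-dimensional sketch of Laplace's method, not the procedure used in the cited works. Writing θ̂ for the MAP estimate and A = −d²/dθ² log p(θ | X) evaluated at θ̂, the second-order expansion log p(θ | X) ≈ log p(θ̂ | X) − A(θ − θ̂)²/2 corresponds to a Gaussian with mean θ̂ and variance 1/A. The function name `laplace_approximation`, the use of Newton's method to find the MAP estimate, the finite-difference derivatives and the Poisson-Gamma example are all illustrative assumptions. In a multi-dimensional model, A becomes the negative Hessian matrix and the covariance is its inverse.

```python
import math
from typing import Callable, Tuple


def laplace_approximation(
    log_post: Callable[[float], float],
    x0: float,
    h: float = 1e-4,
    tol: float = 1e-10,
    max_iter: int = 100,
) -> Tuple[float, float]:
    """Return (mean, variance) of a 1-D Gaussian (Laplace) approximation.

    Hypothetical helper for illustration. log_post is an unnormalised log
    posterior; the normalising constant affects neither the mode nor the
    curvature, so it can be dropped.
    """

    def d1(t: float) -> float:
        # Central finite difference for the first derivative (an assumption;
        # analytic or automatic derivatives would normally be preferred).
        return (log_post(t + h) - log_post(t - h)) / (2 * h)

    def d2(t: float) -> float:
        # Central finite difference for the second derivative.
        return (log_post(t + h) - 2 * log_post(t) + log_post(t - h)) / (h * h)

    # Newton's method on the gradient locates the MAP estimate.
    t = x0
    for _ in range(max_iter):
        step = d1(t) / d2(t)
        t -= step
        if abs(step) < tol:
            break

    curvature = d2(t)
    if curvature >= 0:
        raise ValueError("not a local maximum: second derivative is non-negative")
    # The second-order Taylor expansion around the MAP point is the log of a
    # Gaussian whose variance is the negative inverse curvature.
    return t, -1.0 / curvature


# Hypothetical example: Poisson counts with a Gamma(a, b) prior on the rate.
data = [3, 5, 4, 2, 6]
a, b = 2.0, 1.0


def log_post(lam: float) -> float:
    # Unnormalised log posterior: (a + sum(x) - 1) log(lam) - (b + n) lam
    return (a + sum(data) - 1) * math.log(lam) - (b + len(data)) * lam


mean, var = laplace_approximation(log_post, x0=1.0)
print(f"Laplace approximation: mean {mean:.4f}, variance {var:.4f}")
```

For this hypothetical example the exact posterior is a Gamma distribution with shape a + Σx = 22 and rate b + n = 6, whose mode is 21/6 = 3.5; the Laplace variance 21/36 ≈ 0.583 follows from the curvature of 21 log λ − 6λ at that mode.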

Laplace's method, when applied to complex models, can suffer from the same problems as MAP estimation in general. If the MAP estimate fails to locate a point in parameter space that not only has high probability density but is also surrounded by a large probability mass, the second-order Taylor series expansion can recognise this. In practice, however, it cannot be used to guide the search for a better point estimate, because doing so would be computationally too expensive. Whether this is a problem depends on the models at hand. In supervised learning with neural networks, MacKay has obtained good results [81], but for unsupervised learning of complex models, the MAP estimate causes problems.
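A toy illustration of this failure mode, assuming a hypothetical one-dimensional posterior that is a mixture of a tall, narrow Gaussian (weight 0.1, standard deviation 0.01) and a low, broad one (weight 0.9, standard deviation 1); the density integrates to 1 by construction. The MAP estimate lands on the narrow peak, and the usual Laplace estimate of the probability mass around a point, p(θ̂) √(2π/A), reports only about 0.1 there. The second-order expansion thus signals that little mass surrounds the MAP point, but it gives no hint of where the broad mode is; in the sketch, that mode is known only from the construction. The grid search and all names are assumptions made for the example.

```python
import math

# Hypothetical posterior: mixture of a narrow, tall peak and a broad, low one.
W_NARROW, MU_NARROW, SD_NARROW = 0.1, 0.0, 0.01
W_BROAD, MU_BROAD, SD_BROAD = 0.9, 3.0, 1.0


def normal_pdf(x: float, mu: float, sd: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def density(theta: float) -> float:
    # Normalised mixture density; weights sum to 1.
    return (W_NARROW * normal_pdf(theta, MU_NARROW, SD_NARROW)
            + W_BROAD * normal_pdf(theta, MU_BROAD, SD_BROAD))


def laplace_mass(theta_hat: float, h: float = 1e-5) -> float:
    """Laplace estimate of the probability mass around theta_hat."""
    def log_p(t: float) -> float:
        return math.log(density(t))

    # Curvature of the log density via a central finite difference.
    curv = (log_p(theta_hat + h) - 2 * log_p(theta_hat) + log_p(theta_hat - h)) / h ** 2
    # Integral of the Gaussian from the second-order expansion.
    return density(theta_hat) * math.sqrt(2 * math.pi / -curv)


# MAP estimate by a simple grid search over [-2, 7].
grid = [-2.0 + i * 1e-4 for i in range(90001)]
theta_map = max(grid, key=density)

mass_at_map = laplace_mass(theta_map)   # small: the MAP point is a narrow spike
mass_at_broad = laplace_mass(MU_BROAD)  # large: known here only by construction

print(f"MAP estimate: {theta_map:.4f}")
print(f"Laplace mass at MAP: {mass_at_map:.3f}")
print(f"Laplace mass at broad mode: {mass_at_broad:.3f}")
```

The curvature at the MAP point is what reveals the problem: a very sharp peak yields a small √(2π/A) factor even though the density itself is high.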

Harri Valpola
2000-10-31