
Factorial Approximation

One can choose a posterior approximation $q(\boldsymbol{\theta})$ that is a product of independent densities

\begin{displaymath}q(\boldsymbol{\theta}) = q(\theta_1)q(\theta_2)\dots q(\theta_N),
\end{displaymath} (3.11)

where $N$ is the number of parameters. This simplifies the cost function $C_q$ in ([*]) to a sum of simple terms, making the computational complexity linear in the number of parameters.
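To illustrate how the cost decomposes, the following sketch (a hypothetical parametrisation, not code from this thesis) evaluates the Kullback-Leibler part of $C_q$ when both $q$ and the prior factorise into independent Gaussians; the result is a sum of one-dimensional terms, one per parameter, so the evaluation is linear in $N$.

    import numpy as np

    # Sketch only: q(theta) = prod_i N(theta_i; m_q[i], v_q[i]) and a factorial
    # Gaussian prior p(theta) = prod_i N(theta_i; m_p[i], v_p[i]) are assumed.
    def kl_factorial_gaussians(m_q, v_q, m_p, v_p):
        """KL(q || p) as a sum of N one-dimensional Gaussian KL terms."""
        per_param = 0.5 * (np.log(v_p / v_q) + (v_q + (m_q - m_p) ** 2) / v_p - 1.0)
        return np.sum(per_param)  # N simple terms, O(N) time and memory

    m_q = np.array([0.1, -0.3, 0.7])   # posterior means
    v_q = np.array([0.5, 0.2, 1.0])    # posterior variances
    print(kl_factorial_gaussians(m_q, v_q, np.zeros(3), np.ones(3)))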

The factorial approximation allows efficient algorithms to be built, but it does not suit all model structures equally well. It works well with models in which the posterior dependencies are not too strong, as in sparse coding. An almost maximally factorial $q(\boldsymbol{\theta})$ seems to suffice for latent variable models, since rotational and other invariances allow choosing a solution for which the factorial approximation is most accurate. A good model structure appears to be more important than a good approximation of the posterior probability of the model.

Miskin and MacKay [49] used ensemble learning for blind source separation. They compared two approximations of the posterior: the first was an $M$-dimensional Gaussian with a full covariance matrix, which resulted in a memory requirement of the order $M^2$ and a time complexity of the order $M^3$; the second was the factorial approximation. They noticed that the factorial approximation is computationally more efficient, still gives a bound on the evidence, and does not suffer from overfitting.
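The complexity gap can be seen, for example, in the entropy term of the cost. The sketch below (an illustration, not Miskin and MacKay's implementation) assumes Gaussian approximations: the full-covariance version needs the log-determinant of an $M \times M$ matrix, costing $M^2$ memory and $M^3$ time, whereas the factorial version only needs the $M$ marginal variances.

    import numpy as np

    def full_gaussian_entropy(cov):
        """Entropy of an M-dimensional Gaussian with full covariance matrix.
        Storing cov takes O(M^2) memory; the log-determinant costs O(M^3) time."""
        M = cov.shape[0]
        sign, logdet = np.linalg.slogdet(cov)
        return 0.5 * (M * np.log(2.0 * np.pi * np.e) + logdet)

    def factorial_gaussian_entropy(v):
        """Entropy of a factorial Gaussian given only the M marginal variances: O(M)."""
        return 0.5 * np.sum(np.log(2.0 * np.pi * np.e * v))

    M = 4
    A = np.random.randn(M, M)
    cov = A @ A.T + np.eye(M)                        # a positive-definite covariance
    print(full_gaussian_entropy(cov))                # full posterior approximation
    print(factorial_gaussian_entropy(np.diag(cov)))  # factorial approximation of it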

