
Marginalisation Principle

The marginalisation principle specifies how a learning system can predict or generalise. The probability of observing A given prior knowledge B is

 \begin{displaymath}
p(A \mid B) = \int p(A \mid \boldsymbol{\theta}, B) p(\boldsymbol{\theta}\mid B) d\boldsymbol{\theta}.
\end{displaymath} (3.5)

This means that the probability of observing A can be obtained by summing or integrating over all the different explanations $\boldsymbol{\theta}$. The term $p(A \mid \boldsymbol{\theta}, B)$ is the probability of A given a particular explanation $\boldsymbol{\theta}$, and it is weighted by the probability of that explanation, $p(\boldsymbol{\theta}\mid B)$.
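As a concrete illustration, the following is a minimal numerical sketch of (3.5), assuming a hypothetical coin-tossing model: $\boldsymbol{\theta}$ is the unknown probability of heads, A is the event that the next toss lands heads, and the prior knowledge B is encoded as a Beta(2, 2) prior. The model and all names in the code are illustrative assumptions, not part of the theory above.

import numpy as np

# Hypothetical coin model: theta is the unknown probability of heads,
# A is "the next toss lands heads", and the prior knowledge B is
# encoded as a Beta(2, 2) prior density over theta.
theta = np.linspace(0.0, 1.0, 1001)    # grid of explanations theta
dtheta = theta[1] - theta[0]
prior = theta * (1.0 - theta)          # unnormalised Beta(2, 2) density
prior /= np.sum(prior) * dtheta        # normalise p(theta | B)

likelihood = theta                     # p(A | theta, B) = theta

# Marginalisation: weight each explanation by its probability and integrate.
p_A_given_B = np.sum(likelihood * prior) * dtheta
print(p_A_given_B)                     # approx. 0.5, the prior mean of theta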

Using the principle, the evidence term can be written as

 \begin{displaymath}
p(\boldsymbol{X}\mid \mathcal{H}) = \int p(\boldsymbol{X}\mid \boldsymbol{\theta}, \mathcal{H}) p(\boldsymbol{\theta}\mid \mathcal{H}) d\boldsymbol{\theta}.
\end{displaymath} (3.6)

This emphasises the role of the evidence term as a normalisation coefficient: it is the integral of the numerator of Bayes rule ([*]) over the model parameters $\boldsymbol{\theta}$.
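The following sketch illustrates this role under the same hypothetical coin model as above, now with a uniform prior on [0, 1] and an observed toss sequence $\boldsymbol{X}$ of seven heads and three tails; the data and names are assumptions made for illustration. Dividing the numerator of Bayes rule by the evidence makes the posterior integrate to one, and for this conjugate model the evidence is also known in closed form as the Beta function B(8, 4) = 7! 3! / 11!.

import numpy as np
from math import factorial

# Hypothetical data X under the Bernoulli model H: 7 heads, 3 tails,
# with a uniform prior p(theta | H) = 1 on [0, 1].
theta = np.linspace(0.0, 1.0, 100001)
dtheta = theta[1] - theta[0]
prior = np.ones_like(theta)
likelihood = theta**7 * (1.0 - theta)**3          # p(X | theta, H)

# Evidence (3.6): integral of the numerator of Bayes rule over theta.
evidence = np.sum(likelihood * prior) * dtheta

# Dividing by the evidence yields a proper posterior density.
posterior = likelihood * prior / evidence
print(np.sum(posterior) * dtheta)                 # approx. 1.0
print(evidence, factorial(7) * factorial(3) / factorial(11))  # both approx. 1/1320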


