
Bayes Rule

The Bayes rule

 \begin{displaymath}
p(\boldsymbol{\theta}\mid \boldsymbol{X}, \mathcal{H})
= \frac{ p(\boldsymbol{X}\mid \boldsymbol{\theta}, \mathcal{H}) \, p(\boldsymbol{\theta}\mid \mathcal{H}) }{ p(\boldsymbol{X}\mid \mathcal{H}) }
\end{displaymath} (3.3)

indicates how observations change the beliefs. It is named after the English reverend Thomas Bayes, who lived in the 18th century. $\mathcal{H}$ marks the prior beliefs, $\boldsymbol{X}$ is the observed data and $\boldsymbol{\theta}$ is the parameter vector to be inferred. The term $p(\boldsymbol{\theta}\mid \mathcal{H})$, the probability of the parameters given only the prior beliefs, i.e. the probability prior to the observations, is called the prior probability. The term $p(\boldsymbol{\theta}\mid \boldsymbol{X}, \mathcal{H})$, the probability of the parameters given both the observations and the prior beliefs, is called the posterior probability of the parameters. The Bayes rule tells how the prior probability is replaced by the posterior after the extra information, i.e. the observed data, has been taken into account. The term $p(\boldsymbol{X}\mid \boldsymbol{\theta}, \mathcal{H})$ is called the likelihood of the data and the term $p(\boldsymbol{X}\mid \mathcal{H})$ the evidence of the data. Note that here $p(\cdot)$ stands for a probability density function (pdf) over a vector space.
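The roles of the four terms can be sketched numerically. The following is a minimal illustration, not part of the original text: it infers the bias of a coin on a discrete grid of parameter values, so that the evidence is a simple sum rather than an integral. All variable names and the choice of a uniform prior are assumptions for the sake of the example.

```python
import numpy as np

# Hypothetical example: infer a coin's bias theta from observed tosses,
# using a discrete grid in place of a continuous density.
theta = np.linspace(0.01, 0.99, 99)       # candidate parameter values
prior = np.ones_like(theta) / theta.size  # p(theta | H): uniform prior beliefs

heads, tosses = 7, 10                     # observed data X
likelihood = theta**heads * (1 - theta)**(tosses - heads)  # p(X | theta, H)

evidence = np.sum(likelihood * prior)     # p(X | H): normalising constant
posterior = likelihood * prior / evidence # Bayes rule (3.3): p(theta | X, H)

print(posterior.sum())                # sums to 1 by construction
print(theta[np.argmax(posterior)])    # posterior mode at 7/10
```

With a uniform prior the posterior mode coincides with the maximum of the likelihood, here at $\theta = 0.7$; an informative prior would pull it elsewhere.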

When inferring the parameter values $\boldsymbol{\theta}$, the evidence term is constant and the learning rule ([*]) simplifies to

 \begin{displaymath}
p(\boldsymbol{\theta}\mid \boldsymbol{X}, \mathcal{H}) \propto
p(\boldsymbol{X}\mid \boldsymbol{\theta}, \mathcal{H}) \, p(\boldsymbol{\theta}\mid \mathcal{H})
\end{displaymath} (3.4)

The Bayes rule tells a learning system (the player) how to update its beliefs after observing $\boldsymbol{X}$. There is no longer room for 'what if', since the posterior contains everything that can be inferred from the observations. Note that learning cannot start from a void: some prior beliefs are always necessary, as will be discussed in Section [*].
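The proportional form (3.4) can be sketched in code as well. This hedged example, with assumed variable names and the same coin-tossing setup as above, works with the unnormalised product of likelihood and prior and shows that updating beliefs sequentially, using each posterior as the next prior, gives the same result as processing all observations at once.

```python
import numpy as np

# Sketch of the proportional form: the evidence is constant in theta,
# so one may multiply likelihood and prior and rescale only at the end.
theta = np.linspace(0.01, 0.99, 99)
prior = np.ones_like(theta)  # p(theta | H), up to a constant factor

def update(belief, heads, tails):
    """Multiply in the likelihood of new tosses; no evidence term needed."""
    return belief * theta**heads * (1 - theta)**tails

# Updating on all data at once ...
batch = update(prior, 7, 3)
# ... equals updating sequentially, each posterior serving as the next prior.
sequential = update(update(prior, 4, 1), 3, 2)

batch /= batch.sum()            # normalise once, at the very end
sequential /= sequential.sum()
print(np.allclose(batch, sequential))  # True
```

The equality holds because the likelihood terms simply multiply: $\theta^4(1-\theta)^1 \cdot \theta^3(1-\theta)^2 = \theta^7(1-\theta)^3$.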


Tapani Raiko
2001-12-10