While Bayes' rule specifies how the learning system should update its beliefs as new data arrives, the marginalisation principle allows the probabilities of new propositions to be derived from probabilities that are already known. This is useful for prediction and inference.
Suppose the situation is the same as in the example with Bayes' rule, but that the learning system now tries to compute the probability of making observation $B$ before it has actually made the observation; that is, it tries to predict the new observation.
Suppose $A_1, A_2, \ldots, A_n$ are exhaustive and mutually exclusive propositions; in other words, exactly one of the $A_i$ is true while the rest are false. As before, assume that the $A_i$ are possible explanations for $B$, and that the prior assumptions and experience $C$ are such that both $P(B \mid A_i C)$ and $P(A_i \mid C)$ are determined. The marginalisation principle then states the following:

$$P(B \mid C) = \sum_{i=1}^{n} P(B \mid A_i C)\, P(A_i \mid C) \tag{2}$$
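As a minimal numeric sketch of equation (2), consider three hypothetical explanations with made-up priors and likelihoods (the numbers below are illustrative assumptions, not values from the text):

```python
# Marginalisation (equation (2)): P(B|C) = sum_i P(B|A_i C) * P(A_i|C).
# Three hypothetical explanations A_1, A_2, A_3; numbers are assumptions.

prior = [0.5, 0.3, 0.2]        # P(A_i | C): exhaustive and mutually exclusive
likelihood = [0.9, 0.4, 0.1]   # P(B | A_i C): how strongly each A_i predicts B

# Predicted probability of observing B, before B is actually observed.
p_b = sum(l * p for l, p in zip(likelihood, prior))
print(p_b)  # 0.9*0.5 + 0.4*0.3 + 0.1*0.2 = 0.59
```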
Notice also that $P(B \mid C)$ appears in Bayes' rule, but the marginalisation principle shows that it can be computed from $P(A_i \mid C)$ and $P(B \mid A_i C)$ alone. Therefore $P(A_i \mid C)$ and $P(B \mid A_i C)$ suffice for computing the posterior probability $P(A_i \mid BC)$:

$$P(A_i \mid BC) = \frac{P(B \mid A_i C)\, P(A_i \mid C)}{\sum_{j=1}^{n} P(B \mid A_j C)\, P(A_j \mid C)} \tag{3}$$
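Continuing the same sketch with the same illustrative numbers, the following shows that the priors and likelihoods alone suffice for the posterior, with the marginalised $P(B \mid C)$ serving as the denominator of equation (3):

```python
# Bayes' rule with the marginalised denominator (equation (3)):
# P(A_i | B C) = P(B | A_i C) * P(A_i | C) / sum_j P(B | A_j C) * P(A_j | C).

prior = [0.5, 0.3, 0.2]        # P(A_i | C), illustrative assumptions
likelihood = [0.9, 0.4, 0.1]   # P(B | A_i C), illustrative assumptions

evidence = sum(l * p for l, p in zip(likelihood, prior))    # P(B | C) = 0.59
posterior = [l * p / evidence for l, p in zip(likelihood, prior)]
print(posterior)  # ~[0.763, 0.203, 0.034]; sums to 1, as a posterior must
```

After observing $B$, the explanation that predicted it most strongly ($A_1$) gains probability at the expense of the others, which is exactly the belief update Bayes' rule prescribes.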