In symbolic representations, the propositions are discrete and resemble simple statements of natural language. When trying to learn models of the environment, the problem with discrete propositions is that an unimaginably large number of them is needed to cover all the possible states of the world. The alternative is to build models which have real valued variables. This allows one to manipulate a vast number of elementary propositions at once by manipulating real valued functions, namely probability densities.

Following the usual Bayesian convention, probability density is
denoted by a lower case *p* and the ordinary probability by a capital
*P* throughout this thesis. We also use the convenient shorthand
notation where *p*(*x* | *y*) means the distribution of the belief in the
value of *x* given *y*. An alternative notation would be
*f*_{X|Y}(*x* |
*y*), which makes explicit the fact that *p*(*x* | *y*) is not the same
function as, for instance, *p*(*u* | *v*). In cases where the ordinary
probability needs to be distinguished from probability density, it is
called probability mass in analogy to physical mass and density.
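The distinction matters in practice: a probability mass always lies in [0, 1], whereas a probability density can exceed 1, and only its integral over an interval is a probability. A small Python sketch, not from the thesis and with made-up numbers, illustrates this:

```python
import math

# Probability mass: values of a discrete variable lie in [0, 1] and sum to 1.
P = {"heads": 0.5, "tails": 0.5}
assert abs(sum(P.values()) - 1.0) < 1e-12

# Probability density: a Gaussian density with a small standard deviation
# exceeds 1 at its peak -- densities themselves are not probabilities.
def normal_pdf(x, mu=0.0, sigma=0.1):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

peak = normal_pdf(0.0)  # about 3.99, greater than 1

# Only integrals of the density yield probabilities: a Riemann sum
# approximating the integral of p(x) over (-0.1, 0.1) gives a number in [0, 1].
dx = 1e-4
prob = sum(normal_pdf(i * dx) * dx for i in range(-1000, 1000))
```

Here the peak value of the density is about 3.99, while the integral over one standard deviation on each side of the mean is about 0.68, a genuine probability.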

Bayes' rule looks exactly the same for probability densities as it
does for probability mass. If *a* and *b* are real valued variables,
Bayes' rule takes the following form:

$$p(b \mid a) = \frac{p(a \mid b)\, p(b)}{p(a)} \tag{9}$$
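Numerically, the density form of Bayes' rule can be applied by discretising the real line into a fine grid, so that the evidence *p*(*a*) becomes an ordinary sum. A Python sketch with made-up numbers (a Gaussian prior on *b* and one noisy observation *a*; the example is illustrative, not from the thesis):

```python
import math

def gauss(x, mu, sigma):
    # Gaussian probability density evaluated at x.
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

db = 0.01
grid = [i * db for i in range(-500, 501)]    # candidate values of b in [-5, 5]
prior = [gauss(b, 0.0, 1.0) for b in grid]   # p(b): standard normal (assumed)
a_obs = 1.2                                   # one observed value of a (made up)
like = [gauss(a_obs, b, 0.5) for b in grid]  # p(a | b): Gaussian noise, sd 0.5

# Evidence p(a) = integral of p(a | b) p(b) db, approximated by a Riemann sum.
evidence = sum(l * p * db for l, p in zip(like, prior))

# Posterior density p(b | a) = p(a | b) p(b) / p(a), pointwise on the grid.
post = [l * p / evidence for l, p in zip(like, prior)]

# The posterior integrates to one, as any probability density must.
assert abs(sum(q * db for q in post) - 1.0) < 1e-6
```

For this conjugate Gaussian case the posterior mean can be checked analytically: with prior precision 1 and likelihood precision 4, it is 4.8/5 = 0.96, which the grid approximation reproduces closely.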

This form of the rule is convenient but also dangerous. It is all too easy to talk about ``the single most probable model'' when one is actually talking about the model which has the highest probability density. The danger lies in the fact that probability density is a derived quantity which has no role in the marginalisation principle

$$p(a) = \int p(a \mid b)\, p(b)\, \mathrm{d}b \tag{10}$$

or the rule of expected utility

$$E[U] = \int U(b)\, p(b)\, \mathrm{d}b \tag{11}$$

written for probability densities. Notice that for probability density the sum changes into an integral. In the integrals, the impact on the probability
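The danger of reading the highest probability density as ``the most probable value'' can be made concrete: under a change of variables the density picks up a Jacobian factor, so the location of its maximum can move. A Python sketch, not from the thesis, using a log-normal belief as an arbitrary example:

```python
import math

mu, sigma = 0.0, 1.0  # parameters of the belief (made-up values)

def lognormal_pdf(x):
    # Density of x when log x is Gaussian; includes the 1/x Jacobian factor.
    return math.exp(-0.5 * ((math.log(x) - mu) / sigma) ** 2) / (
        x * sigma * math.sqrt(2 * math.pi))

def gauss_pdf(y):
    # The same belief expressed in the coordinate y = log x: a plain Gaussian.
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Locate the maxima of the two densities on fine grids.
xs = [i * 1e-3 for i in range(1, 5001)]
mode_x = max(xs, key=lognormal_pdf)   # near exp(mu - sigma^2), about 0.37
ys = [i * 1e-3 for i in range(-5000, 5001)]
mode_y = max(ys, key=gauss_pdf)       # at mu = 0, which maps back to exp(0) = 1

# exp(mode_y) differs from mode_x: the maximum-density point depends on the
# parameterisation, even though both densities describe the same belief.
```

The two maxima disagree because the density, unlike the probability mass appearing in (10) and (11), is tied to the choice of coordinates.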