In symbolic representations, the propositions are discrete and resemble simple statements of natural language. When trying to learn models of the environment, the problem with discrete propositions is that an unimaginably large number of them is needed to cover all the possible states of the world. The alternative is to build models with real-valued variables. This allows one to manipulate a vast number of elementary propositions at once by manipulating real-valued functions: probability densities.
Following the usual Bayesian convention, a probability density is denoted by a lower-case p and an ordinary probability by a capital P throughout this thesis. We also use the convenient shorthand notation where p(x | y) means the distribution of the belief in the value of x given y. An alternative notation would be f_{X|Y}(x | y), which makes explicit the fact that p(x | y) is not the same function as, for instance, p(u | v). In cases where an ordinary probability needs to be distinguished from a probability density, it is called a probability mass, in analogy to physical mass and density.
Bayes' rule looks exactly the same for probability densities as it does for probability mass. If a and b are real-valued variables, Bayes' rule takes the following form:
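The excerpt ends before the displayed formula. A sketch of the standard density form, consistent with the notation above (with the marginal p(b) obtained by integrating over a, as is usual), would be:

```latex
p(a \mid b) = \frac{p(b \mid a)\, p(a)}{p(b)},
\qquad \text{where} \quad
p(b) = \int p(b \mid a)\, p(a)\, \mathrm{d}a .
```

Note that the only change from the probability-mass case is that the normalising marginal p(b) is computed by an integral rather than a sum.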