Level 3 (upper level) |
There are two schools of thought on the interpretation of probability. In classical statistics, probability is interpreted as the limiting frequency of an outcome when an experiment is repeated infinitely many times. For instance, when throwing a die, the probability of getting a three is one out of six (exactly so only if the die is ideal).
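The limiting-frequency idea can be illustrated with a small simulation. This is only a sketch: the function name relative_frequency, the fixed seed and the sample sizes are choices made for this illustration, and a pseudo-random generator stands in for an ideal die.

```python
import random

def relative_frequency(n_throws, face=3, seed=0):
    """Fraction of n_throws simulated throws of a fair die that show `face`."""
    rng = random.Random(seed)  # fixed seed so that the run is repeatable
    hits = sum(1 for _ in range(n_throws) if rng.randint(1, 6) == face)
    return hits / n_throws

# The relative frequency should settle near 1/6 as the number of throws grows.
for n in (100, 10_000, 100_000):
    print(n, relative_frequency(n))
```

No finite run reaches the limit exactly; the frequency only tends to stay closer to 1/6 as the number of throws increases.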
In everyday language, however, probability is understood in a wider sense. One can, for example, speak about the probability of rain tomorrow, even though the event is unique and there is no way its frequency could be measured by repeated experiments. Moreover, different people can give the same event different probabilities. This is natural since different people have different background knowledge and beliefs.
The interpretation of Bayesian probability theory is very close to everyday language. Probability expresses how strongly someone believes in something. Belief is always subjective and depends on background knowledge. The notation P(A | B) means: how true A seems if B is assumed. Often not all the background knowledge is written out, and P(A) can thus mean different things depending on which background assumptions are used. It is good to remember, however, that according to the Bayesian interpretation there is no absolute probability, since there does not exist an absolutely correct set of background assumptions.
Sometimes the interpretation of probability has no effect on how
the actual computations are conducted or what the result is. For the
probabilities in dice throwing, for example, the interpretation has no
significance. However, from the point of view of learning and intelligent
systems, the difference in interpretation is significant.
The propositions for which probabilities are defined obey the
rules of Boolean algebra. A Boolean algebra is defined for elements which
have two binary operations, sum and product, and a unary operation,
complement, which will be denoted here by ¬. The set of axioms
defining the Boolean algebra is
Boolean algebra (George Boole 1854)
There exist elements 0 and 1, which are not equal. | [A1] | |
AB = BA | A+B = B + A | [A2] |
A(B+C) = (AB)+(AC) | A+(BC) = (A+B)(A+C) | [A3] |
1A = A | 0+A = A | [A4] |
A¬A = 0 | A+¬A = 1 | [A5] |
The axioms on the same row are dual: exchanging product with sum and 0 with 1 transforms one into the other. Let's denote the axioms in the left-hand column by a and those in the right-hand column by b, i.e., A2a means the axiom AB = BA. From the axioms one can derive the following lemmas
¬¬A = A | [L1] | |
AA = A | A+A = A | [L2] |
¬1 = 0 | ¬0 = 1 | [L3] |
AB = 0 & A+B = 1 => B = ¬A | [L4] | |
0A = 0 | 1+A = 1 | [L5] |
A(A+B) = A | A+AB = A | [L6] |
A(BC) = (AB)C | A+(B+C) = (A+B)+C | [L7] |
¬A(AB) = 0 | ¬A+(A+B) = 1 | [L8] |
¬(AB) = ¬A+¬B | ¬(A+B) = ¬A¬B | [L9] |
AB = 1 => A = 1 | A+B = 0 => A = 0 | [L10] |
Boolean logic is obtained when only the elements 0 and 1 are included in the algebra. Zero is interpreted as false and one as true. The product corresponds to the and operation, the sum to or, and the complement to negation.
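In this two-element case a law can be checked simply by trying every combination of truth values. The sketch below does so for a few of the axioms and lemmas listed above, with the suffix a for the left-hand column and b for the right-hand column; the helper names AND, OR, NOT and holds are introduced here for illustration only. Such a check covers only the algebra with elements 0 and 1 and is no substitute for a derivation from the axioms.

```python
from itertools import product

def AND(a, b):   # product
    return a & b

def OR(a, b):    # sum
    return a | b

def NOT(a):      # complement
    return 1 - a

# Each law is written as a function that is True when the law holds.
LAWS = {
    "A3a": lambda a, b, c: AND(a, OR(b, c)) == OR(AND(a, b), AND(a, c)),
    "A3b": lambda a, b, c: OR(a, AND(b, c)) == AND(OR(a, b), OR(a, c)),
    "A5a": lambda a, b, c: AND(a, NOT(a)) == 0,
    "A5b": lambda a, b, c: OR(a, NOT(a)) == 1,
    "L6a": lambda a, b, c: AND(a, OR(a, b)) == a,
    "L6b": lambda a, b, c: OR(a, AND(a, b)) == a,
    "L9a": lambda a, b, c: NOT(AND(a, b)) == OR(NOT(a), NOT(b)),
    "L9b": lambda a, b, c: NOT(OR(a, b)) == AND(NOT(a), NOT(b)),
}

def holds(law):
    """Check a law for every assignment of 0/1 to A, B and C."""
    return all(law(a, b, c) for a, b, c in product((0, 1), repeat=3))

results = {name: holds(law) for name, law in LAWS.items()}
print(results)
```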
Sum Rule: P(A | B) + P(¬A | B) = 1
If one wishes to verify the truth of AB, one can first verify A and then verify B assuming A. Hence P(AB | C) is evidently a function of P(A | C) and P(B | AC). The product rule states that this function is a product.
Product Rule: P(AB | C) = P(A | C) P(B | AC)
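Both rules can be checked on a concrete case. The sketch below assumes an ideal die whose six faces are equally probable, represents propositions as sets of faces, and computes P(A | B) by counting; this counting model, the function prob and the particular events are assumptions of the example, not part of the rules themselves.

```python
from fractions import Fraction

OUTCOMES = frozenset(range(1, 7))  # faces of an ideal die, assumed equally likely

def prob(a, given=OUTCOMES):
    """P(a | given) for propositions represented as sets of faces."""
    if not given:
        raise ValueError("conflicting premises: probability undefined")
    return Fraction(len(a & given), len(given))

A = frozenset({2, 4, 6})   # "the face is even"
B = frozenset({4, 5, 6})   # "the face is greater than three"
C = OUTCOMES               # background: a die was thrown, nothing more

not_A = OUTCOMES - A
sum_rule_total = prob(A, C) + prob(not_A, C)   # should equal 1
lhs = prob(A & B, C)                           # P(AB | C)
rhs = prob(A, C) * prob(B, A & C)              # P(A | C) P(B | AC)
print(sum_rule_total, lhs, rhs)
```

Exact fractions are used instead of floating-point numbers so that the two sides of the product rule can be compared for equality.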
Probability is a real number between zero and one. The probability is not defined if the background assumptions, the premises, conflict. P(A | B¬B), for example, is undefined.
Using the rules of arithmetic and Boolean algebra, all other rules of Bayesian probability theory can be derived from the sum and product rules. Let's take the derivation of the generalised sum rule as an example. In what follows, the rule that is applied is denoted at each step, unless only the rules of basic arithmetic are applied.
P(A+B | C) = | [L1] |
P(¬¬(A+B) | C) = | [L9b] |
P(¬(¬A¬B) | C) = | [Sum Rule] |
1 - P(¬A¬B | C) = | [Product Rule] |
1 - P(¬A | C) P(¬B | ¬AC) = | [Sum Rule] |
1 - P(¬A | C) [1 - P(B | ¬AC)] = | |
1 - P(¬A | C) + P(¬A | C) P(B | ¬AC) = | [Sum Rule] |
P(A | C) + P(¬A | C) P(B | ¬AC) = | [Product Rule] |
P(A | C) + P(¬AB | C) = | [A2a] |
P(A | C) + P(B¬A | C) = | [Product Rule] |
P(A | C) + P(B | C) P(¬A | BC) = | [Sum Rule] |
P(A | C) + P(B | C) [1 - P(A | BC)] = | |
P(A | C) + P(B | C) - P(B | C) P(A | BC) = | [Product Rule] |
P(A | C) + P(B | C) - P(BA | C) = | [A2a] |
P(A | C) + P(B | C) - P(AB | C) |
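The end result of the derivation can be checked numerically. The sketch below assigns arbitrary positive weights to the eight combinations of truth values of A, B and C and compares both sides of the generalised sum rule using exact fractions; the weights, the seed and the helper prob are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import product
import random

rng = random.Random(1)  # fixed seed; the weights themselves are arbitrary
worlds = list(product((0, 1), repeat=3))   # truth values of (A, B, C)
weights = {w: Fraction(rng.randint(1, 9)) for w in worlds}

def prob(event, given):
    """P(event | given), both given as predicates on a world (a, b, c)."""
    denom = sum(weights[w] for w in worlds if given(w))
    numer = sum(weights[w] for w in worlds if given(w) and event(w))
    return numer / denom

A = lambda w: w[0] == 1
B = lambda w: w[1] == 1
C = lambda w: w[2] == 1
A_or_B = lambda w: A(w) or B(w)     # A+B
A_and_B = lambda w: A(w) and B(w)   # AB

lhs = prob(A_or_B, C)                              # P(A+B | C)
rhs = prob(A, C) + prob(B, C) - prob(A_and_B, C)   # right-hand side of the rule
print(lhs == rhs)
```

Agreement for one set of weights is of course not a proof; the derivation above is what establishes the rule in general.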
Usually, of course, not all the intermediate results are presented. The equations P(1 | A) = 1 and P(A | B) > 0 => P(A | AB) = 1 can also be derived from the sum and product rules. Let's denote x = P(1 | A). Then
1 - x = 1 - P(1 | A) = P(0 | A) = P(10 | A) = P(1 | A) P(0 | 1A) = x(1 - x) => x² - 2x + 1 = 0,
whose only solution is x = 1. On the other hand,
P(A | B) = P(AA | B) = P(A | B) P(A | AB),
from which P(A | AB) = 1 follows by dividing both sides by P(A | B), provided that P(A | B) > 0.
Last updated 15.10.1998
Harri Lappalainen