We can now summarise what Bayesian probability theory and decision theory say about learning, reasoning and action by means of a simple example. Suppose there are prior assumptions and experience I and possible explanations expressed as states of the world Si. An observation D is made, and an action Aj is chosen based on the belief about what the consequence D' of the action will be. We assume D' is one of several possible observations D'k made after the action is chosen.
The prior assumptions and experience I are assumed to be such that it is possible to determine the prior probability P(Si | I) of each state of the world; the probability P(D | Si I) of the observation D given the state of the world Si; the probabilities P(D'k | Si Aj D I) of the different consequences of actions given the state of the world and the prior experience; and the utilities U(Aj D'k D I) of the consequences. The action Aj is assumed to have no effect on the state Si of the world, and thus P(Si | Aj D I) = P(Si | D I).
The first stage of the example is learning. Before the observation, the states of the world have the prior probabilities P(Si | I). After the observation D is made, the probabilities change according to Bayes' rule:
$$P(S_i \mid D\, I) = \frac{P(D \mid S_i\, I)\, P(S_i \mid I)}{P(D \mid I)} \tag{6}$$
The denominator P(D | I) = Σi P(D | Si I) P(Si | I) measures how well the states predict the observation on average; the belief in those states of the world which were able to predict the observation better than average increases, and vice versa.
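To make the learning stage concrete, here is a minimal sketch in Python (using NumPy); the two states and all probability values are hypothetical numbers chosen only for illustration:

    import numpy as np

    # Hypothetical setup: two states of the world S1, S2 with prior
    # probabilities P(Si | I) and likelihoods P(D | Si I) of one observation D.
    prior = np.array([0.5, 0.5])        # P(Si | I)
    likelihood = np.array([0.8, 0.2])   # P(D | Si I)

    # Bayes' rule (6): posterior is proportional to likelihood times prior,
    # normalised by the evidence P(D | I) = sum_i P(D | Si I) P(Si | I).
    evidence = likelihood @ prior                # P(D | I)
    posterior = likelihood * prior / evidence    # P(Si | D I)

    print(posterior)  # [0.8 0.2]: S1 predicted D better than average, so its belief grew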
The next stage is to infer which consequences different actions
have. According to the marginalisation principle,
$$P(D'_k \mid A_j\, D\, I) = \sum_i P(D'_k \mid S_i\, A_j\, D\, I)\, P(S_i \mid D\, I) \tag{7}$$
Notice that Aj was assumed to have no effect on Si, and thus the probability P(Si | Aj D I) appearing in the marginalisation is equal to the posterior probability P(Si | D I) which was computed in the first stage.
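Continuing the sketch, the marginalisation in (7) amounts to a matrix-vector product; the consequence probabilities below are again hypothetical:

    import numpy as np

    # Posterior P(Si | D I) from the learning stage, and hypothetical
    # probabilities P(D'k | Si Aj D I) of three consequences D'k of one
    # action Aj (rows: states Si, columns: consequences D'k).
    posterior = np.array([0.8, 0.2])
    consequences = np.array([[0.7, 0.2, 0.1],    # given S1
                             [0.1, 0.3, 0.6]])   # given S2

    # Marginalisation (7): sum out the states of the world.
    predictive = posterior @ consequences        # P(D'k | Aj D I)
    print(predictive)                            # [0.58 0.22 0.2]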
The third stage of the example is choosing the action which has the greatest expected utility. The utilities of the actions can be computed by the rule of expected utility:
$$U(A_j \mid D\, I) = \sum_k U(A_j\, D'_k\, D\, I)\, P(D'_k \mid A_j\, D\, I) \tag{8}$$
The utilities of the actions are based on the utilities of the consequences and on the probabilities of the consequences in the light of the experience, which were computed in the previous stage.
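Finally, the choice of action in (8) reduces to a weighted sum followed by an argmax; the second action A2 and the utility values below are hypothetical:

    import numpy as np

    # Hypothetical consequence probabilities P(D'k | Aj D I) for two actions
    # (rows: actions Aj, columns: consequences D'k), and the utilities
    # U(Aj D'k D I) of the consequences, here taken to be the same for both
    # actions.
    predictive = np.array([[0.58, 0.22, 0.20],   # consequences of A1
                           [0.10, 0.50, 0.40]])  # consequences of A2
    utility = np.array([10.0, 0.0, -5.0])        # U(Aj D'k D I) per consequence

    # Rule of expected utility (8): weight the utilities by the consequence
    # probabilities and choose the action with the greatest expected utility.
    expected = predictive @ utility              # U(Aj | D I)
    best = int(np.argmax(expected))
    print(expected, best)                        # [ 4.8 -1. ] 0 -> choose A1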
So far we have explicitly denoted that the probabilities are conditional on the prior assumptions and experience I. In most cases the context makes it clear what the prior assumptions are, and usually I is left out. This means that probability statements like P(Si) should be understood to mean P(Si | I), where I denotes the assumptions appropriate for the context.