The task of learning the parameters of a model means that, given a set
of data cases or observations and a model structure, one can infer the
distribution over the model parameters, found for instance in the
conditional probability table in Table 3.1, in the clique potentials
in Equation (3.13), in the mapping in Equation (3.14), or in the
transition probabilities in Figure 3.3. Parameter learning does
not differ from inference in Bayesian probability theory, so the
reasons for studying them separately are mostly practical. For
instance, in the EM algorithm the parameters and the latent variables
are updated separately and in different ways. Also, local update
rules work efficiently in parameter learning, whereas explicit
propagation of information is important in state inference of dynamic
systems (see Publication V).
Parameter learning can be very simple. Consider learning the values
of the conditional probability table in Table 3.1 for neighbour 1
calling about the alarm, given whether or not there is an alarm.
Given data samples where we observe whether there was an alarm and
whether the neighbour called, let us settle for a point estimate: the
most likely set of parameters. The maximum likelihood (ML) solution
is simply to count how many times each of the four cases appears in
the data and to turn these counts into probabilities by normalising
each row to sum to one.
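As a minimal sketch of this counting procedure, assuming binary variables for the alarm and for neighbour 1 calling, the following Python snippet tabulates the four cases and normalises each row; the observations themselves are purely illustrative and not taken from the text.

```python
import numpy as np

# Hypothetical observations: each row is (alarm, neighbour_1_called),
# both binary. These data are illustrative only.
data = np.array([
    [1, 1], [1, 1], [1, 0],
    [0, 0], [0, 0], [0, 1], [0, 0],
])

# Count occurrences of the four possible (alarm, called) combinations.
counts = np.zeros((2, 2))
for alarm, called in data:
    counts[alarm, called] += 1

# ML estimate of P(called | alarm): normalise each row to sum to one.
cpt = counts / counts.sum(axis=1, keepdims=True)

print(cpt)  # row 0: P(call | no alarm), row 1: P(call | alarm)
```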