The basic model is the same as the one presented in Section 4.1. The hidden state sequence is denoted by $M = (M_1, \ldots, M_T)$, with $M_t \in \{1, \ldots, N\}$, and the other parameters by $\boldsymbol{\theta}$; the exact form of $\boldsymbol{\theta}$ will be specified later. The observations $\mathbf{x}(t)$, given the corresponding hidden state, are assumed to be Gaussian with a diagonal covariance matrix.
Given the HMM state sequence $M$, the individual observations are assumed to be independent. Therefore the likelihood of the data $X = (\mathbf{x}(1), \ldots, \mathbf{x}(T))$ can be written as
$$ p(X \mid M, \boldsymbol{\theta}) = \prod_{t=1}^{T} p(\mathbf{x}(t) \mid M_t, \boldsymbol{\theta}). \tag{5.1} $$
Because of the Markov property, the prior probability of the hidden state sequence can also be written in factorial form:
$$ p(M \mid \boldsymbol{\theta}) = p(M_1) \prod_{t=2}^{T} p(M_t \mid M_{t-1}). \tag{5.2} $$
The factors of Equations (5.1) and (5.2) are defined to be
$$ p(\mathbf{x}(t) \mid M_t = i, \boldsymbol{\theta}) = \prod_{k=1}^{d} N\!\left( x_k(t);\; m_{ik},\, \exp(2 v_{ik}) \right) \tag{5.3} $$
$$ p(M_1 = i) = \pi_i \tag{5.4} $$
$$ p(M_t = j \mid M_{t-1} = i) = a_{ij} \tag{5.5} $$
where $m_{ik}$ and $v_{ik}$ are the mean and log-standard-deviation of component $k$ of the Gaussian of state $i$, $\boldsymbol{\pi} = (\pi_1, \ldots, \pi_N)$ contains the initial state probabilities and $A = (a_{ij})$ is the transition probability matrix.
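As an illustrative sketch of the generative model of Equations (5.1)–(5.5), the following Python fragment samples a state sequence from the Markov chain, emits diagonal-Gaussian observations, and evaluates the joint log-probability. All dimensions, probabilities and parameter values here are invented for illustration, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

N, d, T = 3, 2, 100                       # states, data dimension, length (illustrative)
pi = np.array([0.6, 0.3, 0.1])            # initial state probabilities, Eq. (5.4)
A = np.array([[0.8, 0.1, 0.1],            # transition probabilities a_ij, Eq. (5.5)
              [0.2, 0.7, 0.1],
              [0.1, 0.2, 0.7]])
m = rng.normal(size=(N, d))               # state-wise means m_ik
v = rng.normal(scale=0.1, size=(N, d))    # log-std parameters; variance = exp(2 v_ik)

# Sample the hidden chain and the observations.
M = np.empty(T, dtype=int)
X = np.empty((T, d))
M[0] = rng.choice(N, p=pi)
for t in range(1, T):
    M[t] = rng.choice(N, p=A[M[t - 1]])   # Markov dynamics
for t in range(T):
    # Diagonal-covariance Gaussian emission, Eq. (5.3); std = exp(v)
    X[t] = rng.normal(m[M[t]], np.exp(v[M[t]]))

def log_joint(X, M, pi, A, m, v):
    """log p(X, M | theta): state prior (5.2) plus likelihood (5.1)."""
    lp = np.log(pi[M[0]]) + np.log(A[M[:-1], M[1:]]).sum()
    var = np.exp(2 * v[M])                # variances exp(2 v_ik) along the path
    lp += -0.5 * (np.log(2 * np.pi * var) + (X - m[M]) ** 2 / var).sum()
    return lp
```

The factorisation of the joint density is what makes the log-probability a simple sum over time steps.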
The priors of all the parameters defined above are
$$ p(\boldsymbol{\pi}) = \mathrm{Dirichlet}(\boldsymbol{\pi};\; u^{(\pi)}) \tag{5.6} $$
$$ p(\mathbf{a}_i) = \mathrm{Dirichlet}(\mathbf{a}_i;\; u^{(A)}) \tag{5.7} $$
$$ p(m_{ik}) = N\!\left( m_{ik};\; m_{m_k},\, \exp(2 v_{m_k}) \right) \tag{5.8} $$
$$ p(v_{ik}) = N\!\left( v_{ik};\; m_{v_k},\, \exp(2 v_{v_k}) \right) \tag{5.9} $$
where $\mathbf{a}_i$ denotes the $i$th row of the transition matrix $A$.
The parameters $u^{(\pi)}$ and $u^{(A)}$ of the Dirichlet priors are fixed. Their values should be chosen to reflect true prior knowledge on the possible initial states and transition probabilities of the chain. In our speech recognition example, where the states of the HMM represent different phonemes, these values could, for instance, be estimated from textual data.
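To make the role of the fixed Dirichlet parameters concrete, the following sketch draws an initial distribution and a transition matrix from such priors. The particular parameter values are hypothetical; the diagonal-heavy choice for the transition prior encodes the (assumed) knowledge that phoneme states tend to persist over consecutive frames:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 3  # number of hidden states (illustrative)

# Hypothetical fixed Dirichlet parameters: larger entries make the
# corresponding probabilities larger a priori.
u_pi = np.full(N, 1.0)        # symmetric, uninformative prior on the initial state
u_A = 1.0 + 9.0 * np.eye(N)   # favours self-transitions (states persist)

pi = rng.dirichlet(u_pi)                                  # one draw from Eq. (5.6)
A = np.array([rng.dirichlet(u_A[i]) for i in range(N)])   # row-wise draws, Eq. (5.7)
```

Each draw is automatically a valid probability vector, so any sample from these priors is a valid parameterisation of the chain.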
All the other parameters, $m_{ik}$ and $v_{ik}$, have higher hierarchical priors. As the number of parameters in such priors grows quickly, only the full structure of the hierarchical prior of $m_{ik}$ is given. It is:
$$ p(m_{ik}) = N\!\left( m_{ik};\; m_{m_k},\, \exp(2 v_{m_k}) \right) \tag{5.10} $$
$$ p(m_{m_k}) = N\!\left( m_{m_k};\; m_m,\, \exp(2 v_m) \right) \tag{5.11} $$
$$ p(v_{m_k}) = N\!\left( v_{m_k};\; m_v,\, \exp(2 v_v) \right) \tag{5.12} $$
where the top-level parameters $m_m$, $v_m$, $m_v$ and $v_v$ are fixed.
The hierarchical prior of $m_{ik}$, for example, can thus be summarised as a chain: each state-wise mean $m_{ik}$ is governed by the component-wise hyperparameters $m_{m_k}$ and $v_{m_k}$, which are in turn governed by fixed scalar top-level parameters.
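Sampling through the hierarchy makes the structure explicit: first the component-wise hyperparameters are drawn from the fixed top level, then the state-wise means are drawn conditional on them. The variable names and the fixed top-level values below are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
N, d = 3, 2   # number of states, data dimension (illustrative)

# Fixed scalar top-level parameters (assumed values, not from the text).
m_m, v_m = 0.0, 0.0   # govern the component-wise means m_mk
m_v, v_v = 0.0, 0.0   # govern the component-wise log-stds v_mk

# Middle level: one hyperparameter pair per data component k, Eqs. (5.11)-(5.12).
# Throughout, the variance is exp(2 v), so the standard deviation is exp(v).
m_mk = rng.normal(m_m, np.exp(v_m), size=d)
v_mk = rng.normal(m_v, np.exp(v_v), size=d)

# Bottom level: the N state-wise means of each component share the same
# component-wise hyperparameters, Eq. (5.10).
m = rng.normal(m_mk, np.exp(v_mk), size=(N, d))
```

Because the $N$ states share hyperparameters per component, the hierarchy lets the states borrow statistical strength from each other when data are scarce.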
The set of model parameters consists of all these parameters and all the parameters of the hierarchical priors.
In the hierarchical structure formulated above, the Gaussian prior for the mean of a Gaussian is a conjugate prior. Thus the posterior will also be Gaussian.
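As a reminder of why this conjugacy is convenient, the standard Gaussian–Gaussian update can be written out. With likelihood $x_j \sim N(m, \sigma^2)$, $j = 1, \ldots, n$, and prior $m \sim N(\mu_0, \sigma_0^2)$, the posterior of $m$ is again Gaussian:
$$ p(m \mid x_{1:n}) = N\!\left( m;\; \frac{\sigma^2 \mu_0 + \sigma_0^2 \sum_{j=1}^{n} x_j}{\sigma^2 + n \sigma_0^2},\; \frac{\sigma^2 \sigma_0^2}{\sigma^2 + n \sigma_0^2} \right), $$
which follows from adding the precisions, $\sigma_{\text{post}}^{-2} = \sigma_0^{-2} + n \sigma^{-2}$, and precision-weighting the prior mean and the data mean. No new distributional family appears at any level, which is what keeps the posterior computations tractable.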
The parameterisation of the variance as $\exp(2v)$, with a Gaussian prior on $v$, is somewhat less conventional. The conjugate prior for the variance of a Gaussian is the inverse gamma distribution, but adding a new level of hierarchy for the parameters of such a distribution would be significantly more difficult. The present parameterisation allows similar layers of hierarchy to be added for the parameters of the priors of $m_{ik}$ and $v_{ik}$. In this parameterisation the posterior of $v$ is not exactly Gaussian, but it may be approximated with one. The exponential function ensures that the variance is always positive, and the posterior is thus closer to a Gaussian.
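A small numerical sketch illustrates both points: the variance $\exp(2v)$ is positive for every real $v$, and the log-posterior of $v$ is smooth and single-peaked near $v = \log \sigma_{\text{true}}$, so a quadratic (i.e. Gaussian) approximation around the mode is reasonable. The data, the prior hyperparameters and the grid are all invented for this demonstration:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(0.0, 1.5, size=200)   # synthetic data with true std 1.5

def log_post_v(v, x, m=0.0, prior_mean=0.0, prior_var=4.0):
    """Unnormalised log-posterior of the log-std parameter v.

    The Gaussian variance is parameterised as exp(2 v), and v itself has a
    Gaussian prior (hyperparameter values assumed for illustration).
    """
    var = np.exp(2.0 * v)            # strictly positive for any real v
    loglik = -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - m) ** 2 / var)
    logprior = -0.5 * (v - prior_mean) ** 2 / prior_var
    return loglik + logprior

# Evaluate on a grid: the maximiser sits near log(1.5) ~ 0.405, and the
# curve is locally well approximated by a parabola, i.e. by a Gaussian
# posterior on v -- the approximation the text refers to.
vs = np.linspace(-1.0, 2.0, 601)
lp = np.array([log_post_v(v, x) for v in vs])
v_mode = vs[np.argmax(lp)]
```

In contrast, parameterising the variance directly would require constraining the posterior approximation to the positive half-line.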