

Ensemble learning for hidden Markov models

MacKay showed in [39] how ensemble learning can be applied to the learning of HMMs with discrete observations. With suitable priors for all the variables, the problem can be solved analytically, and the resulting algorithm turns out to be a rather simple modification of the Baum-Welch algorithm.

MacKay uses Dirichlet distributions as priors for the model parameters $\boldsymbol{\theta}$. (See Appendix A for the definition of the Dirichlet distribution and some of its properties.) With discrete observations, the likelihood is a product of multinomial terms in the parameters, and the Dirichlet distribution is the conjugate prior of the multinomial: for instance, a multinomial parameter vector with prior $\mathrm{Dirichlet}(u_1, \ldots, u_K)$ and observed counts $n_1, \ldots, n_K$ has posterior $\mathrm{Dirichlet}(u_1 + n_1, \ldots, u_K + n_K)$. Because of this, the posterior distributions of all the parameters are also Dirichlet. The update rule for the parameters $W_{ij}$ of the Dirichlet distribution of the transition probabilities is

$\displaystyle W_{ij} = \sum_{t=1}^{T-1} \sum_{\boldsymbol{M}} q(\boldsymbol{M}) \delta(M_t = i, M_{t+1} = j) + u_{a_{ij}}$ (4.11)

where $u_{a_{ij}}$ are the parameters of the prior and $q(\boldsymbol{M})$ is the approximate posterior over the hidden state sequences used in ensemble learning. The double sum is the expected number of transitions from state $i$ to state $j$ under $q(\boldsymbol{M})$, so the update simply adds these expected counts to the prior counts. The update rules for the other parameters are similar.
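As a minimal sketch of this update in Python, assuming the pairwise marginals $q(M_t = i, M_{t+1} = j)$ have already been computed with the forward-backward recursions of the Baum-Welch algorithm (the array and function names below are hypothetical, not from [39]):

```python
import numpy as np

def update_transition_dirichlet(xi, u_a):
    """Update rule (4.11): posterior Dirichlet parameters W_ij of the
    transition probabilities.

    xi  : array of shape (T-1, K, K); xi[t, i, j] holds the pairwise
          posterior marginal q(M_t = i, M_{t+1} = j) from the
          forward-backward recursions.
    u_a : array of shape (K, K) with the prior parameters u_{a_ij}.
    """
    # Expected transition counts under q plus the prior counts.
    return xi.sum(axis=0) + u_a

# Toy usage: K = 2 states, T = 4 observations, unit prior counts.
xi = np.full((3, 2, 2), 0.25)   # each time slice sums to one
W = update_transition_dirichlet(xi, np.ones((2, 2)))
```

Accumulating the expected transition counts is exactly the computation performed in the M-step of the standard Baum-Welch algorithm; the only change here is the addition of the prior counts $u_{a_{ij}}$.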

The posterior distribution of the hidden state probabilities turns out to be of exactly the same form as the likelihood in Equation (4.6), but with $a_{ij}$ replaced by $a_{ij}^* = \exp\left( \operatorname{E}_{q(\mathbf{A})} \{ \ln a_{ij} \} \right)$, and similarly for $b_i(x(t))$ and $\pi_i$. The required expectations over the Dirichlet distribution can be evaluated as in Equation (A.13) in Appendix A.
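Concretely, for a Dirichlet distribution the expectation of a log-coordinate is $\operatorname{E}\{\ln a_{ij}\} = \psi(W_{ij}) - \psi(\sum_k W_{ik})$, where $\psi$ is the digamma function. A minimal sketch, assuming each row of the transition matrix has an independent Dirichlet posterior with parameters $W_{i1}, \ldots, W_{iK}$ (the function name is hypothetical):

```python
import numpy as np
from scipy.special import digamma

def modified_transition_params(W):
    """Compute a*_ij = exp(E{ln a_ij}) from the posterior Dirichlet
    parameters of the transition probabilities.

    W : array of shape (K, K); row i holds the Dirichlet parameters of
        the outgoing transition probabilities of state i.
    """
    # E{ln a_ij} = psi(W_ij) - psi(sum_k W_ik) for a Dirichlet
    # distribution (cf. Equation (A.13) in Appendix A).
    return np.exp(digamma(W) - digamma(W.sum(axis=1, keepdims=True)))

A_star = modified_transition_params(np.array([[2.0, 1.0],
                                              [1.0, 3.0]]))
```

Note that the rows of $a_{ij}^*$ sum to less than one, since $\exp(\operatorname{E}\{\ln a_{ij}\}) \le \operatorname{E}\{a_{ij}\}$ by Jensen's inequality; these sub-normalized values are used in place of $a_{ij}$ in the forward-backward recursions.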

