

LEARNING A LOHMM FROM DATA

This section describes how a LOHMM can be learned from data using the Bayesian approach.

Let us first separate a LOHMM into its parameters $ \boldsymbol{\theta}$ (the probabilities) and the structure $ \mathcal{H}$ (the rest). Let us also assume a prior which generates a structure $ \mathcal{H}$ with probability $ p(\mathcal{H})$ and parameters for it with probability density $ p(\boldsymbol{\theta}\mid\mathcal{H})$. Learning a single LOHMM $ (\mathcal{H},\boldsymbol{\theta})$ from data $ \boldsymbol{X}$ corresponds to finding a good representative of the posterior probability mass:

$\displaystyle p(\boldsymbol{\theta}, \mathcal{H}\mid \boldsymbol{X}) \propto p(\boldsymbol{X}\mid \boldsymbol{\theta}, \mathcal{H}) p(\boldsymbol{\theta}\mid \mathcal{H}) p(\mathcal{H}).$ (2)
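
For orientation, the likelihood term $ p(\boldsymbol{X}\mid \boldsymbol{\theta}, \mathcal{H})$ sums over the possible hidden state sequences. In an ordinary (propositional) HMM, for instance,

$\displaystyle p(\boldsymbol{X}\mid \boldsymbol{\theta}, \mathcal{H}) = \sum_{z_1, \ldots, z_T} \prod_{t=1}^{T} p(z_t\mid z_{t-1}, \boldsymbol{\theta}) p(x_t\mid z_t, \boldsymbol{\theta}),$

with $ z_0$ a fixed start state; a LOHMM generalizes this setting by representing the states and observations with logical atoms. (The propositional factorization above is shown only as an illustration.)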

Instead of finding just one LOHMM, one could use an ensemble of them, represented either by a set of sampled points [5] or by a distribution of a simple form [7]. This is, however, outside the scope of this paper.

Finding a good representative structure $ \mathcal{H}$ involves a combinatorial search in the structure space, where the parameters $ \boldsymbol{\theta}$ need to be estimated for each candidate structure. One could first try structures that resemble good candidates, for instance by using inductive logic programming [12] techniques. Also, information from other structures could be used to guide the parameter estimation. This is left for future research.

The representative parameters $ \boldsymbol{\theta}$ for a given structure $ \mathcal{H}$ can be found by point estimation. Two commonly used estimators are the maximum a posteriori (map) estimate $ \boldsymbol{\theta}^{map}$ and the Bayes estimate $ \boldsymbol{\theta}^B$. They are defined as

$\displaystyle \boldsymbol{\theta}^{map} = \arg \max_{\boldsymbol{\theta}} p(\boldsymbol{X}\mid \boldsymbol{\theta}, \mathcal{H}) p(\boldsymbol{\theta}\mid \mathcal{H})$     (3)
$\displaystyle \boldsymbol{\theta}^B = \int p(\boldsymbol{X}\mid \boldsymbol{\theta}, \mathcal{H}) p(\boldsymbol{\theta}\mid \mathcal{H}) \boldsymbol{\theta} d\boldsymbol{\theta}.$     (4)

The maximum likelihood estimator is a special case of the map estimator, obtained by assuming the parameter prior $ p(\boldsymbol{\theta}\mid\mathcal{H})$ to be uniform. Note that the Bayes estimate is unique, whereas the map estimate is not.
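
As a simple illustration of the difference between the two estimates (a Beta-Bernoulli example, not part of the model above): suppose a single probability $ \theta$ has a $ \mathrm{Beta}(\alpha, \beta)$ prior and the data contain $ n_1$ successes in $ n$ trials. Then

$\displaystyle \theta^{map} = \frac{n_1 + \alpha - 1}{n + \alpha + \beta - 2} \qquad \mbox{and} \qquad \theta^{B} = \frac{n_1 + \alpha}{n + \alpha + \beta},$

where $ \theta^{B}$ is the mean of the normalized posterior. With the uniform prior $ \alpha = \beta = 1$, the map estimate reduces to the maximum likelihood estimate $ n_1/n$, whereas the Bayes estimate becomes Laplace's rule of succession $ (n_1+1)/(n+2)$.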

One can also use the Bayes estimator componentwise (cB). Each component $ \theta_k$ is estimated by

$\displaystyle \theta^{cB}_k = \int p(\boldsymbol{X}\mid \boldsymbol{\theta}, \mathcal{H}) p(\boldsymbol{\theta}\mid \mathcal{H}) \theta_k d\theta_k$     (5)

keeping all the other components constant. That is, in the integrand, $ \boldsymbol{\theta}$ is $ \boldsymbol{\theta}^{cB}$ with its $ k$th component replaced by the integration variable $ \theta_k$. The componentwise Bayes estimate is no longer unique.
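
To spell out the non-uniqueness (the substitution notation below is introduced here only for illustration), write $ \boldsymbol{\theta}^{cB}_{[k \to \theta_k]}$ for $ \boldsymbol{\theta}^{cB}$ with its $ k$th component replaced by $ \theta_k$. Equation (5) then requires the components to satisfy the coupled fixed-point conditions

$\displaystyle \theta^{cB}_k = \int p(\boldsymbol{X}\mid \boldsymbol{\theta}^{cB}_{[k \to \theta_k]}, \mathcal{H}) p(\boldsymbol{\theta}^{cB}_{[k \to \theta_k]}\mid \mathcal{H}) \theta_k d\theta_k \qquad \mbox{for all } k,$

and such a system can have more than one solution.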

