

LEARNING A LOHMM FROM DATA

This section describes how a LOHMM can be learned from data using the Bayesian approach.

Let us first separate a LOHMM into its parameters $ \boldsymbol{\theta}$ (the probabilities) and the structure $ \mathcal{H}$ (the rest). Let us also assume a prior which generates a structure $ \mathcal{H}$ with probability $ p(\mathcal{H})$ and parameters for it with probability density $ p(\boldsymbol{\theta}\mid\mathcal{H})$. Learning a single LOHMM $ (\mathcal{H},\boldsymbol{\theta})$ from data $ \boldsymbol{X}$ corresponds to finding a good representative of the posterior probability mass:

$\displaystyle p(\boldsymbol{\theta}, \mathcal{H}\mid \boldsymbol{X}) \propto p(\boldsymbol{X}\mid \boldsymbol{\theta}, \mathcal{H}) p(\boldsymbol{\theta}\mid \mathcal{H}) p(\mathcal{H}).$ (2)
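
For orientation, the likelihood term $ p(\boldsymbol{X}\mid \boldsymbol{\theta}, \mathcal{H})$ sums over the possible hidden state sequences. In an ordinary (propositional) HMM, for instance,

$\displaystyle p(\boldsymbol{X}\mid \boldsymbol{\theta}, \mathcal{H}) = \sum_{z_1, \ldots, z_T} \prod_{t=1}^{T} p(z_t\mid z_{t-1}, \boldsymbol{\theta}) p(x_t\mid z_t, \boldsymbol{\theta}),$

with $ z_0$ a fixed start state; a LOHMM generalizes this setting by representing the states and observations with logical atoms. (The propositional factorization above is shown only as an illustration.)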

Instead of finding just one LOHMM, one could use an ensemble of them, represented either by a set of sampled points [5] or by a distribution of a simple form [7]. This is, however, outside the scope of this paper.

Finding a good representative structure $ \mathcal{H}$ involves a combinatorial search in the structure space, where the parameters $ \boldsymbol{\theta}$ need to be estimated for each candidate structure. One could first try structures that resemble good candidates, for instance by using inductive logic programming [12] techniques. Also, information from other structures could be used to guide the parameter estimation. This is left for future research.

The representative parameters $ \boldsymbol{\theta}$ for a given structure $ \mathcal{H}$ can be found by point estimation. Two commonly used estimators are the maximum a posteriori (map) estimate $ \boldsymbol{\theta}^{map}$ and the Bayes estimate $ \boldsymbol{\theta}^B$. They are defined as

$\displaystyle \boldsymbol{\theta}^{map} = \arg \max_{\boldsymbol{\theta}} p(\boldsymbol{X}\mid \boldsymbol{\theta}, \mathcal{H}) p(\boldsymbol{\theta}\mid \mathcal{H})$     (3)
$\displaystyle \boldsymbol{\theta}^B = \int p(\boldsymbol{X}\mid \boldsymbol{\theta}, \mathcal{H}) p(\boldsymbol{\theta}\mid \mathcal{H}) \boldsymbol{\theta} d\boldsymbol{\theta}.$     (4)

The maximum likelihood estimator is a special case of the map estimator, obtained by assuming the parameter prior $ p(\boldsymbol{\theta}\mid\mathcal{H})$ to be uniform. Note that the Bayes estimate is unique, whereas the map estimate is not.
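
As a simple illustration of the difference between the two estimates (a Beta-Bernoulli example, not part of the model above): suppose a single probability $ \theta$ has a $ \mathrm{Beta}(\alpha, \beta)$ prior and the data contain $ n_1$ successes in $ n$ trials. Then

$\displaystyle \theta^{map} = \frac{n_1 + \alpha - 1}{n + \alpha + \beta - 2} \qquad \mbox{and} \qquad \theta^{B} = \frac{n_1 + \alpha}{n + \alpha + \beta},$

where $ \theta^{B}$ is the mean of the normalized posterior. With the uniform prior $ \alpha = \beta = 1$, the map estimate reduces to the maximum likelihood estimate $ n_1/n$, whereas the Bayes estimate becomes Laplace's rule of succession $ (n_1+1)/(n+2)$.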

One can also use the Bayes estimator componentwise (cB). Each component $ \theta_k$ is estimated by

$\displaystyle \theta^{cB}_k = \int p(\boldsymbol{X}\mid \boldsymbol{\theta}, \mathcal{H}) p(\boldsymbol{\theta}\mid \mathcal{H}) \theta_k d\theta_k$     (5)

keeping all the other components constant. That is, in the integrand, $ \boldsymbol{\theta}$ is $ \boldsymbol{\theta}^{cB}$ with its $ k$th component replaced by the integration variable $ \theta_k$. The componentwise Bayes estimate is no longer unique.
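
To spell out the non-uniqueness (the substitution notation below is introduced here only for illustration), write $ \boldsymbol{\theta}^{cB}_{[k \to \theta_k]}$ for $ \boldsymbol{\theta}^{cB}$ with its $ k$th component replaced by $ \theta_k$. Equation (5) then requires the components to satisfy the coupled fixed-point conditions

$\displaystyle \theta^{cB}_k = \int p(\boldsymbol{X}\mid \boldsymbol{\theta}^{cB}_{[k \to \theta_k]}, \mathcal{H}) p(\boldsymbol{\theta}^{cB}_{[k \to \theta_k]}\mid \mathcal{H}) \theta_k d\theta_k \qquad \mbox{for all } k,$

and such a system can have more than one solution.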

