

Logical hidden Markov models

Logical hidden Markov models (LOHMMs), introduced in Publication VII, deal with sequences of structured symbols in the form of logical atoms, rather than flat characters. They can be seen as a special case of statistical relational learning or as an extended version of hidden Markov models (HMMs, see Section 3.1.5). LOHMMs aim to retain as much of the expressiveness of first-order logic as possible while keeping learning algorithms as efficient as those of HMMs. States and observations are logical atoms, and transitions can involve variable bindings. LOHMMs have been implemented in Prolog.

Many real-world sequences, such as protein secondary structures or UNIX shell logs, exhibit a rich internal structure. Traditional probabilistic models of sequences, however, consider sequences of unstructured symbols only. For instance, consider a sequence of UNIX commands, which may have parameters, such as ``emacs lohmms.tex, ls, latex lohmms.tex''. Commands are essentially structured. Applying HMMs requires either 1) ignoring the structure of the commands (i.e., the parameters), or 2) taking all possible parameters explicitly into account. The former approach results in a serious loss of information, while the latter leads to a combinatorial explosion in the number of symbols and parameters of the HMM and, as a consequence, inhibits generalisation. Using logical atoms, the above UNIX command sequence can be represented as ``emacs(lohmms.tex), ls, latex(lohmms.tex)''. There are two important motivations for using logical atoms at the symbol level. Firstly, variables in the atoms allow one to abstract over specific symbols. For example, the logical atom emacs($ X$) represents all files $ X$ edited using emacs. Secondly, unification allows one to share information among states. For example, the sequence emacs($ X$), latex($ X$) denotes that the same file is used as an argument for both emacs and latex.
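
To make the representation concrete, the following minimal sketch in Python (not the Prolog implementation mentioned above) shows how commands can be encoded as atoms with a functor and arguments, and how an abstract atom such as emacs($ X$) unifies with a ground command, binding $ X$ to the filename. The Atom class and the unify function are illustrative names only and are not part of Publication VII.

\begin{verbatim}
# Illustrative sketch: logical atoms as functor + arguments, with a simple
# unification check.  Not the implementation of Publication VII.
from dataclasses import dataclass

@dataclass(frozen=True)
class Atom:
    functor: str
    args: tuple = ()

def is_variable(term):
    # Prolog convention: variables start with an upper-case letter.
    return isinstance(term, str) and term[:1].isupper()

def unify(atom, ground_atom, bindings=None):
    """Unify an abstract atom (possibly containing variables) against a
    ground atom; return the variable bindings, or None on failure."""
    bindings = dict(bindings or {})
    if atom.functor != ground_atom.functor or len(atom.args) != len(ground_atom.args):
        return None
    for a, g in zip(atom.args, ground_atom.args):
        if is_variable(a):
            if bindings.get(a, g) != g:
                return None          # variable already bound to something else
            bindings[a] = g
        elif a != g:
            return None              # constants must match exactly
    return bindings

# "emacs(lohmms.tex), ls, latex(lohmms.tex)" as ground atoms:
sequence = [Atom("emacs", ("lohmms.tex",)), Atom("ls"), Atom("latex", ("lohmms.tex",))]
# The abstract atom emacs(X) matches the first command and binds X:
print(unify(Atom("emacs", ("X",)), sequence[0]))   # {'X': 'lohmms.tex'}
\end{verbatim}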

Let us return to the coin example of Section 3.1.5 to help define a LOHMM as a generative model. Figure 6.2 shows the HMM and the corresponding LOHMM. Transitions are divided into two steps (compared to just one in HMMs). To generate data, first a rule (or an abstract transition) is selected according to the (abstract) transition probabilities to find out the resulting abstract state (an atom such as coin($ X$)). Only the rules with the most specific bodies are applied; for instance, in the state coin(2), the rules for coin($ X$) are not used. Second, the remaining variables in the atom are instantiated using so-called selection probabilities. The scope of variables is restricted to a single transition; that is, the abstract states coin($ X$) and coin($ Y$) are equivalent once the variable bindings in that transition are taken into account. Note that, in general, the observations ($ h$ and $ t$ here) can also be structured atoms.
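
The two-step generation can be illustrated with the following toy sketch. The transition and selection probabilities below are invented for illustration and are not those of Figure 6.2, and the code is only a sketch of the generative process, not the actual LOHMM machinery.

\begin{verbatim}
# Toy sketch of two-step LOHMM generation on the coin example.
# Step 1 picks an abstract transition (rule); step 2 instantiates any
# remaining variable using a selection distribution.  Probabilities are made up.
import random

# Abstract transitions from the abstract state coin(X):
#   (probability, emitted observation, next abstract state)
# coin(X) in the head reuses the current binding of X, while coin(Y)
# introduces a fresh variable that still has to be selected.
abstract_transitions = [
    (0.6, "h", "coin(X)"),
    (0.4, "t", "coin(Y)"),
]

# Selection probabilities for instantiating a free coin variable.
coin_selection = {1: 0.5, 2: 0.5}

def generate(n_steps, coin=1):
    sequence = []
    for _ in range(n_steps):
        # Step 1: choose a rule according to the abstract transition probabilities.
        weights = [p for p, _, _ in abstract_transitions]
        _, obs, head = random.choices(abstract_transitions, weights=weights)[0]
        sequence.append((f"coin({coin})", obs))   # observation emitted from the current state
        # Step 2: instantiate the remaining variable of the next state, if any,
        # using the selection distribution; coin(X) keeps the current binding.
        if head == "coin(Y)":
            coin = random.choices(list(coin_selection),
                                  weights=list(coin_selection.values()))[0]
    return sequence

print(generate(5))   # e.g. [('coin(1)', 'h'), ('coin(1)', 't'), ('coin(2)', 'h'), ...]
\end{verbatim}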

Figure 6.2: Left: A graphical representation of a hidden Markov model repeated from Figure 3.3. Right: The corresponding logical hidden Markov model. Bottom: The logical hidden Markov model written as a logic programme. Solid arrows are abstract transitions, dashed arrows denote special cases, and dotted edges are needed because of the scope of variables. White-headed arrows in the bottom right show the selection probabilities.
\includegraphics[width=0.7\textwidth]{coin_lohmm.eps}

The rules of a LOHMM form a logic programme where the only fact is the start state. The proof network for the observation sequence forms the structure of the corresponding graphical model. Figure 6.3 depicts the graph formed by unfolding a tiny LOHMM used for modelling UNIX command sequences. Using the graph, it is possible to see how the inference algorithms of HMMs generalise. As for HMMs, three inference problems are of interest. Let $ \mathcal{H}$ be a LOHMM and let $ \mathsf{X}=\mathtt{x}_1,\mathtt{x}_2,\ldots,\mathtt{x}_T$, $ T>0$, be a finite sequence of ground observations:

Evaluation:
Determine the probability $ P(\mathsf{X}\mid \mathcal{H})$ that sequence $ \mathsf{X}$ was generated by the model $ \mathcal{H}$.
Most likely state sequence:
Determine the hidden state sequence that has most likely produced the observation sequence $ \mathsf{X}$.
Parameter estimation:
Given a set $ \{\mathsf{X}_1,\ldots,\mathsf{X}_k\}$ of observation sequences, determine the most likely parameters for the abstract transitions and the selection distributions of $ \mathcal{H}$.
Publication VII addresses each of these problems in turn by extending the existing solutions for HMMs. Belief propagation (see Section 3.1.1), also known as the forward-backward algorithm in the context of HMMs, is used to compute the probability of a particular abstract and ground transition at a particular time, given the parameters. Belief propagation, as well as evaluation and finding the most likely hidden state sequence, has a computational complexity of $ \mathcal{O}(Ts^2)$, where $ T$ is the data size and $ s$ is the number of possible states.
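
As an illustration of the evaluation problem, the following sketch implements the standard forward recursion over ground states, assuming the LOHMM has already been unfolded into a trellis (as in Figure 6.3) so that ground transition and emission probabilities are available as plain tables; the states and probabilities in the small example at the end are hypothetical. Publication VII generalises exactly this recursion to abstract transitions and selections.

\begin{verbatim}
# Standard O(T s^2) forward recursion over ground states (a sketch of the
# evaluation problem).  The tables are placeholders; the grounding of abstract
# transitions is assumed to have been done already.
def forward(observations, states, start_p, trans_p, emit_p):
    """Return P(X | H) by summing the forward variables at the last step.

    start_p[s]     : probability of starting in ground state s
    trans_p[s][s2] : probability of moving from s to s2
    emit_p[s][x]   : probability that state s emits ground observation x
    """
    # Initialisation: alpha_1(s) = start_p(s) * emit_p(s, x_1)
    alpha = {s: start_p[s] * emit_p[s].get(observations[0], 0.0) for s in states}
    # Recursion: alpha_t(s2) = sum_s alpha_{t-1}(s) trans_p(s, s2) emit_p(s2, x_t)
    for x in observations[1:]:
        alpha = {s2: sum(alpha[s] * trans_p[s].get(s2, 0.0) for s in states)
                     * emit_p[s2].get(x, 0.0)
                 for s2 in states}
    # Termination: P(X | H) = sum_s alpha_T(s)
    return sum(alpha.values())

# A hypothetical two-state ground example:
states = ["emacs(f1)", "latex(f1)"]
start = {"emacs(f1)": 1.0, "latex(f1)": 0.0}
trans = {"emacs(f1)": {"latex(f1)": 0.7, "emacs(f1)": 0.3},
         "latex(f1)": {"emacs(f1)": 1.0}}
emit = {"emacs(f1)": {"emacs(f1)": 1.0}, "latex(f1)": {"latex(f1)": 1.0}}
print(forward(["emacs(f1)", "latex(f1)"], states, start, trans, emit))   # 0.7
\end{verbatim}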

Figure 6.3: A logical hidden Markov model is unfolded in time to form a trellis. Transitions are factorised into two steps: abstract transitions (rules) and selection (variable instantiation). The example represents user modelling, where the states include the commands ls, emacs, and latex, and the user type (t or o) is included as part of the hidden state. Filenames f1 and f2 are the other arguments of emacs and latex. See Publication VII for details.
\includegraphics[width=0.8\textwidth]{lattice.eps}

The probabilities found by belief propagation can further be used for parameter updating: summing them over time gives expected counts for both abstract transitions and selections. Raiko et al. (2002) explain how this maximum likelihood (ML) parameter estimation is transformed into a more Bayesian solution. The computational complexity is $ \mathcal{O}(I(Ts^2+d))$, where $ I$ is the number of iterations and $ d$ is the number of parameters in the model. Experiments (see Section 6.2.3) demonstrate that LOHMMs possess several advantages over traditional HMMs for applications involving structured sequences.
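
The update step itself can be sketched as follows: the expected counts are normalised per source abstract state to obtain new transition probabilities. Adding Dirichlet-style pseudo-counts is shown here as one common way to make the estimate more Bayesian (a MAP estimate); this is an illustrative assumption, not necessarily the exact scheme of Raiko et al. (2002).

\begin{verbatim}
# Sketch of the M-step: normalise expected counts of abstract transitions per
# source abstract state.  pseudo_count > 0 gives a Dirichlet-style MAP update
# (an assumption for illustration, not the exact scheme of Raiko et al., 2002).
from collections import defaultdict

def update_transition_probabilities(expected_counts, pseudo_count=0.0):
    """expected_counts maps (source_abstract_state, rule) -> expected count."""
    totals = defaultdict(float)
    for (source, _), count in expected_counts.items():
        totals[source] += count + pseudo_count
    return {(source, rule): (count + pseudo_count) / totals[source]
            for (source, rule), count in expected_counts.items()}

# Hypothetical expected counts for two rules sharing the body coin(X):
counts = {("coin(X)", "rule_stay"): 7.2, ("coin(X)", "rule_switch"): 2.8}
print(update_transition_probabilities(counts, pseudo_count=1.0))
\end{verbatim}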


