Logical hidden Markov models (LOHMMs), introduced in Publication VII, deal with sequences of structured symbols in the form of logical atoms, rather than flat characters. They can be seen as a special case of statistical relational learning or as an extended version of hidden Markov models (HMMs, see Section 3.1.5). LOHMMs try to retain as much of the expressiveness of first-order logic as possible while keeping learning algorithms as efficient as those of HMMs. States and observations are logical atoms, and transitions can involve variable bindings. LOHMMs have been implemented in Prolog.
Many real-world sequences, such as protein secondary structures or UNIX shell logs, exhibit a rich internal structure. Traditional probabilistic models of sequences, however, consider only sequences of unstructured symbols.
For instance, consider a sequence of UNIX commands, which may have parameters
such as ``emacs lohmms.tex, ls, latex lohmms.tex''.
Commands are essentially structured. Applying HMMs requires either 1)
ignoring the structure of the commands (i.e., the parameters), or 2)
taking all possible parameters explicitly into account. The former approach results in serious information loss, while the latter leads to a combinatorial explosion in the number of symbols and parameters of the HMM and, as a consequence, inhibits generalisation.
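As a purely illustrative calculation of the second option (the counts below are hypothetical and chosen only for the sake of the example): with $c$ distinct command names, $f$ distinct filenames and $N$ hidden states, the flat alphabet contains $c \cdot f$ symbols, so the emission probabilities alone amount to roughly

$$ N \cdot c \cdot f \quad \text{parameters}, \qquad \text{e.g. } 10 \cdot 20 \cdot 1000 = 200\,000, $$

far more than a typical command log can support, whereas the logical representation introduced below needs only the $c$ abstract command atoms together with a domain of $f$ file name constants.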
Using logical atoms, the above UNIX command sequence can be represented as ``emacs(lohmms.tex), ls, latex(lohmms.tex)''. There are two important motivations for using logical atoms at the symbol level. Firstly, variables in the atoms allow one to abstract over specific symbols. For example, the logical atom emacs(X) represents all files edited using emacs. Secondly, unification allows one to share information among states. For example, the sequence emacs(X), latex(X) denotes that the same file is used as an argument for both emacs and latex.
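These two mechanisms can be sketched in a few lines of Python (a minimal, hypothetical illustration; the tuple encoding of atoms and the helper match_atom are not part of the Prolog implementation mentioned above):

```python
# Illustrative sketch: logical atoms as (functor, args) tuples.
# Strings starting with an upper-case letter act as logical variables.

def is_var(term):
    return isinstance(term, str) and term[:1].isupper()

def match_atom(atom, ground, subst=None):
    """Match an abstract atom against a ground atom, extending `subst`.

    Returns the substitution (a dict) on success, or None on failure.
    """
    subst = dict(subst or {})
    (functor, args), (g_functor, g_args) = atom, ground
    if functor != g_functor or len(args) != len(g_args):
        return None
    for a, g in zip(args, g_args):
        if is_var(a):
            if a in subst and subst[a] != g:
                return None              # variable already bound to another constant
            subst[a] = g
        elif a != g:
            return None
    return subst

# emacs(X) abstracts over every file edited with emacs:
print(match_atom(("emacs", ("X",)), ("emacs", ("lohmms.tex",))))     # {'X': 'lohmms.tex'}

# A shared variable expresses "the same file for emacs and latex":
s = match_atom(("emacs", ("X",)), ("emacs", ("lohmms.tex",)))
print(match_atom(("latex", ("X",)), ("latex", ("lohmms.tex",)), s))  # consistent binding
print(match_atom(("latex", ("X",)), ("latex", ("hmms.tex",)), s))    # None: a different file
```

Full unification is of course more general than this one-way matching, but the sketch suffices to show how variables yield abstraction and how shared bindings propagate information between atoms.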
Let us return to the coin example of Section 3.1.5 to help define a LOHMM as a generative model. Figure 6.2 shows the HMM and the corresponding LOHMM. Transitions are divided into two steps (compared to just one in HMMs). To generate data, first, a rule (or an abstract transition) is selected according to the (abstract) transition probabilities to find the resulting abstract state (an atom such as coin(X)). Only the rules with the most specific bodies are applied; for instance, in the state coin(2) the rules for coin(X) are not used. Second, the remaining variables in the atom are instantiated using so-called selection probabilities. The scope of variables is restricted to single transitions, that is, abstract states coin(X) and coin(Y) are equivalent once the variable bindings within that transition are taken into account. Note that, in general, the observations can also be structured atoms.
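A toy Python sketch of the two-step generative semantics may clarify the procedure (the concrete rule set, the probabilities and the variable-keyed selection table are invented for illustration and are not taken from Publication VII):

```python
import random

# Toy sketch of the two-step generative semantics on the coin example.
# Atoms are (functor, args) tuples; upper-case strings are logical variables.

def is_var(term):
    return isinstance(term, str) and term[:1].isupper()

def match(body, state):
    """One-way match of an abstract body against a ground state (None if it fails)."""
    if body[0] != state[0] or len(body[1]) != len(state[1]):
        return None
    subst = {}
    for b, g in zip(body[1], state[1]):
        if is_var(b):
            if subst.setdefault(b, g) != g:
                return None
        elif b != g:
            return None
    return subst

# Abstract transitions: (probability, body, observation, head).
RULES = [
    (0.6, ("coin", ("X",)), "head", ("coin", ("X",))),   # stay with the same coin
    (0.4, ("coin", ("X",)), "tail", ("coin", ("Y",))),   # switch to some coin Y
    (0.9, ("coin", ("2",)), "tail", ("coin", ("1",))),   # more specific rules for coin(2):
    (0.1, ("coin", ("2",)), "head", ("coin", ("2",))),   # they override the coin(X) rules
]
# Selection probabilities for grounding leftover variables (keyed per variable
# here for brevity; properly they are defined per argument domain).
SELECTION = {"X": {"1": 0.5, "2": 0.5}, "Y": {"1": 0.7, "2": 0.3}}

def specificity(body):
    return sum(not is_var(a) for a in body[1])            # number of ground arguments

def step(state):
    cands = [(r, match(r[1], state)) for r in RULES]
    cands = [(r, s) for r, s in cands if s is not None]
    best = max(specificity(r[1]) for r, _ in cands)
    cands = [(r, s) for r, s in cands if specificity(r[1]) == best]  # most specific bodies only
    # Step 1: pick an abstract transition and its abstract successor state.
    (prob, body, obs, head), subst = random.choices(cands, weights=[r[0] for r, _ in cands])[0]
    args = tuple(subst.get(a, a) for a in head[1])         # bindings inherited from the body
    # Step 2: ground the remaining variables with the selection probabilities.
    args = tuple(random.choices(list(SELECTION[a]), weights=list(SELECTION[a].values()))[0]
                 if is_var(a) else a for a in args)
    return (head[0], args), obs

state = ("coin", ("1",))
for _ in range(5):
    state, obs = step(state)
    print(obs, state)
```

Running the sketch produces observation symbols paired with ground states, and whenever the current state is coin(2) only the two coin(2) rules compete, mirroring the most-specific-body rule described above.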
[Figure 6.2: The HMM and the corresponding LOHMM for the coin example.]
The rules of a LOHMM form a logic programme whose only fact is the start state. The proof network for an observation sequence forms the structure of the corresponding graphical model. Figure 6.3 depicts the graph formed by unfolding a tiny LOHMM for UNIX command sequence modelling. Using this graph it is possible to see how the inference algorithms of HMMs generalise.
As for HMMs, three inference problems are of interest. Let $M$ be a LOHMM and let $O = O_1, O_2, \ldots, O_T$, $T > 0$, be a finite sequence of ground observations:

- Evaluation: compute the probability $P(O \mid M)$ of the observation sequence.
- Most likely state sequence: find the sequence of hidden states that most probably generated $O$.
- Parameter estimation: find the parameters of $M$ that maximise $P(O \mid M)$.
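For the evaluation problem, a minimal sketch in the same vein shows how the forward pass of HMMs carries over to the unfolded graph (the function transition_distribution is a hypothetical stand-in that would be derived from the abstract transitions and selection probabilities; it is not the interface of the actual implementation):

```python
def forward(transition_distribution, start_state, observations):
    """Return P(observations | model) by summing over ground state sequences.

    transition_distribution(state) must return a dict mapping
    (next_state, observation) pairs to probabilities that sum to one.
    """
    alpha = {start_state: 1.0}                  # P(prefix observed, current state)
    for obs in observations:
        new_alpha = {}
        for state, p in alpha.items():
            for (nxt, o), q in transition_distribution(state).items():
                if o == obs:                    # keep only branches emitting this symbol
                    new_alpha[nxt] = new_alpha.get(nxt, 0.0) + p * q
        alpha = new_alpha
    return sum(alpha.values())

# Tiny hand-made distribution over ground coin states (numbers invented):
def coin_dist(state):
    if state == "coin(1)":
        return {("coin(1)", "head"): 0.6, ("coin(2)", "tail"): 0.4}
    return {("coin(1)", "tail"): 0.9, ("coin(2)", "head"): 0.1}

print(forward(coin_dist, "coin(1)", ["head", "tail", "tail"]))   # 0.216
```

The most likely state sequence follows analogously by replacing the sum with a maximisation (Viterbi), and parameter estimation reuses the same kind of pass to collect expected counts, as discussed next.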
Probabilities found by belief propagation can be further
used for parameter updating by summing over time to get expected
counts for both abstract transitions and
selections. Raiko et al. (2002) explain how this ML parameter estimation is
transformed into a more Bayesian solution. The computational complexity is $\mathcal{O}(kn)$, where $k$ is the number of iterations and $n$ is the number of parameters in the model.
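A rough sketch of the resulting update step (plain Python; the grouping of counts and the single smoothing constant are simplifying assumptions, not the exact procedure of Raiko et al., 2002):

```python
def update_parameters(expected_counts, pseudo_count=1.0):
    """Normalise expected counts into probabilities, one distribution per group.

    expected_counts maps a group (an abstract body for transitions, or a domain
    for selections) to a list of expected counts summed over all time steps and
    training sequences.  A positive pseudo_count acts as a Dirichlet-style
    prior; pseudo_count = 0.0 recovers the plain ML update.
    """
    params = {}
    for group, counts in expected_counts.items():
        smoothed = [c + pseudo_count for c in counts]
        total = sum(smoothed)
        params[group] = [c / total for c in smoothed]
    return params

# Invented expected counts: two abstract transitions out of coin(2) and the
# selection counts over the coin domain {1, 2}.
print(update_parameters({"coin(2)": [8.2, 1.1], "coin domain": [12.5, 3.0]}))
```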
Experiments (see Section 6.2.3) demonstrate that
LOHMMs possess several advantages over traditional HMMs for
applications involving structured sequences.