Most data analysis methods are based on developing a *model* that
could be used to recreate the studied data set. Speech recognition
systems, for example, are often built around a model that could in
principle be used as a speech generator. The success of the
recogniser depends heavily on how well the generator can produce
realistic speech data.

The speech generators used by most modern speech recognition systems
are based on the *hidden Markov model* (HMM). The HMM is a
*discrete* model. It has a finite number of different
*internal states* that produce different kinds of output.
Typically there are a couple of states for each phoneme or a pair of
phonemes. The whole dynamical process of producing speech is thus
modelled by discrete transitions between the states corresponding to
the different phonemes.
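
The generative view of the HMM described above can be illustrated with a minimal sketch. The two states, the transition probabilities and the scalar Gaussian outputs below are hypothetical values chosen only for illustration; a real speech model would have many more states and multidimensional acoustic features.

```python
import random

# Hypothetical toy HMM: two "phoneme" states, each emitting a noisy scalar.
STATES = ["a", "b"]
TRANSITIONS = {"a": {"a": 0.9, "b": 0.1}, "b": {"a": 0.2, "b": 0.8}}
EMISSION_MEANS = {"a": -1.0, "b": 1.0}
EMISSION_STD = 0.3


def sample_hmm(length, seed=0):
    """Generate (states, outputs) by emitting from the current state
    and then making a discrete transition to the next state."""
    rng = random.Random(seed)
    state = rng.choice(STATES)
    states, outputs = [], []
    for _ in range(length):
        states.append(state)
        # The output depends only on the current discrete state.
        outputs.append(rng.gauss(EMISSION_MEANS[state], EMISSION_STD))
        # Discrete transition according to the current state's probabilities.
        nxt = TRANSITIONS[state]
        state = rng.choices(list(nxt), weights=list(nxt.values()))[0]
    return states, outputs
```

Calling `sample_hmm(50)` returns a state sequence and the corresponding outputs. All temporal structure in the outputs comes from the jumps between states, which is the limitation discussed next.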

The model of human speech implied by the HMM is not a very realistic
one. The dynamics of the mouth and the vocal cords used to produce
speech are continuous. The discrete model is only a very crude
approximation of the "true" model.
A more realistic approach would be to model the data with a continuous
model. The process of producing speech is clearly nonlinear and this
should be reflected by its model. A good candidate for the task is
the *nonlinear state-space model* (NSSM). The NSSM can be
described as the continuous counterpart of the HMM. The problem with
models like the NSSM is that they concentrate on modelling the
short-term structure of the data. On their own, they are therefore not
very well suited for speech recognition.

There are speech recognition systems that try to get the best of both
worlds by combining the two kinds of models into one hybrid structure.
Such systems have performed well in several difficult real-world
problems, but they are often rather specialised. The training
algorithms for such models are usually based on heuristic measures
rather than on generally accepted mathematical principles.

In this work, a hybrid model structure that combines the HMM with
another dynamical model, the continuous NSSM, is studied. The
resulting model is called the *switching nonlinear state-space
model* (switching NSSM). This hybrid model has the power
of a continuous NSSM to model the short-term dynamics of the data.
Above the NSSM, however, there is still the familiar HMM to divide the
data into different discrete states corresponding, for example, to the
different phonemes.
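
To make the hybrid structure more concrete, the following sketch generates data from a toy model in which a discrete HMM state selects the regime of a continuous nonlinear state-space model. The way the two parts are coupled here (a per-state offset inside the dynamics), the `tanh` and `sin` mappings and all numerical values are assumptions made for illustration only; the text above does not fix these details.

```python
import math
import random

# All parameters below are hypothetical and chosen only for illustration.
TRANSITIONS = [[0.95, 0.05], [0.10, 0.90]]  # HMM transition probabilities
DRIFT = [-0.5, 0.5]                          # per-state offset in the dynamics


def sample_switching_nssm(length, seed=0):
    """Generate discrete states k(t) and observations x(t) from a toy
    switching nonlinear state-space model with a scalar hidden state."""
    rng = random.Random(seed)
    k, s = 0, 0.0
    ks, xs = [], []
    for _ in range(length):
        # Discrete part: HMM transition between regimes.
        k = rng.choices([0, 1], weights=TRANSITIONS[k])[0]
        # Continuous part: nonlinear dynamics of the hidden state,
        # shifted by the current regime, plus process noise.
        s = math.tanh(0.9 * s + DRIFT[k]) + rng.gauss(0.0, 0.05)
        # Nonlinear observation mapping plus observation noise.
        x = math.sin(2.0 * s) + rng.gauss(0.0, 0.05)
        ks.append(k)
        xs.append(x)
    return ks, xs
```

Within each regime the observations evolve smoothly through the continuous state, while the discrete state provides the long-term segmentation, for example into phonemes.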