All the models used the same preprocessed data set as in
Section 7.1.2. Each word was preprocessed separately, and all the
dynamical models were instructed to treat each word individually,
i.e. not to make predictions across word boundaries.
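As an illustration of what treating each word individually means for a dynamical model, the following minimal sketch builds the (previous, next) frame pairs a predictor would be trained on, keeping every pair inside a single word. The function name `within_word_pairs` and the toy data are hypothetical and are not taken from the experiments described here.

```python
def within_word_pairs(words):
    """Return (previous, next) frame pairs that never span two words.

    `words` is a list of sequences, one sequence of frames per word.
    """
    pairs = []
    for frames in words:
        # Pair each frame with its successor inside the same word only,
        # so the last frame of one word is never paired with the first
        # frame of the following word.
        for prev, nxt in zip(frames[:-1], frames[1:]):
            pairs.append((prev, nxt))
    return pairs


# Toy data: two "words" whose frames are plain integers.
words = [[1, 2, 3], [10, 11]]
pairs = within_word_pairs(words)
```

In this toy case the pair (3, 10), which would cross the boundary between the two words, is never generated.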
The models used in the comparison were:
- A continuous-density HMM with a Gaussian mixture observation
model. This was a simple extension of the model presented in
Section 5.1 that replaced the Gaussian
observation model with a mixture of Gaussians. The number of
Gaussians was the same for all the states, and it was optimised by
running the algorithm with several values and using the best result.
The model was initialised with sufficiently many states, and the
unused extra states were pruned.
- The nonlinear factor analysis model, as presented in
Section 4.2.4. The model had 15-dimensional
factors and 30 hidden neurons in the observation MLP network.
- The nonlinear SSM, as presented in
Section 5.2. The model had a 15-dimensional
state space and 30 hidden neurons in both MLP networks.
- The switching NSSM, as presented in
Section 5.3. The model was essentially a
combination of the HMM and NSSM models, except that it used a Gaussian
observation model for the HMM.
The parameters of the HMM priors for the initial distribution
and the transition matrix were all set to one.
This corresponds to a flat, noninformative prior. The choice has
little effect on the performance of the switching NSSM. The HMM, on
the other hand, is very sensitive to the prior.
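The effect of setting all prior parameters to one can be sketched numerically. The example below assumes that the priors are Dirichlet distributions, which this section does not state explicitly, and uses the standard conjugate update for a single row of a transition matrix. The function name `dirichlet_posterior_mean` and the counts are hypothetical.

```python
def dirichlet_posterior_mean(counts, prior=1.0):
    """Posterior mean of a categorical distribution.

    Assumes a symmetric Dirichlet prior whose parameters all equal
    `prior`; with prior=1.0 the prior density is uniform (flat).
    """
    total = sum(counts) + prior * len(counts)
    return [(c + prior) / total for c in counts]


# Observed transition counts out of one state (illustrative numbers).
row = dirichlet_posterior_mean([3, 0, 1])

# With no observations the flat prior gives equal probabilities.
empty = dirichlet_posterior_mean([0, 0, 0])
```

Under this assumption the prior acts like one pseudo-observation per outcome, so a transition that was never observed still keeps a small nonzero probability.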
The data used with the plain HMM was additionally decorrelated with
principal component analysis (PCA) [27]. This
improved the performance considerably compared to the situation without
the decorrelation, as the prototype Gaussians were restricted to be
uncorrelated. The other algorithms can include the same
transformation in the output mapping, so it was not necessary to do it
by hand. Using the nondecorrelated data has the advantage that it is
"human readable", whereas the decorrelated data is much more
difficult to interpret.
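The decorrelation step can be sketched with NumPy as follows. This is a generic PCA rotation under stated assumptions, not the preprocessing code used in the experiments: the function name `pca_decorrelate` and the synthetic two-dimensional data are hypothetical.

```python
import numpy as np


def pca_decorrelate(X):
    """Rotate the rows of X onto the principal axes of their covariance."""
    # Centre the data so that the covariance is taken about the mean.
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    # eigh handles symmetric matrices and returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Reorder so that the first component has the largest variance.
    order = np.argsort(eigvals)[::-1]
    return Xc @ eigvecs[:, order]


# Synthetic correlated data: independent Gaussians mixed by a matrix A.
rng = np.random.default_rng(0)
A = np.array([[2.0, 0.0], [1.5, 0.5]])
X = rng.normal(size=(500, 2)) @ A.T

Y = pca_decorrelate(X)
cov_X = np.cov(X, rowvar=False)
cov_Y = np.cov(Y, rowvar=False)
```

After the rotation the sample covariance `cov_Y` is diagonal up to rounding error, which is the situation that suits Gaussians restricted to be uncorrelated. The columns of `Y` are mixtures of the original features, which is why the decorrelated data is harder to read.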
Antti Honkela
2001-05-30