The experimental setting

All the models used the same preprocessed data set as in Section 7.1.2. Each word was preprocessed separately, and all the dynamical models were instructed to treat each word as an independent sequence, i.e. not to make predictions across word boundaries.
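
To make the word-boundary constraint concrete, here is a minimal Python sketch. It assumes each word is stored as its own list of feature frames; that container layout and the helper name `within_word_pairs` are hypothetical. The sketch collects the one-step prediction pairs a dynamical model would learn from, restarting at every word so that no pair links the last frame of one word to the first frame of the next.

```python
from typing import List, Sequence, Tuple

Frame = Tuple[float, ...]

def within_word_pairs(words: Sequence[Sequence[Frame]]) -> List[Tuple[Frame, Frame]]:
    """Return (x_t, x_{t+1}) prediction pairs that never cross a word boundary.

    `words` is a hypothetical container: one sequence of feature frames per
    word, each preprocessed independently of the other words.
    """
    pairs: List[Tuple[Frame, Frame]] = []
    for frames in words:
        # Pair consecutive frames inside the same word only.
        for prev, nxt in zip(frames, frames[1:]):
            pairs.append((prev, nxt))
    return pairs

# Two toy "words" with 3 and 2 one-dimensional frames.
words = [[(0.0,), (1.0,), (2.0,)], [(10.0,), (11.0,)]]
pairs = within_word_pairs(words)
print(len(pairs))  # 3: two pairs from the first word, one from the second
```

Concatenating the two words into one sequence would instead give four pairs, one of them spanning the boundary between (2.0,) and (10.0,).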

The models used in the comparison were:

A continuous-density HMM with a Gaussian mixture observation model. This was a simple extension of the model presented in Section 5.1 that replaced the Gaussian observation model with a mixture of Gaussians. The number of Gaussians was the same for all states; it was optimised by running the algorithm with several values and keeping the best result. The model was initialised with sufficiently many states, and unused extra states were pruned.
The nonlinear factor analysis (NFA) model, as presented in Section 4.2.4. The model had 15-dimensional factors and 30 hidden neurons in the observation MLP network.
The nonlinear state-space model (NSSM), as presented in Section 5.2. The model had a 15-dimensional state space and 30 hidden neurons in both MLP networks.
The switching NSSM, as presented in Section 5.3. The model was essentially a combination of the HMM and NSSM models, except that its HMM part used a single Gaussian observation model instead of a mixture of Gaussians.
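
In the first model each HMM state scores an observation with a mixture of Gaussians rather than a single Gaussian. The following is a minimal sketch of that per-state emission log-density, assuming diagonal covariances; the function names and toy parameters are hypothetical, and the learning of the mixture weights, means and variances is not shown.

```python
import math
from typing import List

def log_gauss_diag(x: List[float], mean: List[float], var: List[float]) -> float:
    """Log-density of a Gaussian with diagonal covariance `var`."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def log_mixture_emission(x, weights, means, variances) -> float:
    """log p(x | state) for one HMM state whose observation model is a
    mixture of diagonal Gaussians, combined with log-sum-exp for stability."""
    terms = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

# One state with K = 2 components in a 2-dimensional observation space.
x = [0.5, -1.0]
weights = [0.7, 0.3]
means = [[0.0, 0.0], [1.0, -2.0]]
variances = [[1.0, 1.0], [0.5, 0.5]]
print(log_mixture_emission(x, weights, means, variances))
```

With a single component of weight one, this reduces to a Gaussian observation model; the number of components K is the quantity that was tried at several values.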

The parameters of the HMM priors for the initial distribution, $u^{(\pi)}$, and for the transition matrix, $u^{(A)}$, were all set to one. This corresponds to a flat, noninformative prior. The choice has little effect on the performance of the switching NSSM, whereas the plain HMM is very sensitive to the prior.
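
The claim that all-ones parameters give a flat prior can be checked numerically. The sketch below assumes, as is common for Bayesian HMMs, that $u^{(\pi)}$ and the parameters $u^{(A)}$ for each row of the transition matrix parameterise Dirichlet priors; the helper names are hypothetical. With every parameter equal to one, the Dirichlet density takes the same constant value everywhere on the probability simplex, and in the conjugate update it contributes exactly one pseudo-count per outcome.

```python
import math
from typing import List

def dirichlet_log_density(p: List[float], u: List[float]) -> float:
    """log Dir(p | u) for a probability vector p with all entries > 0."""
    log_norm = math.lgamma(sum(u)) - sum(math.lgamma(ui) for ui in u)
    return log_norm + sum((ui - 1.0) * math.log(p_i) for ui, p_i in zip(u, p))

def posterior_mean_row(counts: List[float], u: List[float]) -> List[float]:
    """Posterior mean of one transition-matrix row under Dirichlet(u + counts)."""
    post = [ui + ci for ui, ci in zip(u, counts)]
    total = sum(post)
    return [a / total for a in post]

ones = [1.0, 1.0, 1.0]
# With u = (1, 1, 1) the density equals Gamma(3) = 2 at every point.
print(math.exp(dirichlet_log_density([0.2, 0.3, 0.5], ones)))  # approximately 2.0
print(math.exp(dirichlet_log_density([0.6, 0.3, 0.1], ones)))  # approximately 2.0
# The flat prior adds one pseudo-count per transition: 9/13, 3/13, 1/13.
print(posterior_mean_row([8.0, 2.0, 0.0], ones))
```

Because the prior adds only one pseudo-count per entry, its influence on the posterior shrinks as the number of observed transitions out of a state grows.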

The data used with the plain HMM were additionally decorrelated with principal component analysis (PCA) [27]. Because the prototype Gaussians were restricted to be uncorrelated, this decorrelation improved the performance considerably compared with using the data without it. The other models can include the same linear transformation in their learned output mapping, so it was not necessary to apply it by hand. Using the non-decorrelated data has the advantage that the data remain "human readable", whereas the decorrelated data are much more difficult to interpret.
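
A minimal NumPy sketch of PCA decorrelation follows. It assumes all components are kept and no whitening (variance rescaling) is applied; the text does not say whether the dimension was reduced or the components rescaled. The toy data are random and only illustrate the mechanics, and `pca_decorrelate` is a hypothetical helper name.

```python
import numpy as np

def pca_decorrelate(X: np.ndarray):
    """Rotate data onto its principal axes so the components are uncorrelated.

    X has one frame per row. Returns the projected data Y together with the
    mean and the orthogonal matrix W such that X = Y @ W.T + mean.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = np.cov(Xc, rowvar=False)
    # eigh is suited to the symmetric covariance; columns of W are principal axes.
    eigvals, W = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]  # largest variance first
    W = W[:, order]
    Y = Xc @ W
    return Y, mean, W

rng = np.random.default_rng(0)
# Correlated toy data standing in for the real speech features.
X = rng.normal(size=(500, 3)) @ np.array([[2.0, 0.5, 0.0],
                                          [0.0, 1.0, 0.3],
                                          [0.0, 0.0, 0.5]])
Y, mean, W = pca_decorrelate(X)
C = np.cov(Y, rowvar=False)
print(np.allclose(C - np.diag(np.diag(C)), 0.0, atol=1e-10))  # True: off-diagonals vanish
# The inverse mapping is affine, so the original data are recovered exactly.
print(np.allclose(Y @ W.T + mean, X))  # True
```

The last line illustrates why the decorrelation need not be done by hand for the other models: a model whose observation mapping ends in a learned affine layer could, in principle, absorb the rotation `W` and the mean into that layer's weights and bias. This is an assumption about the architecture, stated for illustration.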

Antti Honkela 2001-05-30