The experiments performed indicate that SOM and LVQ principles are useful when the parameters of the rather complicated ASR system must be learned from collected speech samples. The average number of phoneme recognition errors for the tested speakers has decreased by around 10 % on the test material used; for example, in Publication 6 the error rate decreased from 5.4 % to 4.8 %. These average improvements are not dramatic compared with conventional training methods. However, ASR is an application of great potential and importance, and its recognition performance is the main limiting factor. Even small improvements can therefore have significant consequences.
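As a worked check of the Publication 6 figures, and assuming the "around 10 %" refers to a relative (not absolute) reduction, the drop from 5.4 % to 4.8 % amounts to 0.6 percentage points, or roughly 11 % relative to the baseline error rate:

```latex
\frac{5.4\,\% - 4.8\,\%}{5.4\,\%} = \frac{0.6}{5.4} \approx 0.111 \approx 11\,\%
```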
In future development of the HMMs, state durations should perhaps be modeled more explicitly, and the variation in the state outputs could perhaps be handled better by using application-dependent knowledge rather than simply increasing the number of units in the density models. For modeling the speech input, beneficial extensions might follow from diphone modeling or from allowing branches and skips in the HMM structure. Important directions for improving the ASR system include speaker independence or speaker adaptation, tolerance of unexpected disturbances and changes, and robustness to faster and more careless pronunciation. Applicability to other languages and the integration of contextual knowledge into the classification decisions would also be interesting topics.