Using Self-Organizing Maps and Learning Vector Quantization for Mixture Density Hidden Markov Models
In speech-to-text transformation the speech units are usually modeled by hidden Markov models (HMMs). An HMM consists of successive states that cannot be observed directly from the signal but that produce statistically different features. The probability of being in a given state at a given moment can therefore be computed from a sequence of features, provided the characteristics of each state and the state transition probabilities are known. This thesis studies the application of artificial neural network (ANN) methods to estimating the parameters of HMMs from recorded speech samples. The ANNs used here are the Self-Organizing Map (SOM) and Learning Vector Quantization (LVQ), developed by Prof. Kohonen. The observation density of an HMM state can be represented rather accurately by a weighted mixture of SOM units, without assuming the form of the underlying feature density. The stochastic learning law of LVQ can then be applied to improve the classification accuracy of the speech units by using the training data to discriminate between the models. Experiments on speaker-dependent but vocabulary-independent phoneme models show that the HMM training method developed in this work reduces recognition errors by about 10 percent on average compared with the best conventional HMM training methods.
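As a rough illustration of the mixture-density idea described above, the sketch below evaluates one state's observation density as a weighted sum of Gaussian kernels centered at SOM codebook vectors. The isotropic shared kernel width `sigma` and the function and parameter names are illustrative assumptions, not the thesis's exact formulation.

```python
import numpy as np

def state_density(x, codebook, weights, sigma=1.0):
    """Observation density of one HMM state, modeled as a weighted
    mixture of Gaussian kernels centered at SOM codebook vectors.

    x        : (d,) feature vector
    codebook : (m, d) SOM unit prototype vectors
    weights  : (m,) nonnegative mixture weights summing to 1
    sigma    : shared isotropic kernel width (an assumption here)
    """
    d = x.shape[0]
    diff = codebook - x                          # (m, d) differences to units
    sq = np.sum(diff * diff, axis=1)             # squared distances to units
    norm = (2.0 * np.pi * sigma ** 2) ** (-d / 2.0)
    kernels = norm * np.exp(-sq / (2.0 * sigma ** 2))
    return float(np.dot(weights, kernels))       # weighted mixture value
```

Because the weights sum to one and each kernel integrates to one, the mixture is itself a proper density, regardless of the true shape of the feature distribution.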
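The discriminative training mentioned above builds on the stochastic learning law of LVQ. A minimal sketch of the basic LVQ1 rule is shown below: the winning prototype moves toward a training sample of its own class and away from a sample of another class. The function name and learning-rate parameter are illustrative; the thesis applies this law in an HMM context rather than in this bare form.

```python
import numpy as np

def lvq1_step(x, label, codebook, cb_labels, alpha=0.05):
    """One LVQ1 update, applied in place to the codebook.

    x         : (d,) feature vector
    label     : class label of x
    codebook  : (m, d) prototype vectors
    cb_labels : (m,) class label of each prototype
    alpha     : learning rate (illustrative value)
    """
    dists = np.sum((codebook - x) ** 2, axis=1)
    c = int(np.argmin(dists))                    # nearest prototype wins
    # Attract the winner if its class matches the sample, repel otherwise.
    sign = 1.0 if cb_labels[c] == label else -1.0
    codebook[c] += sign * alpha * (x - codebook[c])
    return c
```

Repeating this update over the training data sharpens the class boundaries between the prototypes, which is the sense in which LVQ discriminates between the models.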