Kurimo, M., Using Self-Organizing Maps
and Learning Vector Quantization
for Mixture Density Hidden Markov Models. Acta
Polytechnica Scandinavica, Mathematics, Computing and Management in
Engineering Series No. 87, Espoo 1997, 57 pp.
Published by the Finnish Academy of Technology.
Keywords: Self-Organizing Map, SOM, Learning Vector Quantization, LVQ, Gaussian mixtures, density estimation, neural networks, speech recognition
Thesis for the degree of Doctor of Technology to be presented with
due permission for public examination and criticism in Auditorium F1
at the Helsinki University of Technology on the 3rd of October, at 12
This work presents experiments on recognizing pattern sequences using hidden Markov models (HMMs). The pattern sequences in the experiments are computed from speech signals, and the recognition task is to decode the corresponding phoneme sequences. Training the phoneme HMMs from the collected speech samples is difficult because of the natural variation in speech. Two neural computing paradigms, the Self-Organizing Map (SOM) and Learning Vector Quantization (LVQ), are used in the experiments to improve the recognition performance of the models.
An HMM consists of sequential states that are trained to model the feature changes in the signal produced during the modeled process. The output densities applied in this work are mixtures of Gaussian density functions. SOMs are applied to initialize and train the mixtures to give a smooth and faithful representation of the feature vector space defined by the corresponding training samples. The SOM maps similar feature vectors to nearby units, a property exploited here in experiments to improve the recognition speed of the system.
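As a rough illustration of how a SOM places similar feature vectors on nearby units, the following minimal sketch performs one textbook SOM update on a one-dimensional map. It is not the training procedure of this work: the function name `som_step`, the learning rate `alpha`, the neighborhood width `sigma`, and the toy codebook are all assumptions chosen for the example.

```python
import math

def som_step(codebook, x, alpha=0.5, sigma=1.0):
    """One SOM update on a 1-D map (hypothetical helper, textbook rule).

    The best-matching unit (BMU) and its map neighbors are moved toward
    the input vector x; the pull weakens with distance on the map grid.
    """
    # BMU: the codebook vector closest to x in squared Euclidean distance.
    dists = [sum((m - v) ** 2 for m, v in zip(unit, x)) for unit in codebook]
    bmu = dists.index(min(dists))
    updated = []
    for i, unit in enumerate(codebook):
        # Gaussian neighborhood on the map grid, centered on the BMU.
        h = math.exp(-((i - bmu) ** 2) / (2.0 * sigma ** 2))
        updated.append([m + alpha * h * (v - m) for m, v in zip(unit, x)])
    return bmu, updated

# Toy 2-D feature vector and a three-unit map (illustrative values only).
som_codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
bmu, som_codebook = som_step(som_codebook, [0.9, 1.2])
```

Because neighboring units are pulled toward the same inputs, units that are close on the map end up representing similar feature vectors, which is the ordering property the paragraph above refers to.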
LVQ provides simple but efficient stochastic learning algorithms to improve the classification accuracy in pattern recognition problems. Here, LVQ is applied to develop an iterative training method for mixture density HMMs, which increases both the modeling accuracy of the states and the discrimination between the models of different phonemes. Experiments are also made with LVQ-based corrective tuning methods for the mixture density HMMs, which aim at improving the models by learning from the observed recognition errors in the training samples.
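The basic idea behind LVQ can be sketched with the standard LVQ1 rule for labeled codebook vectors. This is only a minimal illustration under stated assumptions, not the HMM training or corrective tuning method developed in this work: the function name `lvq1_step`, the learning rate `alpha`, and the phoneme labels in the toy data are hypothetical.

```python
def lvq1_step(codebook, labels, x, x_label, alpha=0.1):
    """One LVQ1 update (hypothetical helper, textbook rule).

    The nearest codebook vector is moved toward x when its class label
    matches x_label, and away from x when it does not.
    """
    # Nearest codebook vector in squared Euclidean distance.
    dists = [sum((m - v) ** 2 for m, v in zip(unit, x)) for unit in codebook]
    c = dists.index(min(dists))
    sign = 1.0 if labels[c] == x_label else -1.0
    updated = [list(unit) for unit in codebook]
    updated[c] = [m + sign * alpha * (v - m) for m, v in zip(codebook[c], x)]
    return c, updated

# Two codebook vectors with illustrative phoneme labels.
lvq_codebook = [[0.0, 0.0], [2.0, 2.0]]
lvq_labels = ["a", "o"]

# Correctly classified sample: the winner moves toward it.
_, attracted = lvq1_step(lvq_codebook, lvq_labels, [0.5, 0.5], "a")
# Misclassified sample: the winner moves away from it.
_, repelled = lvq1_step(lvq_codebook, lvq_labels, [0.5, 0.5], "o")
```

Pushing the winning vector away from misclassified samples is what sharpens the class boundaries, the same kind of learning from recognition errors that the corrective tuning described above aims at.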
The suggested HMM training methods are tested using the Finnish speech database collected in the Neural Networks Research Centre at the Helsinki University of Technology. Statistically significant improvements compared to the best conventional HMM training methods are obtained using the speaker-dependent but vocabulary-independent phoneme models. The decrease in the average number of phoneme recognition errors for the tested speakers has been around 10 percent in the applied test material.
© All rights reserved. No part of the publication may be reproduced, stored in a retrieval system, or transmitted, in any form, or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the author.