
LVQ for training vector quantization codebooks

The first idea for integrating LVQ into the MDHMM training scheme was to provide the HMM states with VQ codebooks that already discriminate efficiently between the phoneme labels of the individual feature vectors extracted from the pre-segmented speech samples. This means that LVQ was used in the initialization phase (see Figure 3). The aim was that the maximum likelihood training of the HMMs would then converge rapidly to a solution of high discrimination ability.
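To make the initialization step concrete, the following is a minimal sketch of the basic LVQ1 rule on which such discriminative codebook training rests; the NumPy data layout, the function name, and the fixed learning rate alpha are illustrative assumptions, not the actual code of the thesis.

    import numpy as np

    def lvq1_epoch(codebook, unit_labels, data, data_labels, alpha=0.05):
        # One epoch of basic LVQ1: move the best-matching codebook vector
        # toward the training sample if their class labels agree, away otherwise.
        for x, y in zip(data, data_labels):
            dists = np.linalg.norm(codebook - x, axis=1)  # Euclidean distances
            c = np.argmin(dists)                          # best-matching unit
            if unit_labels[c] == y:
                codebook[c] += alpha * (x - codebook[c])  # attract correct class
            else:
                codebook[c] -= alpha * (x - codebook[c])  # repel incorrect class
        return codebook

In practice the learning rate would decrease monotonically over the training epochs, as is usual for LVQ1.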

In [Kurimo and Torkkola, 1992a, Kurimo and Torkkola, 1992b] experiments were presented on the LVQ initialization of CDHMMs. The output densities of the states were mixtures of 25 Gaussians that were unique for each state but shared the same common diagonal covariance matrix. The results revealed that the LVQ codebooks tend to produce models with lower error rates than conventional K-means codebooks in both Baum-Welch and Viterbi training [Rabiner, 1989]. For Viterbi training, the LVQ initialization also seemed slightly better than the SOM initialization.
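In generic CDHMM notation (the symbols below are conventional, not quoted from the cited papers), the output density of state j in this setup can be written as

    b_j(\mathbf{o}_t) = \sum_{k=1}^{25} c_{jk}\,
        \mathcal{N}\!\left(\mathbf{o}_t;\, \boldsymbol{\mu}_{jk},\, \boldsymbol{\Sigma}\right),

where the 25 mean vectors \boldsymbol{\mu}_{jk} form the state-specific codebook initialized by LVQ or K-means, the c_{jk} are the mixture weights, and the diagonal covariance matrix \boldsymbol{\Sigma} is shared by all states and mixtures.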

In Publication 1 the LVQ codebooks were generated for SCHMMs, where all states of every phoneme share the same large codebook of Gaussians. The difficulty in such an approach is defining which units represent the correct and incorrect phoneme classes, since every unit may participate in the density estimation of every state. The solution tested in Publication 1 borrowed the idea of data-based majority voting, commonly used for labeling SOM units (see, e.g., [Kohonen, 1995]). Despite some possible problems caused by the varying number of labeled units per phoneme class, the LVQ initialization experiments showed lower error rates than using SOMs or K-means alone.
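A data-based majority vote of the kind used for labeling SOM units can be sketched as follows; this is a hypothetical NumPy implementation, not the code of Publication 1.

    import numpy as np
    from collections import Counter

    def label_units_by_majority(codebook, data, data_labels):
        # Assign to each codebook unit the phoneme class that occurs most
        # often among the training vectors for which it is the best match.
        votes = [Counter() for _ in range(len(codebook))]
        for x, y in zip(data, data_labels):
            c = np.argmin(np.linalg.norm(codebook - x, axis=1))
            votes[c][y] += 1
        # Units that attracted no training vectors are left unlabeled (None).
        return [v.most_common(1)[0][0] if v else None for v in votes]

The variation mentioned above arises here directly: a frequent phoneme class can win the vote in many units, while a rare class may win in few or none.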

For phoneme-wise tied MDHMMs (Publication 3) no such mixture-labeling problems exist, since the same mixtures are not used for different phonemes. The small phoneme-wise codebooks can easily be trained first by SOM, and for LVQ the codebooks can then be concatenated into one, as sketched below. However, as concluded from the experiments in Publication 6, the SOM initialization alone seems to be sufficient. Discriminative training has more effect when embedded into the actual HMM training phase (see Figure 3), where it constitutes the new segmental LVQ3 method (see Section 3.3.4).
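A minimal sketch of the concatenation step, under the assumption that the per-phoneme codebooks are stored as NumPy arrays keyed by phoneme symbol (the function name is hypothetical):

    import numpy as np

    def concatenate_phoneme_codebooks(codebooks):
        # Stack the small phoneme-wise SOM codebooks into one matrix for LVQ.
        # Labeling is trivial: each unit inherits the phoneme of its sub-codebook.
        units = np.vstack(list(codebooks.values()))
        labels = [ph for ph, cb in codebooks.items() for _ in range(len(cb))]
        return units, labels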

