The relations and mutual differences between the segmental LVQ3 and other corrective training algorithms for HMMs are discussed in Publication 6, with emphasis on the direct error-corrective training algorithms [Bahl et al., 1988, Mizuta and Nakajima, 1990] and the application of GPD [Chou et al., 1992].
Because of some observed weaknesses in the traditional HMM training methods, which have, however, largely been eliminated in the newer models, a wide range of other training algorithms has been suggested, more or less related to discriminative training. Some of them have gained wide acceptance, such as maximum mutual information (MMI) training [Bahl et al., 1986, Kapadia et al., 1993] and maximum a posteriori probability (MAP) training [Lee et al., 1990, Gauvain and Lee, 1994, Huo et al., 1995].
Both MMI and MAP attempt to increase the a posteriori probability of the model sequence for the training data. The update rules can be presented as modifications of the current parameter values by an additive term obtained from the current iteration, so corrective training versions of both algorithms can also be derived. In [Ljolje et al., 1990] the MMI criterion is shown to emerge from a direct approximation of the empirical error rate. Compared to the simple discriminative training experimented with in this work, MMI and MAP require considerably more effort in computing the parameter adjustments, but, unfortunately, no rigorous experimental study has yet been given that compares the characteristics of all the major methods under different conditions so as to demonstrate the practical gain.
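To make the shared form of these criteria concrete, the following is a minimal sketch of the usual MMI objective and of the additive update it suggests; the notation ($O_r$ for the $r$-th training observation sequence, $w_r$ for its correct transcription, $\theta$ for the HMM parameters, and $\epsilon_t$ for a step size) is introduced here only for illustration and is not taken from the cited publications.

\[ F_{\mathrm{MMI}}(\theta) \;=\; \sum_r \log \frac{P_\theta(O_r \mid w_r)\, P(w_r)}{\sum_{w} P_\theta(O_r \mid w)\, P(w)} \;=\; \sum_r \log P_\theta(w_r \mid O_r) \]

Maximizing $F_{\mathrm{MMI}}$ thus increases the posterior probability of the correct model sequence over the training data, and an iterative, corrective-style version follows by adjusting the present parameter values by an additive term at each iteration,

\[ \theta^{(t+1)} \;=\; \theta^{(t)} + \epsilon_t\, \nabla_\theta F_{\mathrm{MMI}}\bigl(\theta^{(t)}\bigr). \]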