In LVQ and stochastic gradient descent algorithms
the expected number of classification errors is
minimized using misclassifications or near-misses
observed in stochastic training samples.
For the small model adjustment steps made at
each such sample,
a continuous measure of the degree of misclassification is useful.
For two-class problems the measure is based
on the difference between the Bayesian class discriminants gi
defined using the distances to the closest references
from the competing classes.
For example, the discriminant for class i can simply be

g_i(x) = -\| x - m_{c_i} \|^2    (10)

where the index c_i of the BMU (1) is separately
determined among the set of references of each class.
It is also possible to use an appropriately weighted sum of
contributions of several closest references or
reference sequences as in
[Chang and Juang, 1992,McDermott and Katagiri, 1994].
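The nearest-reference discriminant of (10) can be sketched as follows. This is a minimal illustration, not the thesis code; the function name, the NumPy array layout of the references, and the squared Euclidean distance are all assumptions consistent with (10).

```python
import numpy as np

def class_discriminant(x, references, labels, cls):
    """Discriminant g_cls(x) of eq. (10): the negative squared distance
    from sample x to the closest reference (class-internal BMU) of
    class cls.  references is an (N, d) array, labels an (N,) array."""
    refs = references[labels == cls]
    dists = np.sum((refs - x) ** 2, axis=1)
    return -np.min(dists)
```

For a two-class problem, the sign of g_1(x) - g_2(x) then decides the class, and its magnitude serves as the continuous degree of (mis)classification.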
For decisions among multiple classes (M > 2)
several possibilities exist.
In
[Juang and Katagiri, 1992]
the misclassification of a sample x is measured by

d_i(x) = -g_i(x) + \left[ \frac{1}{M-1} \sum_{k \neq i} g_k(x)^{\eta} \right]^{1/\eta}    (11)

which is a continuous extension of the measure used in
[Amari, 1967],
where only the discriminant differences
between the correct class and the confusing classes (g_k > g_i)
are averaged.
In (11), \eta is a positive number that controls
the relative effect of larger and smaller discriminants.
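The \eta-weighted competitor mean of (11) can be sketched directly. This is an illustrative implementation under the assumption (as in the cited formulation) that the discriminants g_k are positive so that the \eta-power mean is well defined; the function name is not from the source.

```python
import numpy as np

def misclassification_measure(g, i, eta):
    """Eq. (11): d_i = -g_i + [ (1/(M-1)) * sum_{k != i} g_k^eta ]^(1/eta).
    g is the vector of class discriminants, i the correct-class index,
    and eta > 0 weights larger competing discriminants more heavily."""
    g = np.asarray(g, dtype=float)
    competitors = np.delete(g, i)          # the M-1 competing classes
    return -g[i] + np.mean(competitors ** eta) ** (1.0 / eta)
```

A positive d_i indicates a misclassification; for large \eta the power mean approaches the largest competing discriminant.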
In
[Juang and Katagiri, 1992]
definition (11) is used to construct a
framework of discriminative training methods called GPD
(Generalized Probabilistic Descent) that minimizes the
expected cost of the misclassifications.
In the extreme case of (11),
where \eta \rightarrow \infty,
only the largest competing discriminant affects
the misclassification measure:

d_i(x) = -g_i(x) + \max_{k \neq i} g_k(x)    (12)

This approach leads to the LVQ2 algorithm
[Komori and Katagiri, 1992].
In
[Katagiri et al., 1991]
it is shown that the simple and practically efficient LVQ2
indeed approximates well the more complicated gradient-search
GPD implementations.
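An LVQ2-style update corresponding to this limiting case can be sketched as follows. This is only an illustration: the window test follows Kohonen's relative-distance rule, and the function name, learning rate alpha, and window width are assumed example values, not parameters from the source.

```python
import numpy as np

def lvq2_step(x, m_correct, m_wrong, alpha=0.1, window=0.3):
    """One LVQ2-style adjustment: when sample x falls near the decision
    border between the correct-class BMU m_correct and the closest
    wrong-class BMU m_wrong, pull m_correct toward x and push m_wrong
    away; otherwise leave both unchanged."""
    d_c = np.linalg.norm(x - m_correct)
    d_w = np.linalg.norm(x - m_wrong)
    s = (1 - window) / (1 + window)
    if min(d_c / d_w, d_w / d_c) > s:      # x lies inside the window
        m_correct = m_correct + alpha * (x - m_correct)
        m_wrong = m_wrong - alpha * (x - m_wrong)
    return m_correct, m_wrong
```

Because only the nearest wrong-class reference (the largest competing discriminant) is touched, each step adjusts exactly the pair of references that determines the measure (12).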
Mikko Kurimo
11/7/1997