Several proofs exist for the convergence of the traditional discriminant methods in two-class problems with separable pattern distributions [Nilsson, 1965, Amari, 1967]. However, equally strong proofs have not been given for multiclass tasks, very common in practice, where the pattern distributions are non-separable and of unknown functional form. Some probabilistic gradient methods can be shown to approach the optimal classification solutions on average as the amount of training data grows without bound, but difficulties may arise with small training sets. Furthermore, performance is often measured only on a particular independent test set, so the theoretical convergence properties do not necessarily coincide well with practical results.
In the LVQ learning laws [Kohonen, 1995], much emphasis has been placed on average performance across a variety of difficult practical classification experiments, in order to ensure that the methods actually work in diverse contexts. The classification is determined according to the class of the nearest reference vector, as explained in the previous section. The selection of the nearest reference vector and the parameter-adjustment formulas are normally expressed in terms of the Euclidean distance metric, but corresponding expressions can be derived for other metrics that may be more suitable for certain applications [Katagiri and Lee, 1993].
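As a concrete illustration of this decision rule, the sketch below implements a minimal nearest-prototype classifier together with the basic LVQ1 update from Kohonen's family of learning laws, assuming the Euclidean metric; the function names, the learning-rate parameter alpha, and the toy data are illustrative assumptions, not part of the original formulation.

```python
import numpy as np

def classify(x, prototypes, labels):
    """Assign x the class of the nearest reference vector (Euclidean metric)."""
    distances = np.linalg.norm(prototypes - x, axis=1)
    return labels[np.argmin(distances)]

def lvq1_update(x, y, prototypes, labels, alpha=0.05):
    """Basic LVQ1 step: move the winning reference vector toward x when
    its class matches y, and away from x otherwise."""
    winner = np.argmin(np.linalg.norm(prototypes - x, axis=1))
    sign = 1.0 if labels[winner] == y else -1.0
    prototypes[winner] += sign * alpha * (x - prototypes[winner])
    return prototypes

# Toy two-class problem with two reference vectors per class.
prototypes = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [6.0, 5.0]])
labels = np.array([0, 0, 1, 1])
x, y = np.array([0.4, 0.2]), 0
prototypes = lvq1_update(x, y, prototypes, labels)
print(classify(x, prototypes, labels))  # -> 0
```

Note that the metric enters only through the distance computation, so substituting another metric, as suggested by [Katagiri and Lee, 1993], amounts to replacing the `np.linalg.norm` call and deriving the matching adjustment term.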