The relative performance of the different algorithms in reducing the dimensionality of the input data has been studied in several articles [Bezdek and Pal, 1995, Flexer, 1997, Goodhill et al., 1995, Kraaijveld et al., 1992, Kraaijveld et al., 1995, Mao and Jain, 1995]. In these studies the measures of performance were, however, mostly related to the preservation of the distances between points. In none of the studies has a cost function of the SOM-type (Eq. 7) been considered although for example the cost function of the Sammon's mapping (Eq. 4) has been used. It may not be surprising that Sammon's mapping is a good method according to that measure. Goodhill et al. (1995), on the other hand, define the preservation of neighborhoods as the preservation of distance ordering, which is the goal of nonmetric MDS.
The approach of comparing pattern classification capability in the original and the mapped space [Kraaijveld et al., 1992, Kraaijveld et al., 1995] might be a valid measure of feature extraction performance, but unfortunately it cannot be used for unlabeled data sets.
Since the methods pursue different although related goals, a more productive approach than trying to quantify their relative performance may be to analyze their differences qualitatively. Below, a short nonformal delineation of the differences between the SOM and the other methods is presented, based on the differences of their cost functions.
The relation between the SOM and clustering methods on one hand, and the nonlinear distance-preserving projection methods on the other, is perhaps best revealed by the decomposition of the SOM cost function, Equation 9. The SOM both performs clustering (first term in Eq. 9) and organizes the clusters (second term in Eq. 9); it can thus be used, at the same time, both for reducing the amount of data and for projecting it onto a low-dimensional display.
The essential difference between the projection formed by the SOM and the nonlinear distance-preserving projection methods (MDS) seems to be that the SOM tries to form a locally correct projection, whereas the MDS methods try to directly preserve all interpoint distances. This applies also to nonmetric MDS; although it only tries to preserve the rank order of the distances, it nevertheless treats all distances equally.
Sammon's mapping emphasizes the preservation of the local distances. The SOM does not, however, try to preserve the distances, but instead forms a kind of a homeomorphism, an ``order-preserving nonlinear regression'' of the map grid into the data set. The order is determined by the neighborhood kernel. If the kernel is localized (narrow), the order is determined locally. Global order arises because of the local interactions and because the gradual narrowing of the neighborhood kernel aids in avoiding ``unordered'' configurations.
In the general case a high-dimensional space cannot of course be accurately mapped onto a low-dimensional one by any method. In such cases the results of the SOM and those of the distance-preserving projection methods may be quite different. Since it is not possible to preserve all distances, it is quite possible that the MDS methods preserve most distances approximately and some distances poorly. Especially in the metric MDS the long distances will dominate over the shorter, local ones. In the SOM, in contrast, the localized neighborhood kernel determines that only the local distances will contribute to the error function.
In terms of exploratory data analysis the difference might be interpreted as follows: in the MDS methods the coarse global order of the projected data items will probably be more accurate and they, by definition, provide a more accurate view of the distances between the data items. The SOM, in contrast, tries to guarantee that items projected to nearby locations are similar, whereby the local order and local clustering structures shown on the map display are always as trustworthy as possible. Global structures will usually be displayed as well, but the local ones are considered to be more important.
The difference therefore seems to be that MDS tries to preserve the metric (or, in the case of nonmetric MDS, the global ordering relations) of the original space, whereas the SOM tries to preserve the topology, i.e., the local neighborhood relations. Sammon's mapping lies in the middle: it tries to preserve the metric, but considers preservation of local structures more important than the other MDS methods do.
It may be hard to define topology preservation rigorously for finite discrete structures, but it may still be possible to construct viable practical measures of how well the neighborhood relations are preserved. Such measures are discussed in more detail in Section 7.3 and in Publication 7.
The difference between methods that preserve distances and methods that preserve topology can perhaps be clarified through a hypothetical experiment with a data set consisting of a curved two-dimensional surface in a three-dimensional space. Distance-preserving methods require three dimensions to describe the structure of the data adequately. Topology-preserving methods like the SOM, on the other hand, only need two dimensions. The ``elastic network'' that the map forms can follow the curved surface in the data space.