Relation to principal curves.

Next: A decomposition of the Up: Mathematical characterizations Previous: Relation to K-means clustering.

Relation to principal curves.

The SOM algorithm creates a representation of the input data set that follows the data distribution. The representation of the data set is also organized. One possible view (cf. Ritter et al., 1992) to the organization has been provided by the mathematical characterization of principal curves [Hastie and Stuetzle, 1989].

Each point on a principal curve is the average of all the points that project to it. The curve is thus formed of conditional expectations of the data. For discrete distributions it is not sensible to define such curves, but it is possible to construct practical algorithms if the data is ``smeared'' spatially. In the SOM a similar smearing is performed by the neighborhood kernel in the adaptation process (cf.\ Eq. 6). It is well-known in SOM literature that in the SOM each reference vector represents local conditional expectations of the data items -- the batch map algorithm [Kohonen, 1995c] is essentially a manifestation of this idea. The principal curves or manifolds are thus essentially continuous counterparts of the SOM.

The principal curves also have another characterization which, based on the previous discussion, may be used as one source in providing an intuitive understanding of the SOM algorithm. The goodness of a curve in representing a data distribution can be measured, for example, by the average (squared) distance of the data points from the curve, like the goodness of the K-means algorithm was measured by the average (squared) distance of the data points from the nearest cluster. The principal curves are the critical points of this measure, i.e., they are extremal with respect to small, smooth variations [Hastie and Stuetzle, 1989]. This implies that any smooth curve which corresponds to a minimum of the average distance from the data items is a principal curve.

Sami Kaski
Mon Mar 31 23:43:35 EET DST 1997