next up previous contents
Next: Some variants Up: Mathematical characterizations Previous: Relation to principal curves.

A decomposition of the cost function.

The cost function of the SOM, Equation 7, can be decomposed into two terms as follows (Lampinen and Oja, 1992; here a discrete version will be presented):

  equation402

Here tex2html_wrap_inline2229 denotes the number of the data items which are closest to the reference vector tex2html_wrap_inline2175 , and tex2html_wrap_inline2233 , where tex2html_wrap_inline2223 is the Voronoi region corresponding to the reference vector tex2html_wrap_inline2175 . When deriving this approximation is was assumed that tex2html_wrap_inline2391 for all i, which holds exactly for toroidal maps when the kernel h has the same shape for all i, and also away from the borders on a non-toroidal map if the kernel differs from zero only locally.

The first term in Equation 9 corresponds to the cost function of the K-means clustering algorithm, that is, the average distance from the data points to the nearest cluster centroid. Here the clusters are not defined in terms of their centroids, however, but in terms of the reference vectors tex2html_wrap_inline2175 . This first term may be interpreted as one way of measuring how accurately the map follows the distribution of the input data.

The second term, on the other hand, may be interpreted as governing the ordering of the reference vectors. In considering the second term it may be of help to note that tex2html_wrap_inline2401 and tex2html_wrap_inline2175 will in general be close to each other, since tex2html_wrap_inline2401 is the centroid of the cluster defined by tex2html_wrap_inline2175 . At a fixed point of the SOM algorithm tex2html_wrap_inline2409 , which is closer to tex2html_wrap_inline2401 the more the neighborhood kernel tex2html_wrap_inline2413 is centered around c. To minimize the second term the units that are close to each other on the map should have similar reference vectors, since the value of the neighborhood kernel is large. Units that lie farther away on the map may, on the other hand, have quite dissimilar reference vectors, since the neighborhood kernel is small and the distance thus does not make a large contribution to the error.


next up previous contents
Next: Some variants Up: Mathematical characterizations Previous: Relation to principal curves.

Sami Kaski
Mon Mar 31 23:43:35 EET DST 1997