The cost function of the SOM, Equation 7, closely resembles Equation 1, which the K-means clustering algorithm tries to minimize. The difference is that in the SOM the distance of each input from all of the reference vectors, not just the closest one, is taken into account, weighted by the neighborhood kernel h. Thus, the SOM reduces to a conventional clustering algorithm when the width of the neighborhood kernel is zero.
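This relation can be made concrete with a small sketch of a batch-type update for the reference vectors. The code below is an assumed minimal formulation (a one-dimensional map with a Gaussian neighborhood kernel h), not the exact algorithm of the surrounding text; it only illustrates that when the kernel width is zero the update collapses to the K-means centroid step.

```python
import numpy as np

def batch_som_update(X, W, sigma):
    """One batch update of the reference vectors W (k x d) for inputs X (n x d).

    A Gaussian neighborhood kernel h of width sigma is assumed on a
    one-dimensional map of k units. With sigma == 0 the kernel is the
    identity, so each reference vector moves to the mean of the inputs
    closest to it -- exactly the K-means centroid step.
    """
    # best-matching unit (closest reference vector) for each input
    bmu = np.argmin(((X[:, None, :] - W[None, :, :]) ** 2).sum(-1), axis=1)
    units = np.arange(len(W))
    if sigma == 0:
        h = np.eye(len(W))  # zero-width neighborhood: plain K-means
    else:
        h = np.exp(-(units[None, :] - units[:, None]) ** 2 / (2 * sigma ** 2))
    Hx = h[bmu]                    # (n, k): weight of each input on each unit
    num = Hx.T @ X                 # neighborhood-weighted sums of the inputs
    den = Hx.sum(axis=0)[:, None]  # total weight received by each unit
    # keep a unit's old vector if no input contributed any weight to it
    return np.where(den > 0, num / np.maximum(den, 1e-12), W)
```

With sigma > 0 each input also pulls the map neighbors of its best-matching unit, which is what distinguishes the SOM update from plain K-means.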
The close relation between the SOM and the K-means clustering algorithm also hints at why the self-organized map follows rather closely the distribution of the data set in the input space: it is known for vector quantization that the density of the reference vectors approximates the density of the input vectors for high-dimensional data [Kohonen, 1995c, Zador, 1982], and K-means is essentially equivalent to vector quantization. In fact, an expression for the density of the reference vectors of the SOM has been derived in the one-dimensional case [Ritter, 1991]; in the limit of a very wide neighborhood and a large number of reference vectors the density is proportional to p(x)^(2/3), where p(x) is the probability density function of the input data.
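The density result can be checked empirically. The sketch below trains an assumed minimal one-dimensional batch SOM (with a shrinking Gaussian neighborhood; the schedule and widths are illustrative choices, not taken from the text) on data whose density is four times higher near 0 than near 3, and counts how many reference vectors end up in the denser region.

```python
import numpy as np

rng = np.random.default_rng(1)
# 80% of the probability mass on [0, 1], 20% on [2, 3]
x = np.concatenate([rng.uniform(0.0, 1.0, 800),
                    rng.uniform(2.0, 3.0, 200)])

k = 20
w = np.linspace(x.min(), x.max(), k)   # reference vectors of a 1-D map
units = np.arange(k)
for sigma in np.linspace(3.0, 0.5, 30):     # shrinking neighborhood width
    bmu = np.argmin(np.abs(x[:, None] - w[None, :]), axis=1)
    h = np.exp(-(units[None, :] - bmu[:, None]) ** 2 / (2 * sigma ** 2))
    w = (h * x[:, None]).sum(0) / h.sum(0)  # batch update of all units

dense = (w < 1.5).sum()   # units that settled near the high-density region
```

A clear majority of the reference vectors should settle in the denser region, though fewer than the 80% share of the data mass, in qualitative agreement with the sublinear exponent 2/3.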
Note: Although the K-means clustering algorithm and the SOM are very closely related, the best ways of using them in data mining are probably different. Whereas in K-means the number K of clusters should be chosen to match the number of clusters in the data, in the SOM the number of reference vectors can be chosen to be much larger, irrespective of the number of clusters. The cluster structures then become visible on the special displays that were discussed in Section 6.4.2.
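A one-dimensional analogue of those displays can illustrate the point (the displays of Section 6.4.2 themselves are two-dimensional; the SOM implementation here is again an assumed minimal batch version). With far more reference vectors than clusters, the cluster border shows up as a conspicuously large distance between neighboring reference vectors:

```python
import numpy as np

rng = np.random.default_rng(2)
# two well-separated clusters, around 0 and around 5
x = np.concatenate([rng.normal(0.0, 0.3, 500),
                    rng.normal(5.0, 0.3, 500)])

k = 20                                 # many more units than clusters
w = np.linspace(x.min(), x.max(), k)
units = np.arange(k)
for sigma in np.linspace(3.0, 0.5, 30):
    bmu = np.argmin(np.abs(x[:, None] - w[None, :]), axis=1)
    h = np.exp(-(units[None, :] - bmu[:, None]) ** 2 / (2 * sigma ** 2))
    w = (h * x[:, None]).sum(0) / h.sum(0)

ws = np.sort(w)
gaps = np.diff(ws)        # distances between neighboring reference vectors
i = np.argmax(gaps)       # the largest gap marks the cluster border
```

Plotting the neighbor distances along the map would show a sharp peak at the border, which is essentially what the distance-based displays make visible in two dimensions.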