The SOM algorithm

The Self-Organizing Map [Kohonen, 1982,Kohonen, 1995,Kohonen et al., 1996a] is an unsupervised non-parametric regression process that represents nonlinear, high-dimensional data on a low-dimensional illustrative display. The input data points are mapped to SOM units on a grid that is usually one- or two-dimensional. The mapping is learned from the training data by a simple stochastic learning process, in which the SOM units are adjusted in small steps toward the feature vectors extracted from the data, which are presented one at a time in random order.

The most familiar version of the algorithm [Kohonen, 1990b] uses the minimum Euclidean distance criterion for a D-dimensional stochastic sample $\mbox{\boldmath$x$} \in {\cal R}^D$ to find the winner unit c (the best-matching unit, BMU) among the N SOM units. The criterion states that the weight vector $\mbox{\boldmath$m$}_c$ of the BMU fulfills the condition  
 \begin{displaymath}
\vert\vert\mbox{\boldmath$x$} - \mbox{\boldmath$m$}_c\vert\vert \leq
\vert\vert\mbox{\boldmath$x$} - \mbox{\boldmath$m$}_i\vert\vert \:,\:
\forall i = 1, \ldots, N \:.
\end{displaymath} (1)
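As an illustrative sketch (not part of the original text), the BMU search of equation (1) can be written in a few lines. The function name `find_bmu` and the array layout (one weight vector $\mbox{\boldmath$m$}_i$ per row) are assumptions made for this example:

```python
import numpy as np

def find_bmu(x, weights):
    """Return the index c of the best-matching unit (BMU): the unit
    whose weight vector m_c is closest to the sample x in Euclidean
    distance, as in equation (1)."""
    # weights: (N, D) array holding the N weight vectors m_i as rows
    distances = np.linalg.norm(weights - x, axis=1)
    return int(np.argmin(distances))
```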

The update of the SOM weights corresponding to $\mbox{\boldmath$x$}$ and c from equation (1) is  
 \begin{displaymath}
\mbox{\boldmath$m$}_i(t+1) = \mbox{\boldmath$m$}_i(t) + h_{ci}(t)
[\mbox{\boldmath$x$}(t) - \mbox{\boldmath$m$}_i(t)] \:,\:
\forall i = 1, \ldots, N \:,\end{displaymath} (2)
where $t=0,1,\ldots \: $ is a discrete time index. The value of the neighborhood function $h_{ci}(t)$ can be, for example, the learning rate $\alpha(t) \in ]0,1[ \: $ if the array distance between $\mbox{\boldmath$m$}_i$ and $\mbox{\boldmath$m$}_c$ is smaller than the neighborhood radius r(t), and zero otherwise [Kohonen, 1995]. The desired form of convergence and the characteristics of the result are ensured by gradually decreasing $\alpha(t)$ and r(t) as t increases.
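One stochastic learning step of equation (2), using the simple "bubble" neighborhood described above ($h_{ci}(t) = \alpha(t)$ within radius r(t), zero outside), can be sketched as follows. The function and variable names are illustrative assumptions; `grid` holds each unit's coordinates on the low-dimensional SOM array:

```python
import numpy as np

def som_update(weights, grid, x, alpha, radius):
    """One stochastic SOM update step, equation (2)."""
    # Find the BMU c by the minimum Euclidean distance criterion, eq. (1)
    c = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
    # Array (grid) distance from the BMU to every unit
    d = np.linalg.norm(grid - grid[c], axis=1)
    # 'Bubble' neighborhood: h_ci(t) = alpha(t) within radius r(t), else 0
    h = np.where(d <= radius, alpha, 0.0)
    # Move each weight vector toward x by h_ci * (x - m_i), eq. (2)
    return weights + h[:, None] * (x - weights)
```

Units outside the neighborhood radius are left unchanged, which is what makes the decreasing radius produce the gradual topological ordering.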

For faster convergence and simpler computations, the learning-rate control can be eliminated by using iterative batch adaptation [Kohonen, 1992,Kohonen, 1993,Kohonen, 1995,Mulier and Cherkassky, 1995]:  
 \begin{displaymath}
\mbox{\boldmath$m$}_i = \frac {\sum_{t=1}^{T} h_{ci} \mbox{\boldmath$x$}(t)}
{\sum_{t=1}^{T} h_{ci} } \:,\end{displaymath} (3)
where each iterative weight modification is based on the averaged effect of a set of samples. Note that even when the set of samples used in each batch step is large, the modification step (3) must be iterated several times, because the choice of the BMU (the index c in equation 3) depends on the current values of the weights $\mbox{\boldmath$m$}_i$.
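The iterated batch adaptation of equation (3) can be sketched as below, again with the bubble neighborhood and with illustrative (assumed) names; each pass recomputes the BMUs with the current weights and then replaces every weight vector by the $h_{ci}$-weighted average of the samples:

```python
import numpy as np

def batch_som(weights, grid, samples, radius, n_iter=10):
    """Iterated batch SOM adaptation, equation (3)."""
    for _ in range(n_iter):
        # BMU index c(t) for every sample x(t), eq. (1),
        # using the current weights m_i
        bmu = np.argmin(
            np.linalg.norm(samples[:, None, :] - weights[None, :, :], axis=2),
            axis=1)
        # Grid distance from each sample's BMU to every unit
        d = np.linalg.norm(grid[bmu][:, None, :] - grid[None, :, :], axis=2)
        h = (d <= radius).astype(float)          # bubble neighborhood h_ci
        denom = h.sum(axis=0)                    # sum_t h_ci, per unit
        num = h.T @ samples                      # sum_t h_ci x(t), per unit
        updated = num / np.maximum(denom, 1e-12)[:, None]
        # Keep the old weight where no sample fell in the neighborhood
        weights = np.where(denom[:, None] > 0, updated, weights)
    return weights
```

Because the BMU assignments change between passes, a single application of (3) is not sufficient; the loop corresponds to the repeated iteration emphasized in the text.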



Mikko Kurimo
11/7/1997