Rigorous mathematical treatment of the SOM algorithm has turned out to be extremely difficult in general (reviews have been provided by Kangas, 1994, and Kohonen, 1995c). In the case of a discrete data set and a fixed neighborhood kernel, however, there exists a potential function for the SOM, namely [Kohonen, 1991; Ritter and Schulten, 1988]

E = \sum_k \sum_i h_{c(x_k), i} \, \| x_k - m_i \|^2 ,    (7)

where the index c = c(x_k) depends on the sample x_k and on the reference vectors m_i (cf. Eq. 5).
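For concreteness, the potential can be evaluated directly on a discrete data set. The sketch below is a minimal illustration, not code from the text: it assumes a Gaussian neighborhood kernel and hypothetical names (X for the samples x_k, M for the reference vectors m_i, grid for the map coordinates of the units, sigma for the kernel width).

    import numpy as np

    def som_potential(X, M, grid, sigma):
        """Evaluate E = sum_k sum_i h_{c(x_k),i} ||x_k - m_i||^2 (Eq. 7)."""
        # Squared distances ||x_k - m_i||^2 for every sample/unit pair
        d2 = ((X[:, None, :] - M[None, :, :]) ** 2).sum(-1)   # (n_samples, n_units)
        # Winner index c(x_k) for each sample (cf. Eq. 5)
        c = d2.argmin(axis=1)
        # Fixed Gaussian neighborhood kernel on the map grid (an assumption here)
        g2 = ((grid[:, None, :] - grid[None, :, :]) ** 2).sum(-1)
        h = np.exp(-g2 / (2.0 * sigma**2))                    # (n_units, n_units)
        # Sum h_{c(x_k),i} * ||x_k - m_i||^2 over all samples k and units i
        return (h[c] * d2).sum()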
The learning rule of the SOM, Equation 6, corresponds to a gradient descent step in minimizing the sample function

E_1 = \sum_i h_{c(x(t)), i} \, \| x(t) - m_i \|^2 ,

obtained by randomly selecting a sample x(t) at iteration t. The learning rule then corresponds to a step in the stochastic approximation of the minimum of Equation 7, as discussed by Kohonen (1995c).
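To spell out the correspondence (a standard one-line computation under the notation above, not given explicitly in the text): if the winner index c is held fixed, the gradient of the sample function with respect to a reference vector m_i is

\partial E_1 / \partial m_i = -2 \, h_{c(x(t)), i} \, ( x(t) - m_i ) ,

so a descent step of the form m_i(t+1) = m_i(t) + \alpha(t) \, h_{c(x(t)), i} \, ( x(t) - m_i(t) ) is precisely the SOM learning rule of Equation 6, with constant factors absorbed into the learning rate \alpha(t).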
Note: In Equation 7 the index c is a function of all the reference vectors, which implies that it may change when the gradient descent step is taken. Locally, however, if the index does not change for any sample x_k, the gradient step is valid.
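One stochastic step, together with this validity check, can be sketched as follows (continuing the hypothetical names of the earlier snippet; som_step is an illustrative name, not a function from the text). The step updates all reference vectors by the learning rule and then verifies that no winner index c(x_k) has changed, which is the condition under which the step was a valid local gradient step:

    def som_step(X, M, grid, sigma, alpha, rng=None):
        """One stochastic gradient step on the sample function (Eq. 6)."""
        rng = rng or np.random.default_rng()
        x = X[rng.integers(len(X))]                    # random sample x(t)
        c = ((x - M) ** 2).sum(-1).argmin()            # winner index c (Eq. 5)
        g2 = ((grid - grid[c]) ** 2).sum(-1)
        h = np.exp(-g2 / (2.0 * sigma**2))             # fixed neighborhood kernel h_{c,i}
        M_new = M + alpha * h[:, None] * (x - M)       # learning rule, Eq. 6
        # The step is a valid local gradient step only if no winner changed:
        before = ((X[:, None, :] - M) ** 2).sum(-1).argmin(1)
        after = ((X[:, None, :] - M_new) ** 2).sum(-1).argmin(1)
        valid = bool(np.array_equal(before, after))
        return M_new, valid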