Some variants

Next: Notes on statistical accuracy Up: Self-organizing maps Previous: A decomposition of the

Some variants

A rich variety of versions of the basic SOM algorithm have been proposed (cf. Kohonen, 1995c). Some of the variants aim at improving the preservation of topology by using more flexible map structures instead of the fixed grid [Blackmore and Miikkulainen, 1993, Bruske and Sommer, 1995, Fritzke, 1991, Fritzke, 1994, Martinetz and Schulten, 1991, Martinetz and Schulten, 1994]. While some of these methods may be useful for other purposes, they cannot be used for visualization, at least not as easily as the regular grid.

Some other variants aim at reducing the computational complexity of the SOM. The speed of computation is an extremely important aspect in data mining when vast databases are analyzed. In Publication 4 we used computational speedups that have been developed by Kohonen; in the other studies the basic SOM algorithm was used. Other speedup methods have also been proposed. For example, a one-dimensional map can be enlarged simply by inserting a reference vector between each pair of neighbors [Luttrell, 1988].

The search for the best-matching unit can be speeded up by constructing a tree-structured SOM (Koikkalainen, 1994; Koikkalainen, 1995; Koikkalainen and Oja, 1990), where each level of the tree consists of a separate, progressively larger SOM. The search for the best match then proceeds level by level, at each time restricting the search to a subset of units that is governed by the location of the best match in the previous, smaller level. The map is taught one level at a time, starting from the smallest level. During teaching the best match search can be done even more quickly if the data set is relatively small: the location of the best match in the previous level can be tabulated for each input sample. The subset of units in which the search for a winner needs to be performed can then be found by a single table lookup.

Yet another fast approach is to construct a hierarchical tree of SOMs, where the SOMs in the bottom layer treat different subsets of the components of the input variables. The outputs of the SOMs in the bottom layer are then combined in a hierarchical manner towards the final SOM that takes into account the whole input vector [Luttrell, 1989]. Since each of the small SOMs receives only very small-dimensional input vectors, the winning unit can potentially be sought with a single, fast table lookup.

Each of the computational speedups could potentially be useful, but the comparison of the methods would require a careful study. Most of the truly effective speedups are suboptimal in the sense that they only approximate the original SOM, and a careful study is therefore needed to guarantee that the results are satisfactory for a given application.

Bishop et al. (1996a, 1996b) have recently introduced a latent-variable density model called Generative Topographic Mapping (GTM), which is closely related to the SOM and to principal curves. In latent variable models it is assumed that the data set can be explained using a small set of latent variables. In principle the latent variable space could be continuous, but in GTM a discrete grid like the grid of the SOM is used for computational reasons. A probability distribution on the latent grid is postulated to generate a probability distribution in the input space through a parameterized model. Given an input data set, the goal of the GTM is then to fit the model, in the maximum likelihood sense, to the data set. This is done using the expectation-maximization (EM) algorithm [Dempster et al., 1977], by iteratively estimating first the probability distribution on the latent grid and then maximizing the likelihood of the input data, given the distribution.

The GTM is claimed to have several advantageous properties when compared with the SOM, and no significant disadvantages [Bishop et al., 1996b]. It may be useful to study these properties more closely to get a deeper understanding of the relations and relative merits of the methods.

The GTM has an objective function, which may facilitate theoretical analyses of the method. The SOM does not have an objective function if the input data distribution is continuous [Erwin et al., 1992]. In practical applications the data sets are always finite, however, and the SOM does have a (local) objective function, Equation 7.

The methods differ in that the GTM does not have a neighborhood function which governs the smoothness of the mapping in the SOM algorithm. In GTM the smoothness is governed by the widths of certain basis functions that are used in generating the probability distribution in the input space. The only essential difference regarding the neighborhood then seems to be that in the SOM the width of the neighborhood kernel is decreasing in time according to certain, well-established schedules (cf., e.g., Kohonen, 1995c; Kohonen et al., 1996a) whereas in GTM the basis functions remain fixed.

One possible way of defining topology preservation is to require that the mapping from the lower-dimensional output space to the input space is smooth and continuous. According to this definition at least a continuous version of GTM can be interpreted to be topology preserving. Since even smooth, continuous mappings can be very ``twisted'', however, a measure of the regularity of the mapping is also needed. Most of the attempts to measure the topology preservation of the SOMs have in fact aimed at measuring the goodness (or regularity) of the mapping; some approaches will be discussed in Section 7.3. Measures that help in choosing basis functions that provide suitably regular mappings would probably be useful in the case of the GTM, as well.

It is probably useful in practical applications that different mappings generated with the GTM algorithm can be compared by comparing their likelihoods. Likelihoods have a straightforward interpretation. Neither the objective function of the SOM, Equation 7, nor the goodness measure presented in Publication 7 can be interpreted in this manner. These methods enable, however, selecting the best mapping from among a set of candidate mappings, which is the main goal in most applications.

It remains to be seen how important the advantages of the GTM are in practical applications. In addition to the advantages there is one undisputed disadvantage, however: the computational complexity. Computation of the GTM requires almost twice the time of the SOM [Bishop et al., 1996b]. Moreover, this difference is probably enhanced if the speedup methods discussed in this section are introduced, since it may be difficult to apply similar speedups to GTM. Most of the speedup methods reduce the computational complexity of the best match search; there are, however, no best matching units in GTM since all units contribute directly in representing each input.

Next: Notes on statistical accuracy Up: Self-organizing maps Previous: A decomposition of the

Sami Kaski
Mon Mar 31 23:43:35 EET DST 1997