As explained in section 4.3, the SOM partitions the input space into Voronoi regions, which are disjoint areas of the input space. One may try to fit local models to the data belonging to each of these Voronoi regions. In their simplest form, these models would be linear. A model could also be fitted to a Voronoi region together with its adjacent, neighboring regions, thus enlarging the area of interest. The locality of the data used for each model therefore depends on the partitioning made by the SOM, which is determined by the number of codebook vectors and the training process.
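The partition-then-fit idea can be sketched as follows. This is a minimal illustration, not the method used in this work. It assumes NumPy, Euclidean distance for the best-matching-unit search, and an ordinary least-squares fit with a bias term inside each Voronoi region. The codebook is taken as given, so SOM training itself is not shown. The function names `best_matching_units`, `fit_local_linear_models` and `predict` are hypothetical.

```python
import numpy as np


def best_matching_units(X, codebook):
    """Index of the nearest codebook vector (Euclidean) for each row of X."""
    # Squared distances between every sample and every codebook vector.
    d2 = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def fit_local_linear_models(X, y, codebook, min_samples=None):
    """Fit y ~ X @ w + b separately in each Voronoi region (hypothetical helper).

    Returns a dict mapping codebook index -> (w, b). Regions with too few
    samples for a determined least-squares fit are skipped.
    """
    n_features = X.shape[1]
    if min_samples is None:
        min_samples = n_features + 1  # at least one sample per unknown parameter
    bmu = best_matching_units(X, codebook)
    models = {}
    for k in range(len(codebook)):
        mask = bmu == k
        if mask.sum() < min_samples:
            continue
        # Append a column of ones so that the bias is estimated jointly with w.
        A = np.hstack([X[mask], np.ones((mask.sum(), 1))])
        coef, *_ = np.linalg.lstsq(A, y[mask], rcond=None)
        models[k] = (coef[:-1], coef[-1])
    return models


def predict(X, codebook, models):
    """Predict each sample with the local model of its best-matching unit.

    Samples whose region has no fitted model are returned as NaN.
    """
    bmu = best_matching_units(X, codebook)
    out = np.full(len(X), np.nan)
    for k, (w, b) in models.items():
        mask = bmu == k
        out[mask] = X[mask] @ w + b
    return out
```

Enlarging the area of interest, as the text suggests, would amount to replacing `bmu == k` with membership in a set of neighboring units. Which units count as neighbors (on the map grid or in the input space) is left open in this sketch.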
Figure 5.11: The training data projected onto the first two eigenvectors
In Figure 5.11, the training data used for prediction in the previous section were projected onto the two eigenvectors with the largest eigenvalues. Only about 22.6 % of the energy was concentrated in this linear subspace. The eigenvalues decreased rather slowly, which indicated that there was considerable energy in all directions. This seems natural, given that the input space was not partitioned and that the phenomenon at hand is relatively complex. Moderate measurement noise may also have contributed.
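Assuming "energy" has its usual meaning here, the 22.6 % figure is the sum of the two largest eigenvalues of the sample covariance matrix divided by the sum of all its eigenvalues. A minimal NumPy sketch of that ratio and of the projection behind a plot like Figure 5.11 follows. The function name `pca_energy` is hypothetical, and the data are assumed to be an `(n_samples, n_features)` array.

```python
import numpy as np


def pca_energy(X, k=2):
    """Project centered data onto the k leading covariance eigenvectors.

    Returns (projection, eigenvalues in descending order, energy fraction),
    where the energy fraction is the share of the eigenvalue sum carried by
    the k leading directions.
    """
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    # eigh is meant for symmetric matrices and returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    projection = Xc @ eigvecs[:, :k]
    energy = eigvals[:k].sum() / eigvals.sum()
    return projection, eigvals, energy
```

A slowly decreasing `eigvals` array, and therefore a small `energy` for `k=2`, is the pattern the paragraph above describes.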
An effort was made to build local linear models of the data. The training data were divided into Voronoi regions according to the best-matching unit of each input sample. The sample mean and the covariance matrix were then calculated from the data belonging to the Voronoi region of interest. Examining the eigenvalues in descending order revealed no rapid decrease that would have indicated that the data reside in a linear subspace. Despite the partitioning, the eigenvalues decreased slowly, indicating that there was energy in essentially all directions of the original coordinate system.
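The per-region analysis can be sketched in the same way. This is an illustration under stated assumptions, not the procedure actually used here. Samples are assigned to their nearest codebook vector in the Euclidean sense. For each region, the sample mean and the covariance eigenvalues sorted in descending order are computed. The function name `local_eigenspectra` and the `min_samples` threshold are hypothetical.

```python
import numpy as np


def local_eigenspectra(X, codebook, min_samples=2):
    """Mean and descending covariance eigenvalues of each Voronoi region.

    Returns a dict mapping codebook index -> (sample mean, eigenvalues).
    Regions with fewer than min_samples samples are skipped, because a
    sample covariance needs at least two points.
    """
    # Best-matching unit of every sample (squared Euclidean distance).
    d2 = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    bmu = d2.argmin(axis=1)
    spectra = {}
    for k in range(len(codebook)):
        Xk = X[bmu == k]
        if len(Xk) < min_samples:
            continue
        mean = Xk.mean(axis=0)
        # atleast_2d keeps the single-feature case a valid 1x1 matrix.
        cov = np.atleast_2d(np.cov(Xk - mean, rowvar=False))
        # eigvalsh returns ascending eigenvalues of a symmetric matrix; reverse them.
        spectra[k] = (mean, np.linalg.eigvalsh(cov)[::-1])
    return spectra
```

Dividing each spectrum by its sum gives the fraction of energy per direction. If the data in a region lay near a low-dimensional linear subspace, those fractions would drop sharply after the first few entries.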
Scale selection appears to be the key problem in finding a proper partitioning of the input space.
It may be argued that the failure to describe the data set in terms of local subspaces was due to the small number of samples per region after partitioning, moderate measurement noise, and an improper partitioning of the input space. Further work is needed to develop better solutions to this problem.