Kaski, S., Data exploration using self-organizing maps. Acta
Polytechnica Scandinavica, Mathematics, Computing and Management in
Engineering Series No. 82, Espoo 1997, 57 pp. Published by the
Finnish Academy of Technology. ISBN 952-5148-13-0. ISSN 1238-9803. UDC
The document is also available in PostScript and in gzipped PostScript format.
Keywords: Data mining, exploratory data analysis, multivariate analysis, neural networks, self-organizing map, SOM
Thesis for the degree of Doctor of Technology to be presented with
due permission for public examination and criticism in Auditorium F1
of the Helsinki University of Technology on the 21st of March, at 12
Finding structures in vast multidimensional data sets, be they measurement data, statistics, or textual documents, is difficult and time-consuming. Interesting, novel relations between the data items may be hidden in the data. The self-organizing map (SOM) algorithm of Kohonen can be used to aid the exploration: the structures in the data sets can be illustrated on special map displays.
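As a rough illustration of the kind of map the SOM algorithm produces, the following is a minimal sketch of online SOM training on a small rectangular grid. It is not the implementation used in the thesis: the function names (`train_som`, `best_matching_unit`), the linearly decaying learning rate and neighborhood width, the Gaussian neighborhood, and the synthetic two-cluster data are all assumptions chosen for brevity.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid coordinate of the unit whose weight vector is closest to x."""
    return min(weights, key=lambda u: sum((w - v) ** 2 for w, v in zip(weights[u], x)))


def train_som(data, rows, cols, epochs=30, lr0=0.5, seed=0):
    """Train a rows x cols SOM with online updates; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    sigma0 = max(rows, cols) / 2.0  # assumed initial neighborhood width
    # Initialise every map unit with a copy of a randomly chosen data vector.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # learning rate decays to zero
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighborhood shrinks, floored at 0.5
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                # Gaussian neighborhood measured on the map grid, not in data space.
                d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-d2 / (2.0 * sigma ** 2))
                for i, xi in enumerate(x):
                    w[i] += lr * h * (xi - w[i])
            step += 1
    return weights


# Synthetic example data (an assumption): two well-separated clusters in 3-D.
rng = random.Random(1)
cluster_a = [[rng.gauss(0.0, 0.3) for _ in range(3)] for _ in range(20)]
cluster_b = [[rng.gauss(5.0, 0.3) for _ in range(3)] for _ in range(20)]
som = train_som(cluster_a + cluster_b, rows=4, cols=4)

# Each data item is displayed at its best-matching unit on the map.
units_a = {best_matching_unit(som, x) for x in cluster_a}
units_b = {best_matching_unit(som, x) for x in cluster_b}
print("units hit by cluster A:", sorted(units_a))
print("units hit by cluster B:", sorted(units_b))
```

In this sketch, the "map display" is simply the set of grid coordinates onto which each cluster falls. With data this clearly separated, the two clusters would be expected to occupy different regions of the grid.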
In this work, the methodology of using SOMs for exploratory data analysis or data mining is reviewed and developed further. The properties of the maps are compared with the properties of related methods intended for visualizing high-dimensional multivariate data sets. In a set of case studies the SOM algorithm is applied to analyzing electroencephalograms, to illustrating structures of the standard of living in the world, and to organizing full-text document collections.
Measures are proposed for evaluating the quality of different types of maps in representing a given data set, and for measuring the robustness of the illustrations the maps produce. The same measures may also be used for comparing the knowledge that different maps represent.
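The abstract does not spell out the proposed measures, so the sketch below does not attempt to reproduce them. To illustrate what evaluating a map against a data set can look like, it computes two measures commonly used in the SOM literature: the average quantization error (how closely the map units approximate the data) and the topographic error (how often the best and second-best matching units of a data item fail to be neighbors on the grid). The function names, the 4-neighborhood adjacency rule, and the toy one-dimensional maps are assumptions.

```python
import math


def _dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def quantization_error(weights, data):
    """Mean distance from each data vector to its closest map unit."""
    return sum(min(_dist(w, x) for w in weights.values()) for x in data) / len(data)


def topographic_error(weights, data):
    """Fraction of data vectors whose two closest units are not grid neighbors."""
    errors = 0
    for x in data:
        ranked = sorted(weights, key=lambda u: _dist(weights[u], x))
        (r1, c1), (r2, c2) = ranked[0], ranked[1]
        # Assumed adjacency rule: units are neighbors if they differ by one grid step.
        if abs(r1 - r2) + abs(c1 - c2) != 1:
            errors += 1
    return errors / len(data)


# Toy example (hypothetical): two 1 x 3 maps holding the same weight vectors,
# one ordered along the grid and one with two units swapped.
data = [[0.1], [0.9], [2.2]]
ordered = {(0, 0): [0.0], (0, 1): [1.0], (0, 2): [2.0]}
scrambled = {(0, 0): [0.0], (0, 1): [2.0], (0, 2): [1.0]}

for name, som in (("ordered", ordered), ("scrambled", scrambled)):
    print(name, quantization_error(som, data), topographic_error(som, data))
```

Both toy maps contain the same weight vectors, so their quantization errors coincide. Only the topographic error separates the ordered map (0) from the scrambled one (2/3), which is why a measure of map quality generally needs to account for the ordering of the units as well as their accuracy.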
Feature extraction must in general be tailored to the application, as is done in the case studies. However, the adaptive-subspace self-organizing map (ASSOM), an algorithm recently developed by Kohonen, may be of help: it extracts invariant features automatically from a data set. The algorithm is here characterized in terms of an objective function and is demonstrated to be able to identify input patterns subject to different transformations. It could also aid in feature exploration: the kernels that the algorithm creates to achieve invariance can be illustrated on map displays similar to those used for illustrating the data sets.
© All rights reserved. No part of the publication may be reproduced, stored in a retrieval system, or transmitted, in any form, or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the author.