Full-text document collections

WEBSOM, a project that aims to develop methods for exploring full-text document collections, started from Timo Honkela's suggestion to use ``self-organizing semantic maps'' [Ritter and Kohonen, 1989] as a preprocessing stage for encoding documents. When the documents encoded in this way are organized on a map so that nearby locations contain similar documents, the intuitive neighborhood relations make the collection easier to explore. Structures in the collection can be visualized with the methods described in Section 6.4.2.
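The two-stage idea can be illustrated with a minimal sketch. It is not the implementation used in the publications. It assumes that each word is given a random code vector, that a word is described by the average codes of its preceding and following words, and that a document is encoded as a normalized histogram of the word-map units its words fall on. The function names, map sizes, training schedule, and toy documents are all hypothetical choices made for illustration.

```python
import numpy as np

def train_som(data, rows, cols, n_iter=2000, seed=0):
    """Train a small SOM with a Gaussian neighborhood; returns the codebook."""
    rng = np.random.default_rng(seed)
    # Grid coordinates of the map units, used by the neighborhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialize reference vectors from randomly chosen data samples.
    codebook = data[rng.integers(0, len(data), rows * cols)].astype(float)
    for t in range(n_iter):
        frac = t / n_iter
        alpha = 0.5 * (1.0 - frac)                                 # decreasing learning rate
        sigma = max(0.5, (max(rows, cols) / 2.0) * (1.0 - frac))   # shrinking radius
        x = data[rng.integers(0, len(data))]
        winner = np.argmin(((codebook - x) ** 2).sum(axis=1))
        d2 = ((grid - grid[winner]) ** 2).sum(axis=1)
        h = np.exp(-d2 / (2.0 * sigma ** 2))
        codebook += alpha * h[:, None] * (x - codebook)
    return codebook

def best_matching_unit(codebook, x):
    """Index of the map unit whose reference vector is closest to x."""
    return int(np.argmin(((codebook - x) ** 2).sum(axis=1)))

def word_context_vectors(docs, dim=8, seed=0):
    """Describe each word by the average random codes of its neighbors (assumed encoding)."""
    rng = np.random.default_rng(seed)
    vocab = sorted({w for d in docs for w in d})
    code = {w: rng.normal(size=dim) for w in vocab}
    sums = {w: np.zeros(2 * dim) for w in vocab}
    counts = {w: 0 for w in vocab}
    for d in docs:
        for i, w in enumerate(d):
            prev = code[d[i - 1]] if i > 0 else np.zeros(dim)
            nxt = code[d[i + 1]] if i + 1 < len(d) else np.zeros(dim)
            sums[w] += np.concatenate([prev, nxt])
            counts[w] += 1
    vectors = np.array([sums[w] / counts[w] for w in vocab])
    return vocab, vectors

def encode_documents(docs, vocab, vectors, word_map):
    """Encode each document as a normalized histogram over word-map units."""
    word_unit = {w: best_matching_unit(word_map, v) for w, v in zip(vocab, vectors)}
    hist = np.zeros((len(docs), len(word_map)))
    for j, d in enumerate(docs):
        for w in d:
            hist[j, word_unit[w]] += 1
        hist[j] /= max(1, len(d))
    return hist

# Toy collection (hypothetical data, far smaller than any real collection).
docs = [
    "the map organizes similar documents".split(),
    "the map groups similar texts".split(),
    "neural networks learn from data".split(),
    "neural models learn from examples".split(),
]
vocab, vectors = word_context_vectors(docs)
word_map = train_som(vectors, rows=3, cols=3)               # stage 1: word map
histograms = encode_documents(docs, vocab, vectors, word_map)
doc_map = train_som(histograms, rows=2, cols=2, seed=1)     # stage 2: document map
doc_units = [best_matching_unit(doc_map, h) for h in histograms]
print(doc_units)
```

Each entry of `doc_units` is the map location assigned to one document. On a collection this small the resulting layout is not meaningful; the sketch only shows how the word map serves as the encoding stage for the document map.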

The basic method is described in Publication 3, experiments with very large maps and document collections are reported in Publication 4, and the browsing interface and exploration examples are presented in Publication 5. A partly supervised version of the method has also been constructed [Honkela et al., 1996]. The maps presented in the publications are available for exploration on the Internet at the address

The advantages gained by using such a SOM-based feature extraction stage in WEBSOM are analyzed in more detail in Publication 6. The self-organizing semantic map turns out to form a computationally efficient approximation of a probabilistic model that takes contextual information into account when encoding the documents.

Sami Kaski
Mon Mar 31 23:43:35 EET DST 1997