Learning to Understand
General Aspects of Using Self-Organizing Maps
in Natural Language Processing

Timo Honkela
Helsinki University of Technology
Neural Networks Research Centre
P.O.Box 2200, FIN-02015 HUT, Finland
e-mail: Timo.Honkela@hut.fi
WWW: http://nucleus.hut.fi/~tho/

Proceedings of the CASYS'97, Computing Anticipatory Systems, Liège, Belgium, August, 1997, in press.

Selected excerpts

Abstract

The Self-Organizing Map (SOM) is an artificial neural network model based on unsupervised learning. In this paper, the use of the SOM in natural language processing is considered. The main emphasis is on natural features of natural language including contextuality of interpretation, and the communicative and social aspects of natural language learning and usage. The SOM is introduced as a general method for the analysis and visualization of complex, multidimensional input data. An approach to processing natural language input is presented. Some epistemological underpinnings are outlined, including the creation of emergent and implicit categories by the SOM, intersubjectivity and relativity of interpretation, and the relation between discrete symbols and continuous variables. Finally, the use of the SOM as a component in an anticipatory system is presented, and the relation between anticipation and self-organization is discussed.

1. Introduction

Traditionally the formal study of language has centered around structural and static aspects. The automatic analysis and generation of syntactic structures has mainly been based on explicit, hand-written, and symbolic representations. In semantics the main focus has been on propositional structures, quantifiers, connectives, and other phenomena that match well the apparatus of predicate logic. This paper aims at widening the scope to include many more natural features of natural language including contextuality of interpretation, and the communicative and social aspects of natural language learning and usage. The principles of formalizing specific aspects of these phenomena are considered in this paper including the following:

In the traditional study of language, fixed categorizations are normally used. Lexical items, words and phrases, are positioned into categories such as verbs and nouns, and these categories are used in abstract rules, e.g., of the type "S -> NP VP", i.e., a sentence consists of a noun phrase and a verb phrase. It may seem that the abstract rules are precise, but when they are applied, discrepancies exist between the rules and the actual use of language. A rule may be incorrect in various ways. For instance, a rule may be overly general and should be refined. Refining the rule may be based on adding extra restrictions on its use, or on creating more fine-grained categories that divide the feature space into smaller areas. When this refinement process is continued to the extreme, it may appear that each word has a category of its own. At least, it seems that a natural grammar has a fractal structure.
The use and interpretation of language is adaptive and context-sensitive. One can, of course, find the most usual patterns and make definitions based on them, but in actual discussions and writings, words are often used creatively based on the particular situation. The well-known ambiguity "problem" highlights the context-sensitivity: there may be multiple interpretations for a word or a phrase, but in context the desired interpretation can be understood. Most often human listeners or readers do not even notice the potential alternative readings of distinct words. The preceding text and the overall context support an anticipatory process that effectively blocks incorrect interpretations.
The context-sensitivity of interpretation is also relevant when one considers the more fine-grained structure of the semantic and pragmatic levels. The traditional, logic-based ontology of natural language interpretation is based on the idea that the world consists of distinct objects, their properties, and the relationships between the objects. Such a view neglects the fact that the propositional level of sentences does not have a simple one-to-one counterpart in reality. Reality is highly complex, apparently high-dimensional at the perceptual level, changing, and consists of non-linear and continuous processes. Thus, studying the epistemological level as having basically "names" and "objects" as their referents may be considered far too simplistic. One should, for instance, take into account the relation between discrete symbols and the continuous spaces to which the symbols refer.
Human understanding of natural language is based on long individual experience. Inevitably, differences in personal histories cause differences in the way humans interpret natural language expressions. In the light of the previous discussion, it should be clear that approaching this kind of phenomenon is difficult using the apparatus of symbolic logic without considering more refined mathematical tools of algebra. The subjectivity of interpretation is apparent when considered thoughtfully. Communication is enabled by intersubjectivity based on the learning process in which the interpretations are adapted to match well enough so that a meaningful exchange of thoughts becomes possible. The possibility of fine-grained differences in interpretation becomes understandable when continuous variables and spaces are considered as the counterparts of the distinct symbols that are used in communication.
For instance, if person A has a prototypical idea of a specific color having the value [0.234 0.004 0.678] in a color coding scheme, and for person B the corresponding vector is [0.232 0.002 0.677], it is clear that communication based on the symbol is successful in spite of the small difference, or error, in the interpretation. Actually, it may be still more fruitful to consider a reference relation as a distribution rather than a relation between a symbol and a numerical value or vector. Such an approach greatly enriches the possibilities for studying fine-grained phenomena of natural language interpretation, as opposed to the model-theoretic approach.
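
The size of the difference in the color example can be made concrete with a short calculation. The following is only a sketch: the Euclidean distance and the tolerance value are assumptions made for illustration, since the text does not specify the coding scheme or any criterion for successful communication.

```python
import math

# Prototype vectors taken from the example in the text.
color_a = [0.234, 0.004, 0.678]  # person A
color_b = [0.232, 0.002, 0.677]  # person B

def euclidean(u, v):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def communication_succeeds(u, v, tolerance):
    """Hypothetical criterion: the prototypes are close enough to be
    treated as the same referent. The tolerance is an assumed value."""
    return euclidean(u, v) <= tolerance

error = euclidean(color_a, color_b)  # approximately 0.003
ok = communication_succeeds(color_a, color_b, tolerance=0.05)
```

With these values the error is about 0.003, far below any plausible tolerance, which matches the point that the small difference does not prevent successful use of the symbol.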

In the following, Kohonen's Self-Organizing Maps (SOMs) (Kohonen, 1982, 1995) are introduced. The SOMs may provide a sound basis for modeling the general underlying principles of natural language learning and interpretation. The motivation for such a claim is presented in the rest of the paper.

[...]

4. Epistemological Considerations

In the following, the problem areas presented in the introduction and the methodological tools provided by the self-organizing maps, together with their underlying principles, are tied together.

4.1. Emergent Categories

Conceptually interrelated words tend to fall into the same or neighboring nodes in the word category map (see, e.g., Kaski et al., 1996; Kohonen et al., 1996). The overall organization of a word category map reflects the syntactic categorization of the words. In the study by Honkela, Pulkki, and Kohonen (1995) the input for the map was the English translation of Grimm fairy tales. In the resulting map, in which the 150 most common words of the tales were included, the verbs formed an area of their own at the top of the map whereas the nouns could be found in the opposite corner. The modal verbs were in one area. Semantically oriented ordering could also be found: for instance, the inanimate and animate nouns formed separate clusters.

An important consideration is that in the experiments the input for the SOM did not contain any predetermined classifications. The results indicate that the text input as such, with the statistical properties of the contextual relations, is sufficient for the automatic creation of meaningful implicit categories. The categories emerge during the learning process. The symbol grounding task would, of course, be more realistic and complete if it were possible to provide also other modalities, such as pictorial images, as a part of the context information.
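
How categories can emerge from contextual relations alone may be illustrated with a toy sketch. It is not the setup of the cited experiments: the tiny corpus, the random word codes, the averaged left/right context encoding, the 3 x 3 map, and all parameter values are assumptions chosen for brevity. No word class labels are given to the map.

```python
import math
import random

def context_vectors(sentences, dim=8, seed=0):
    """Describe each word by the averaged random codes of its left and
    right neighbors (an assumed, simplified context encoding)."""
    rng = random.Random(seed)
    vocab = sorted({w for s in sentences for w in s})
    code = {w: [rng.uniform(-1, 1) for _ in range(dim)] for w in vocab}
    sums = {w: [0.0] * (2 * dim) for w in vocab}
    counts = {w: 0 for w in vocab}
    zero = [0.0] * dim  # used when a neighbor is missing
    for s in sentences:
        for i, w in enumerate(s):
            left = code[s[i - 1]] if i > 0 else zero
            right = code[s[i + 1]] if i + 1 < len(s) else zero
            for j, x in enumerate(left + right):
                sums[w][j] += x
            counts[w] += 1
    return {w: [x / counts[w] for x in sums[w]] for w in vocab}

def bmu(codebook, x):
    """Index of the best-matching unit (nearest codebook vector)."""
    return min(range(len(codebook)),
               key=lambda k: sum((c - v) ** 2 for c, v in zip(codebook[k], x)))

def train_som(data, rows=3, cols=3, epochs=200, seed=1):
    """Train a small rectangular SOM with a shrinking Gaussian neighborhood."""
    rng = random.Random(seed)
    dim = len(data[0])
    codebook = [[rng.uniform(-0.5, 0.5) for _ in range(dim)]
                for _ in range(rows * cols)]
    for t in range(epochs):
        frac = 1.0 - t / epochs
        alpha = 0.5 * frac                           # decreasing learning rate
        radius = 1.0 + max(rows, cols) / 2.0 * frac  # shrinking neighborhood
        for x in data:
            wr, wc = divmod(bmu(codebook, x), cols)
            for k, m in enumerate(codebook):
                r, c = divmod(k, cols)
                d2 = (r - wr) ** 2 + (c - wc) ** 2
                h = math.exp(-d2 / (2 * radius ** 2))
                for j in range(dim):
                    m[j] += alpha * h * (x[j] - m[j])
    return codebook

corpus = ["the cat runs", "the dog runs", "the cat sleeps", "the dog sleeps"]
sentences = [s.split() for s in corpus]
vectors = context_vectors(sentences)
words = sorted(vectors)
codebook = train_som([vectors[w] for w in words])
word_to_node = {w: bmu(codebook, vectors[w]) for w in words}
```

In this toy corpus "cat" and "dog" occur in identical contexts, as do "runs" and "sleeps", so each pair receives identical context vectors and necessarily lands on the same map node. With real text the contexts are only statistically similar, and related words tend to fall into the same or neighboring nodes instead.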

4.2. Intersubjectivity and Relativity: Relation Between Discrete and Continuous

Subjectivity is inherent in human natural language interpretation. The nature and the level of the subjectivity have been the subject of several debates. For instance, the Chomskian tradition of linguistics as well as the philosophy of language based on predicate logic seem clearly to downplay the subjective component of language processing. In them, the relation between the "names" of the language and the "objects" or "entities" may be taken for granted and assumed to be unproblematic.

Consider now that we are about to denote intervals of a single continuous parameter using a limited number of symbols. These symbols are then used in the communication between two subjects (human or artificial). In the trivial case the two subjects would have the same denotations for the symbols, i.e., the limits of the intervals corresponding to each symbol would be identical. If the "experience" of the subjects is acquired from differing sources, the conceptualization may very well differ.
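
The situation can be sketched with two subjects whose interval limits differ slightly. The symbol names, the limits, and the agreement measure below are hypothetical values chosen only to illustrate the idea.

```python
from bisect import bisect_right

def denote(value, limits, symbols):
    """Return the symbol whose interval contains the value.
    limits holds the sorted interval boundaries (len(symbols) - 1 of them)."""
    return symbols[bisect_right(limits, value)]

symbols = ["cold", "mild", "hot"]  # hypothetical symbol set
limits_a = [0.30, 0.70]            # subject A's interval limits (assumed)
limits_b = [0.40, 0.65]            # subject B's interval limits (assumed)

def agreement(limits_1, limits_2, symbols, steps=1000):
    """Fraction of evenly spaced parameter values in [0, 1) for which
    both subjects use the same symbol."""
    same = sum(
        denote(i / steps, limits_1, symbols) == denote(i / steps, limits_2, symbols)
        for i in range(steps)
    )
    return same / steps

trivial_case = agreement(limits_a, limits_a, symbols)    # identical limits
differing_case = agreement(limits_a, limits_b, symbols)  # differing "experience"
```

With identical limits the agreement is complete. With the assumed limits above the subjects disagree on the values between 0.30 and 0.40 and between 0.65 and 0.70, so they agree on 85 percent of the sampled parameter range.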

One may then ask how to deal with these kinds of discrepancies. A propositional level is not sufficient. The key idea is to provide the means for a system to associate continuous-valued parameter spaces with sets of symbols, and furthermore, to "be aware" of the differences in this association and to learn those differences explicitly. These kinds of abilities are especially required by highly autonomous systems that need to communicate using an open set of symbols or constructs of a natural language. This kind of association between a set of symbols and a set of continuous parameters is a natural extension or modification of the word category maps (see Honkela, 1993; Honkela and Vepsäläinen, 1991). An augmented input consists of three main parts: the encoded symbol, the context, which is the parameter vector in this case, and an identification of the utterer or source of the symbol being used. The map nodes associate symbols with the continuous parameters. One node corresponds to an area in the multidimensional space, i.e., a cell of the Voronoi tessellation determined by the codebook vector associated with the map node and those of its neighboring nodes. The relation is one-to-many: one symbol is associated with an infinite number of points.
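
A minimal sketch of the augmented input and of the one-to-many relation follows. The one-hot encodings, the symbol and utterer names, and the hand-set prototypes (standing in for the codebook vectors of a trained map) are all assumptions made for illustration; the cited papers may encode these parts differently.

```python
def one_hot(index, size):
    """Vector of zeros with a single 1.0 at the given position."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def augmented_input(symbol, params, utterer, symbols, utterers):
    """Concatenate the three parts: encoded symbol, parameter vector
    (the context), and identification of the utterer."""
    return (one_hot(symbols.index(symbol), len(symbols))
            + list(params)
            + one_hot(utterers.index(utterer), len(utterers)))

symbols = ["red", "blue"]  # hypothetical symbol set
utterers = ["A", "B"]      # hypothetical sources

x = augmented_input("red", [0.9, 0.1, 0.1], "A", symbols, utterers)

# Hand-set parameter prototypes, standing in for trained codebook vectors.
prototypes = {"red": [0.9, 0.1, 0.1], "blue": [0.1, 0.1, 0.9]}

def symbol_for(params):
    """Symbol of the nearest prototype, i.e., of the Voronoi cell
    that contains the given parameter vector."""
    return min(prototypes,
               key=lambda s: sum((p - q) ** 2
                                 for p, q in zip(prototypes[s], params)))

# Distinct points of the continuous space that share one symbol.
labels = [symbol_for(p) for p in ([0.8, 0.2, 0.0], [0.7, 0.0, 0.3], [0.95, 0.1, 0.2])]
```

All three sample points fall into the cell of the "red" prototype, which illustrates how a single symbol covers a whole region of the parameter space rather than a single point.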

In this kind of mapping, the error (cf. Rosen, 1985), or a kind of relativity is a necessity in communication. One can define the exact reference of a symbol in a continuous space only to a limit that is restricted by several issues, for instance, the limited time available for communication in which the level of intersubjectivity is raised. A common source of context is often not available either. Von Foerster (1972b) has outlined the very basic epistemological questions that are closely related to the topics of the present discussion. He states, among other things, that by keeping track of the computational pathways for establishing equivalence, "objects" and "events" emerge as consequences of branches of computation which are identified as the processes of abstraction and memorization. In the realm of symbolic logic the invariance and change are paradoxical: "the distinct being the same", and "the same being distinct". In a model that includes both the symbolic description as well as the continuous counterpart, there is no paradox, and the relationship may be computed, e.g., by the self-organizing map.

The previously presented framework also provides a means to consider the relationship between language and thoughts. In the case of colors, one may hypothesize that the perceptual input in the human experience is overwhelming when compared with the symbolic descriptions. Thus, the "color map" is based on the physical properties of the input. On the other hand, abstract concepts are based on cultural "agreements" and they are communicated symbolically so that the relation to external, physical counterparts is less straightforward. A natural result would be that such concepts are much more prone to subjective differences based on the cultural environment. Even when the original perceptual input is available, if it is constantly associated with a systematic classifying symbolic input, the result deviates strongly from the case in which the latter information is not available. Von Foerster (1972a) has described the phenomenon and its consequences in the following way: "We seem to be brought up in a world seen through descriptions by others rather than through our own perceptions. This has the consequence that instead of using language as a tool with which to express thoughts and experience, we accept language as a tool that determines our thoughts and experience." In linguistics this kind of idea is referred to as the Sapir-Whorf hypothesis.

4.3. Anticipatory Systems

Rosen (1985) has described anticipatory behavior as one in which a change of state in the present occurs as a function of some predicted future state. In other words, an anticipatory system contains a predictive model of itself and/or its environment, which allows it to change state at an instant in accord with the model's predictions pertaining to a later instant.

Music involves the expectation and anticipation of situations on the one hand, and confirmation or disconfirmation of them on the other. Kaipainen (1994) has studied the use of SOMs in modeling musical perception. In addition to the basic SOM, Kaipainen uses a list of lateral connections that record the transition probabilities from one map node to another. The model is based on the specific use of the SOM in which the time dynamics of a process are characterized by the trajectory path on the map. This aspect has been important already in the first application area of the SOMs, namely speech recognition, and more recently in process monitoring. The model of musical perception was tested in three modes called "Gibsonian", "autistic", and "Neisserian". The Gibsonian and autistic modes are the two extremes regarding the use of anticipation recorded in the trajectory memory. The Gibsonian model was designed so that it did not use the trajectory information at all. The result was that continuity from one variation to another could not be maintained. At the other extreme, the autistic model was parameterized so that it developed a deterministically strong schematic drive. It began to use its internal representational states, eventually becoming ignorant of the input flow of musical patterns. The intermediate model, denoted as Neisserian, was the one that performed best in musical terms. It was open to the input while having at the same time an internal schematic drive, anticipation, which intentionally actualized musical situations rather than just recognizing them as given.
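
The role of the trajectory memory can be sketched as follows. This is not Kaipainen's actual formulation: the way transition probabilities are estimated from a node trajectory, the linear blending rule, and the parameter name drive are assumptions used only to show how the two extremes and an intermediate setting might differ.

```python
def transition_probabilities(trajectory, n_nodes):
    """Estimate node-to-node transition probabilities from a sequence of
    best-matching node indices. Rows without observations are uniform."""
    counts = [[0] * n_nodes for _ in range(n_nodes)]
    for a, b in zip(trajectory, trajectory[1:]):
        counts[a][b] += 1
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total if total else 1.0 / n_nodes for c in row])
    return probs

def next_node(match_scores, previous, probs, drive):
    """Choose the next node by blending the match to the current input
    with the anticipation stored in the transition probabilities.
    drive = 0.0 ignores the trajectory memory; drive = 1.0 ignores the input."""
    blended = [(1.0 - drive) * m + drive * p
               for m, p in zip(match_scores, probs[previous])]
    return max(range(len(blended)), key=blended.__getitem__)

# A toy trajectory on a three-node map: 0 -> 1 -> 2 -> 0 -> ...
probs = transition_probabilities([0, 1, 2, 0, 1, 2, 0, 1], n_nodes=3)

# The current input matches node 0 best, but node 2 is anticipated after node 1.
scores = [0.6, 0.3, 0.1]
input_only = next_node(scores, previous=1, probs=probs, drive=0.0)
memory_only = next_node(scores, previous=1, probs=probs, drive=1.0)
balanced = next_node(scores, previous=1, probs=probs, drive=0.5)
```

With drive set to 0.0 the choice follows the input alone, with 1.0 it follows the learned trajectory regardless of the input, and intermediate values let both sources contribute. In this example the learned transition from node 1 to node 2 is strong enough to override the input at drive 0.5.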

When the relationship between Rosen's formulations and Kohonen's self-organizing maps is considered the following quote may be of interest (Rosen, 1985): "Briefly, we believe that one of the primary functions of the mind is precisely to organize percepts. That is, the mind is not merely a passive receiver of perceptual images, but rather takes an active role in processing them and ultimately in responding to them through effector mechanism. The organization of percepts means precisely the establishment of relations between them. But we then must admit such relations reflect the properties of the active mind as much as they do the percepts which the mind organizes." It seems that the SOM concretizes this idea. In general, Rosen's point of view may be characterized as physical and biological whereas Kohonen's main results are related to computational, epistemological, neurophysiological, and cognitive aspects. Many basic issues are interrelated, though, including those of error, order and disorder, similarity, and encoding.

[...]

References

Elman Jeffrey (1991). Finding Structures in Time. Cognitive Science, 16, pp. 96-132.

von Foerster Heinz (1972a). Perception of the Future and the Future of Perception. Instructional Science 1, 1, R.W. Smith, and G.F. Brieske (eds.), Elsevier/North-Holland, New York/Amsterdam, pp. 31-43. (Also appeared in von Foerster, H.: Observing Systems, Intersystems Publications, Seaside, CA, 1981, pp. 189-204.)

von Foerster Heinz (1972b). Notes on an Epistemology for Living Things. BCL Report No. 9.3, Biological Computer Laboratory, Department of Electrical Engineering, University of Illinois, Urbana, 22 p. (Also appeared in von Foerster, H.: Observing Systems, Intersystems Publications, Seaside, CA, 1981, pp. 258-271.)

Honkela Timo and Vepsäläinen Ari M. (1991). Interpreting Imprecise Expressions: Experiments with Kohonen's Self-Organizing Maps and Associative Memory. Artificial Neural Networks, T. Kohonen and K. Mäkisara (eds.), vol. I, 897-902.

Honkela Timo (1993). Neural Nets that Discuss: A General Model of Communication Based on Self-Organizing Maps. Proc. ICANN'93, Int. Conf. on Artificial Neural Networks, S. Gielen and B. Kappen (eds.), Springer, London, pp. 408-411.

Honkela Timo, Pulkki Ville, and Kohonen Teuvo (1995). Contextual relations of words in Grimm tales analyzed by self-organizing map. In F. Fogelman-Soulie and P. Gallinari (eds.) ICANN-95, Proceedings of International Conference on Artificial Neural Networks, vol. 2, pp. 3-7. EC2 et Cie, Paris.

Honkela Timo, Kaski Samuel, Lagus Krista, and Kohonen Teuvo (1996). Newsgroup Exploration with WEBSOM Method and Browsing Interface. Report A32, Helsinki University of Technology, Laboratory of Computer and Information Science, January 1996.

Honkela Timo (1997). Self-Organizing Maps of Words for Natural Language Processing Applications. Proceedings of Soft Computing '97, in print, September 17-19, 1997, 7 p.

Honkela Timo (1997). Emerging categories and adaptive prototypes: Self-organizing maps for cognitive linguistics. Extended abstract, accepted to be presented in the International Cognitive Linguistics Conference, Amsterdam, July 14-19, 1997.

Kaipainen Mauri (1994). Dynamics of Musical Knowledge Ecology - Knowing-What and Knowing-How in the World of Sounds. PhD thesis, University of Helsinki, Helsinki, Finland, Acta Musicologica Fennica 19.

Kaski Samuel (1997). Data Exploration Using Self-Organizing Maps. Dr.Tech thesis. Helsinki University of Technology, Espoo, Finland, Acta Polytechnica Scandinavica, no. 82.

Kaski Samuel, Honkela Timo, Lagus Krista, and Kohonen Teuvo (1996). Creating an order in digital libraries with self-organizing maps. Proceedings of WCNN'96, World Congress on Neural Networks, Lawrence Erlbaum and INNS Press, Mahwah, NJ, pp. 814-817.

Kohonen Teuvo (1982). Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43, pp. 59-69.

Kohonen Teuvo (1995). Self-Organizing Maps. Springer-Verlag.

Kohonen Teuvo, Kaski Samuel, Lagus Krista, and Honkela Timo (1996). Very large two-level SOM for the browsing of newsgroups. Proceedings of ICANN'96, International Conference on Artificial Neural Networks.

Lagus Krista, Honkela Timo, Kaski Samuel, and Kohonen Teuvo (1996). Self-organizing maps of document collections: a new approach to interactive exploration. E. Simoudis, J. Han, and U. Fayyad (eds.), Proceedings of the Second International Conference on Knowledge Discovery & Data Mining, AAAI Press, Menlo Park, CA, pp. 238-243.

Miikkulainen Risto (1993). Subsymbolic Natural Language Processing: An Integrated Model of Scripts, Lexicon, and Memory. MIT Press, Cambridge, MA.

Ritter Helge and Kohonen Teuvo (1989). Self-organizing semantic maps. Biological Cybernetics, vol. 61, no. 4, pp. 241-254.

Rosen Robert (1985). Anticipatory Systems. Pergamon Press.

Scholtes Jan C. (1993). Neural Networks in Natural Language Processing and Information Retrieval. PhD thesis, University of Amsterdam, Amsterdam.