
Data encoding

If the data is coded on a non-metric scale, i.e. the metric distance cannot be used as a measure of similarity, the coding must be transformed. Groupings and class memberships are examples of this kind of coding. Given groups 1 to 10, we cannot say that group 9 is more similar to group 10 than group 1 is. N groups can be represented with a one-of-N coding using N components: for group k, the kth component is 1 and the others are 0.
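
As a minimal sketch of the one-of-N coding described above, assuming that groups are labelled with the integers 1 to N (the function name `one_of_n` is hypothetical and does not come from the original text):

```python
def one_of_n(group, n_groups):
    """Return the one-of-N code for a 1-based group label (illustrative helper)."""
    if not 1 <= group <= n_groups:
        raise ValueError("group must lie in 1..n_groups")
    # Component k (1-based) is 1 for group k; all other components are 0.
    return [1 if k == group else 0 for k in range(1, n_groups + 1)]


# Example: group 3 out of 5 groups.
code = one_of_n(3, 5)
```

Here `code` holds five components with a single 1 in the third position, so each group occupies its own axis of the input space.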

Measurements must be made quantifiable, because the Euclidean distance is commonly used as a measure of similarity. The coding must be in harmony with the similarity measure used. Symbolic data cannot be processed with the SOM as such, but it can be transformed into a suitable form. See [34] for reference.
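
To illustrate why the coding has to agree with the similarity measure, the following sketch compares Euclidean distances between groups under raw integer labels and under a one-of-N coding. It assumes ten groups labelled 1 to 10; the helper names `euclidean` and `encode` are hypothetical and chosen only for this example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def encode(group, n_groups):
    """One-of-N code for a 1-based group label (illustrative helper)."""
    return [1.0 if k == group else 0.0 for k in range(1, n_groups + 1)]


# Raw labels: group 9 looks much closer to group 10 than group 1 does,
# although the labels carry no such ordering.
raw_9_10 = euclidean([9], [10])
raw_1_10 = euclidean([1], [10])

# One-of-N codes: every pair of distinct groups is equally far apart.
coded_9_10 = euclidean(encode(9, 10), encode(10, 10))
coded_1_10 = euclidean(encode(1, 10), encode(10, 10))
```

With raw labels the two distances differ by a factor of nine, while with the one-of-N coding both pairs end up at the same distance, so no artificial ordering is imposed on the groups.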



Jaakko Hollmen
Fri Mar 8 13:44:32 EET 1996