The data used in the experiments consisted of individual Finnish words spoken by 59 different speakers. The data was collected at the Laboratory of Computer and Information Science. A detailed description of the data and its collection process can be found in Vesa Siivola's Master's thesis.
Due to the computational complexity of the algorithms, only a very small fraction of the whole collection was used. The time needed for learning increases linearly with the amount of data, and with more data even a single experiment would have taken months to complete. The data sets for the experiments were selected by drawing individual words at random from the whole collection. In a typical experiment, where the data set consisted of only a few dozen words, this meant that practically every word in the set was spoken by a different person.
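The random selection described above can be illustrated with a minimal sketch. It is not the code used in the experiments: the representation of the collection as (speaker, word) pairs, the function name select_random_words, the assumed 100 words per speaker, and the subset size of 30 are all hypothetical choices made for illustration.

```python
import random


def select_random_words(collection, n_words, seed=None):
    """Draw n_words recordings uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(collection, n_words)


# Hypothetical stand-in for the real collection: 59 speakers, each with an
# assumed 100 recorded words (the true per-speaker count is not stated here).
collection = [
    (speaker, f"word_{i}") for speaker in range(59) for i in range(100)
]

# A "few dozen" words; 30 is an assumed value.
subset = select_random_words(collection, n_words=30, seed=0)

# Count how many different speakers are represented in the subset.
distinct_speakers = len({speaker for speaker, _ in subset})
print(len(subset), distinct_speakers)
```

Comparing the number of distinct speakers with the subset size gives a quick check of how many different speakers a given random subset actually contains.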