Sami Virpioja

D.Sc. (Tech.), Researcher

Room T-A316 in Computer Science Building,
Konemiehentie 2, Otaniemi campus area, Espoo
Postal Address:
Aalto University School of Science,
Department of Information and Computer Science,
P.O. Box 15400, FI-00076 Aalto, Finland
+358 50 4301966

About me

I am interested in how machine learning methods can be used to model complex phenomena such as language. Because language data is sparse, it is important to find structures that represent the data more efficiently. For an example, see the page of the Morpho project and the demonstration of the Morfessor algorithm. My research topics also include practical applications of statistical language modeling, especially speech recognition and machine translation.

I participate in the following research groups at Aalto University:


See the complete list of publications or the selected papers below. Click on a publication title to check its availability and BibTeX entry.

Doctoral thesis

Sami Virpioja (2012).
Learning Constructions of Natural Language: Statistical Models and Evaluations. Aalto University, Doctoral dissertations 158/2012.

Journal articles

Sami Virpioja, Mari-Sanna Paukkeri, Abhishek Tripathi, Tiina Lindh-Knuutila, and Krista Lagus (2012).
Evaluating Vector Space Models with Canonical Correlation Analysis. Natural Language Engineering, Volume 18, Issue 3, 2012, pp. 399-436.
Sami Virpioja, Ville T. Turunen, Sebastian Spiegler, Oskar Kohonen, and Mikko Kurimo (2011).
Empirical Comparison of Evaluation Methods for Unsupervised Learning of Morphology. Traitement Automatique des Langues, Volume 52, Issue 2, 2011, pp. 45-90.
Vesa Siivola, Teemu Hirsimäki, and Sami Virpioja (2007).
On Growing and Pruning Kneser-Ney Smoothed N-Gram Models. IEEE Transactions on Audio, Speech and Language Processing, Volume 15, Issue 5, July 2007, pp. 1617-1624.
Teemu Hirsimäki, Mathias Creutz, Vesa Siivola, Mikko Kurimo, Sami Virpioja, and Janne Pylkkönen (2006).
Unlimited Vocabulary Speech Recognition with Morph Language Models Applied to Finnish. Computer Speech and Language, Volume 20, Issue 4, October 2006, pp. 515-541.

Recent conference and workshop papers

Teemu Ruokolainen, Oskar Kohonen, Sami Virpioja, and Mikko Kurimo (2013).
Supervised morphological segmentation in a low-resource learning setting using conditional random fields. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pages 29-37, Sofia, Bulgaria, August 2013. Association for Computational Linguistics.
Sami Virpioja, Minna Lehtonen, Annika Hultén, Riitta Salmelin, and Krista Lagus (2011).
Predicting Reaction Times in Word Recognition by Unsupervised Learning of Morphology. In Artificial Neural Networks and Machine Learning - ICANN 2011, volume 6791 of Lecture Notes in Computer Science, pages 275-282. Springer Berlin / Heidelberg, June 2011.
Oskar Kohonen, Sami Virpioja, and Krista Lagus (2010).
Semi-supervised learning of concatenative morphology. In Proceedings of the 11th Meeting of the ACL Special Interest Group on Computational Morphology and Phonology, pages 78-86, Uppsala, Sweden, July 2010. Association for Computational Linguistics.
Mikko Kurimo, Sami Virpioja, Ville Turunen, and Krista Lagus (2010).
Morpho Challenge 2005-2010: Evaluations and results. In Proceedings of the 11th Meeting of the ACL Special Interest Group on Computational Morphology and Phonology, pages 87-95, Uppsala, Sweden, July 2010. Association for Computational Linguistics.
Sami Virpioja, Oskar Kohonen, and Krista Lagus (2010).
Unsupervised morpheme analysis with Allomorfessor. In Multilingual Information Access Evaluation I. Text Retrieval Experiments: 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, Corfu, Greece, September 30 - October 2, 2009, Revised Selected Papers, volume 6241 of Lecture Notes in Computer Science, pages 578-597. Springer.
Adrià de Gispert, Sami Virpioja, Mikko Kurimo, and William Byrne (2009).
Minimum Bayes Risk Combination of Translation Hypotheses from Alternative Morphological Decompositions. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers, pages 73-76, Boulder, CO, USA, June 2009. Association for Computational Linguistics.