Pekka Marttinen

Assistant Professor (Tenure track) in Machine Learning
M.Sc. in Applied Mathematics (University of Helsinki, 2004)
Ph.D. in Statistics (University of Helsinki, 2008)
Title of docent in Information and Computer Science (Aalto University, 2015)

Academy of Finland Research Fellow (started in September 2015)
Postal Address:
Helsinki Institute for Information Technology HIIT, Department of Computer Science
Aalto University
P.O. Box 15400
Street Address:
Room A317, Konemiehentie 2, Espoo, Finland
pekka.marttinen (at)

Research Group: Machine Learning for Health (Aalto-ML4H)

Research Interests

Articles in Journals and Proceedings

  1. Kumar, Y., Salo, H., Nieminen, T., Vepsäläinen, K., Kulathinal, S., and Marttinen, P. (2019). Predicting utilization of healthcare services from individual disease trajectories using RNNs with multi-headed attention. Proceedings of Machine Learning Research: Machine Learning for Health (ML4H) at NeurIPS 2019, to appear.

  2. Gillberg, J., Marttinen, P., Mamitsuka, H., and Kaski, S. (2019). Modelling GxE with historical weather information improves genomic prediction in new environments. Bioinformatics, 35(20):4045-4052. Available online

  3. Järvenpää, M., Abdul Sater, M.R., Lagoudas, G.K., Blainey, P.C., Miller, L.G., McKinnell, J.A., Huang, S.S., Grad, Y.H.*, and Marttinen, P.* (2019). A Bayesian model of acquisition and clearance of bacterial colonization incorporating within-host variation. PLoS Computational Biology, 15(4):e1006534. (*equal contribution) Available online

  4. Gladstone, R.A., Lo, S.W., Lees, J.A., Croucher, N.J., van Tonder, A.J., Corander, J., Page, A.J., Marttinen, P., Bentley, L.J., Ochoa, T.J., Ho, P.L., du Plessis, M., Cornick, J.E., Kwambana-Adams, B., Benisty, R., Nzenze, S.A., Madhi, S.A., Hawkins, P.A., Everett, D.B., Antonio, M., Dagan, R., Klugman, K.P., von Gottberg, A., McGee, L., Breiman, R.F., Bentley, S.D., and The Global Pneumococcal Sequencing Consortium (2019). International genomic definition of pneumococcal lineages, to contextualise disease, antibiotic resistance and vaccine impact. EBioMedicine, 43:338-346. Available online

  5. Järvenpää, M., Gutmann, M.U., Pleska, A., Vehtari, A., and Marttinen, P. (2019). Efficient acquisition rules for model-based approximate Bayesian computation. Bayesian Analysis, 14(2):595-622. Available online

  6. Sundin, I.*, Peltola, T.*, Micallef, L., Afrabandpey, H., Soare, M., Majumder, M.M., Daee, P., He, C., Serim, B., Havulinna, A., Heckman, C., Jacucci, G., Marttinen, P., and Kaski, S. (2018). Improving genomics-based predictions for precision medicine through active elicitation of expert knowledge. Bioinformatics, 34(13):i395-i403. (*equal contribution) Available online

  7. Lintusaari, J., Vuollekoski, H., Kangasrääsiö, A., Skytén, K., Järvenpää, M., Marttinen, P., Gutmann, M., Vehtari, A., Corander, J., and Kaski, S. (2018). ELFI: Engine for Likelihood Free Inference. Journal of Machine Learning Research, 19(16):1-7. Available online

  8. Sipola, A., Marttinen, P., and Corander, J. (2018). Bacmeta: simulator for genomic evolution in bacterial metapopulations. Bioinformatics, 1:3. Available online

  9. Järvenpää, M., Gutmann, M., Vehtari, A., and Marttinen, P. (2018). Gaussian process modeling in approximate Bayesian computation to estimate horizontal gene transfer in bacteria. Annals of Applied Statistics, Accepted for publication. Preprint

  10. Micallef, L.*, Sundin, I.*, Marttinen, P.*, Ammad-ud-din, M., Peltola, T., Soare, M., Jacucci, G., and Kaski, S. (2017). Interactive Elicitation of Knowledge on Feature Relevance Improves Predictions in Small Data Sets. Proceedings of the 22nd International Conference on Intelligent User Interfaces (IUI '17). (*equal contribution) Pre-print

  11. Marttinen, P. and Hanage, W.P. (2017). Speciation trajectories in recombining bacterial species. PLOS Computational Biology, 13(7):e1005640. Available online

  12. David, S., Sanchez-Buso, L., Harris, S.R., Marttinen, P., Rusniok, C., Buchrieser, C., Harrison, T.G., and Parkhill, J. (2017). Dynamics and impact of homologous recombination on the evolution of Legionella pneumophila. PLOS Genetics, 13(6):e1006855. Available online

  13. Pirinen, M., Benner, C., Marttinen, P., Järvelin, M.-R., Rivas, M.A., and Ripatti, S. (2017). biMM: Efficient estimation of genetic variances and covariances for cohorts with high-dimensional phenotype measurements. Bioinformatics, 33(15):2405-2407. Available online

  14. Villa, P.M., Marttinen, P., Gillberg, L., Lokki, A.I., Majander, K., Taipale, P., Pesonen, A., Räikkönen, K., Hämäläinen, E., Kajantie, E., and Laivuori, H. (2017). Cluster Analysis to Estimate the Risk of Preeclampsia in the High-Risk Prediction and Prevention of Preeclampsia and Intrauterine Growth Restriction (PREDO) Study. PLoS ONE, 12(3): e0174399. Available online

  15. Mostowy, R., Croucher, N.J., Andam, C.P., Corander, J., Hanage, W.P., and Marttinen, P. (2017). Efficient inference of recent and ancestral recombination within bacterial populations. Molecular Biology and Evolution, 34(5):1167-1182. Available online

  16. Harms, K., Lunnan, A., Hülter, N., Mourier, T., Vinner, L., Andam, C.P., Marttinen, P., Fridholm, H., Hansen, A.J., Hanage, W.P., Nielsen, K.M., Willerslev, E., and Johnsen, P.J. (2016). Substitutions of short heterologous DNA segments of intra- or extragenomic origins produce clustered genomic polymorphisms. Proceedings of the National Academy of Sciences of the United States of America, 113(52):15066-15071. doi:10.1073/pnas.1615819114. Available online

  17. Lees, J.A., Vehkala, M., Välimäki, N., Harris, S.R., Chewapreecha, C., Croucher, N.J., Marttinen, P., Davies, M.R., Steer, A.C., Tong, S.Y.C., Honkela, A., Parkhill, J., Bentley, S.D., and Corander, J. (2016). Sequence element enrichment analysis to determine the genetic basis of bacterial phenotypes. Nature Communications, 7:12797, doi:10.1038/ncomms12797. Available online

  18. Gillberg, J., Marttinen, P., Pirinen, M., Kangas, A.-J., Soininen, P., Ali, M., Havulinna, A. S., Järvelin, M.-R., Ala-Korpela, M., and Kaski, S. (2016). Multiple output regression with latent noise. Journal of Machine Learning Research, 17:1-35. Available online

  19. Sieberts, S., Zhu, F., García-García, J., Stahl, E., Pratap, A., Pandey, G., Pappas, D., Aguilar, D., Anton, B., Bonet, J., Eksi, R., Fornés, O., Guney, E., Li, H., Marín, M., Panwar, B., Planas-Iglesias, J., Poglayen, D., Cui, J., Falcao, A., Suver, C., Hoff, B., Balagurusamy, V., Dillenberger, D., Chaibub Neto, E., Norman, T., Aittokallio, T., Ammad-ud-din, M., Azencott, C.-A., Bellón, V., Boeva, V., Bunte, K., Chheda, H., Cheng, L., Corander, J., Dumontier, M., Goldenberg, A., Gopalacharyulu, P., Hajiloo, M., Hidru, D., Jaiswal, A., Kaski, S., Khalfaoui, B., Khan, S., Kramer, E., Marttinen, P., Mezlini, A., Molparia, B., Pirinen, M., Saarela, J., Samwald, M., Stoven, V., Tang, H., Tang, J., Torkamani, A., Vert, J.P., Wang, B., Wang, T., Wennerberg, K., Wineinger, N., Xiao, G., Xie, Y., Yeung, R., Zhan, X., Zhao, C., Greenberg, J., Kremer, J., Michaud, K., Barton, A., Coenen, M., Mariette, X., Miceli, C., Shadick, N., Weinblatt, M., de Vries, N., Tak, P., Gerlag, D., Huizinga, T.W.J., Kurreeman, F., Allaart, C., Bridges, S., Criswell, L., Moreland, L., Klareskog, L., Saevarsdottir, S., Padyukov, L., Gregersen, P., Friend, S., Plenge, R., Stolovitzky, G., Oliva, B., Guan, Y., and Mangravite, L. (2016). Crowdsourced assessment of common genetic contribution to predicting anti-TNF treatment response in rheumatoid arthritis. Nature Communications, 7:12460, doi:10.1038/ncomms12460. Available online

  20. Numminen, E., Gutmann, M., Shubin, M., Marttinen, P., Meric, G., van Schaik, W., Coque, T., Baquero, F., Willems, R., Sheppard, S., Feil, E., Hanage, W.P., and Corander, J. (2016). The impact of host metapopulation structure on the population genetics of colonizing bacteria. Journal of Theoretical Biology, 396:53-62. Pre-print

  21. Cichonska, A., Rousu, J., Marttinen, P., Kangas, A.J., Soininen, P., Lehtimäki, T., Raitakari, O.T., Järvelin, M.-R., Salomaa, V., Ala-Korpela, M., Ripatti, S. and Pirinen, M. (2016). metaCCA: Summary statistics-based multivariate meta-analysis of genome-wide association studies using canonical correlation analysis. Bioinformatics, 32(13):1981-1989, doi: 10.1093/bioinformatics, Available online

  22. Marttinen, P., Croucher, N.J., Gutmann, M.U., Corander, J. and Hanage, W.P. (2015). Recombination produces coherent bacterial species clusters in both core and accessory genomes. Microbial Genomics, 1, doi:10.1099/mgen.0.000038, Available online (Supplement)

  23. Chewapreecha, C., Marttinen, P., Croucher, N.J., Salter, S.J., Harris, S.R., Mather, A.E., Hanage, W.P., Goldblatt, D., Nosten, F.H., Turner, C., Turner, P., Bentley, S.D. and Parkhill, J. (2014). Comprehensive identification of single nucleotide polymorphisms associated with beta-lactam resistance within pneumococcal mosaic genes. PLoS Genetics, 10(8):e1004547. doi:10.1371/journal.pgen.1004547

  24. Sheppard, S.K., Cheng, L., Méric, G., de Haan, C.P.A., Llarena, A.-K., Marttinen, P., Vidal, A., Ridley, A., Clifton-Hadley, F., Connor, T.R., Strachan, N.J.C, Forbes, K., Colles, F.M., Jolley, K.A., Bentley, S.D., Maiden, M.C.J., Hänninen, M.-L., Parkhill, J., Hanage, W.P. and Corander, J. (2014). Cryptic ecology among host generalist Campylobacter jejuni in domestic animals. Molecular Ecology, 23(10):2442-51. doi: 10.1111/mec.12742

  25. Kashtan, N., Roggensack, S.E., Rodrigue, S., Thompson, J.W., Biller, S.J., Coe, A., Ding, H., Marttinen, P., Malmstrom, R.R., Stocker, R., Follows, M.J., Stepanauskas, R. and Chisholm, S.W. (2014). Single-cell genomics reveals hundreds of coexisting subpopulations in wild Prochlorococcus. Science, 344(6182): 416-420.

  26. Marttinen, P., Pirinen, M., Sarin, A.P., Gillberg, J., Kettunen, J., Surakka, I., Kangas, A.J., Soininen, P., O’Reilly, P.F., Kaakinen, M., Kähönen, M., Lehtimäki, T., Ala-Korpela, M., Raitakari, O.T., Salomaa, V., Järvelin, M.-R., Ripatti, S. and Kaski, S. (2014). Assessing multivariate gene-metabolome associations with rare variants using Bayesian reduced rank regression. Bioinformatics, 30(14):2026-34. doi: 10.1093/bioinformatics/btu140

  27. Chewapreecha, C., Harris, S.R., Croucher, N.J., Turner, C., Marttinen, P., Cheng, L., Pessia, A., Aanensen, D.M., Mather, A.E., Page, A.J., Salter, S.J., Harris, D., Nosten, F., Goldblatt, D., Corander, J., Parkhill, J., Turner, P. and Bentley, S.D. (2014). Dense genomic sampling identifies highways of pneumococcal recombination. Nature Genetics, 46: 305-309.

  28. Marttinen, M., Pajari, A.-M., Päivärinta, E., Storvik, M., Marttinen, P., Nurmi, T., Niku, M., Piironen, V. and Mutanen, M. (2014). Plant sterol feeding induces tumor formation and alters sterol metabolism in the intestine of ApcMin mice. Nutrition and Cancer: An International Journal, 66(2). doi: 10.1080/01635581.2014.865244

  29. Marttinen, P., Gillberg, J., Havulinna, A., Corander, J. and Kaski, S. (2013). Genome-wide association studies with high-dimensional phenotypes. Statistical Applications in Genetics and Molecular Biology, 12(4): 413-431.

  30. Castillo-Ramírez, S., Corander, J., Marttinen, P., Aldeljawi, M., Hanage, W.P., Westh, H., Boye, K., Gulay, Z., Bentley, S.D., Parkhill, J., Holden, M.T. and Feil, E.J. (2012). Phylogeographic variation in recombination rates within a global clone of Methicillin-Resistant Staphylococcus aureus (MRSA). Genome Biology, 13(12):R126. doi:10.1186/gb-2012-13-12-r126

  31. Peltola, T., Marttinen, P., and Vehtari, A. (2012). Finite Adaptation and Multistep Moves in the Metropolis-Hastings Algorithm for Variable Selection in Genome-Wide Association Analysis. PLoS ONE, 7(11): e49445. doi:10.1371/journal.pone.0049445

  32. Delezuch, W., Marttinen, P., Kokki, H., Heikkinen, M., Lintula, H., Vanamo, K., Pulkki, K. and Matinlauri, I. (2012). Serum and CSF soluble CD26 and CD30 concentrations in healthy pediatric surgical outpatients. Tissue Antigens, doi: 10.1111/j.1399-0039.2012.01938.x.

  33. Peltola, T., Marttinen, P., Jula, A., Salomaa, V., Perola, M. and Vehtari, A. (2012). Bayesian variable selection in searching for additive and dominant effects in genome-wide data. PLoS ONE, 7(1): e29115. doi:10.1371/journal.pone.0029115.

  34. Marttinen, P., Hanage, W.P., Croucher, N.J., Connor, T.R., Harris, S.R., Bentley, S.D. and Corander, J. (2012). Detection of recombination events in bacterial genomes from large population samples. Nucleic Acids Research, 40(1): e6. doi: 10.1093/nar/gkr928.

  35. Sirén, J., Marttinen, P. and Corander, J. (2010). Reconstructing population histories from single-nucleotide polymorphism data. Molecular Biology and Evolution, 28(1):673-683.

  36. Marttinen, P. and Corander, J. (2010). Efficient Bayesian approach for multilocus association mapping including gene-gene interactions. BMC Bioinformatics, 11:443.

  37. Törönen, P., Ojala, P.J., Marttinen, P. and Holm, L. (2009). Robust extraction of functional signals from gene set analysis using a generalized threshold free scoring function. BMC Bioinformatics, 10:307.

  38. Marttinen, P., Myllykangas, S. and Corander, J. (2009). Bayesian clustering and feature selection for cancer tissue samples. BMC Bioinformatics, 10:90.

  39. Marttinen, P. and Corander, J. (2009). Bayesian learning of graphical vector autoregressions with unequal lag-lengths. Machine Learning, 75:217-243.

  40. Marttinen, P., Tang, J., De Baets, B., Dawyndt, P. and Corander, J. (2009). Bayesian clustering of fuzzy feature vectors using a quasi-likelihood approach. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31:74-85.

  41. Corander, J., Marttinen, P., Sirén, J. and Tang, J. (2008). Enhanced Bayesian modelling in BAPS software for learning genetic structures of populations. BMC Bioinformatics, 9:539.

  42. Marttinen, P., Baldwin, A., Hanage, W.P., Dowson, C., Mahenthiralingam, E. and Corander, J. (2008). Bayesian modeling of recombination events in bacterial populations. BMC Bioinformatics, 9:421.

  43. Marttinen, P., Corander, J., Törönen, P. and Holm, L. (2006). Bayesian search of functionally divergent protein subgroups and their function specific residues. Bioinformatics, 22:2466-2474.

  44. Corander, J. and Marttinen, P. (2006). Bayesian identification of admixture events using multi-locus molecular markers. Molecular Ecology, 15:2833-2843.

  45. Corander, J., Marttinen, P. and Mäntyniemi, S. (2006). Bayesian identification of stock mixtures from molecular marker data. Fishery Bulletin, 104:550-558.

  46. Corander, J. and Marttinen, P. (2006). Bayesian model learning based on predictive entropy. Journal of Logic, Language and Information, 15:5-20.

  47. Corander, J., Waldmann, P., Marttinen, P. and Sillanpää, M.J. (2004). BAPS 2: enhanced possibilities for the analysis of genetic population structure. Bioinformatics, 20:2363-2369.

Extended Abstracts in Workshops

  1. Cui, T. et al. (2019). Learning pairwise global interactions using Bayesian Neural Networks. Bayesian Deep Learning, Workshop at NeurIPS 2019.

  2. Zhang, G. et al. (2019). Errors-in-variables modeling of personalized treatment-response trajectories. ML4H: Machine Learning for Health, Workshop at NeurIPS 2019.

  3. Järvenpää, M. et al. (2018). A Bayesian model of acquisition and clearance of bacterial colonization. ML4H: Machine Learning for Health, Workshop at NeurIPS 2018.

  4. Sundin, I. et al. (2017). Ask the doctor - Improving drug sensitivity predictions through active expert knowledge elicitation. ML4H: Machine Learning for Health, Workshop at NIPS 2017.

  5. Järvenpää, M. et al. (2017). Efficient acquisition rules for model-based approximate Bayesian computation. Advances in Approximate Bayesian Inference, Workshop at NIPS 2017.

  6. Järvenpää, M. et al. (2017). Gaussian process modeling in approximate Bayesian computation to estimate horizontal gene transfer in bacteria. Machine Learning in Computational Biology (MLCB), Workshop at NIPS 2017.

  7. Gillberg, J. et al. (2016). Multiple output regression with latent noise. Machine Learning in Computational Biology (MLCB), Workshop at NIPS 2016.

  8. Marttinen, P. et al. (2015). Assessing multivariate gene-metabolome associations with the Bayesian reduced rank regression. Machine Learning in Computational Biology (MLCB), Workshop at NIPS 2015.

  9. Cichonska, A. et al. (2014). Meta-analysis of genome-wide association studies with multivariate traits. International Workshop on Machine Learning and Systems Biology, MLSB14.

  10. Marttinen, P. et al. (2012). Genome-wide association studies with high-dimensional phenotypes. Machine Learning in Computational Biology (MLCB), Workshop at NIPS 2012.



I have authored/co-authored the following software: