Pekka Marttinen

Assistant Professor (Tenure track) in Machine Learning
M.Sc. in Applied Mathematics (University of Helsinki, 2004)
Ph.D. in Statistics (University of Helsinki, 2008)
Title of docent in Information and Computer Science (Aalto University, 2015)

Postal Address:
Helsinki Institute for Information Technology HIIT, Department of Computer Science
Aalto University
P.O.Box 15400
Street Address:
Room A317, Konemiehentie 2, Espoo, Finland
pekka.marttinen (at)

Research Group: Machine Learning for Health (Aalto-ML4H)

Research Interests

Articles in Journals and Proceedings

  1. Ashrafi, R., Ahola, Ai., Rosengård-Bärlund, M., Saarinen, T., Heinonen, S., Juuti, A., Marttinen, P., and Pietiläinen, K. (2021). Computational modelling of self-reported dietary carbohydrate intake on glucose concentrations in patients undergoing Roux-en-Y gastric bypass versus one-anastomosis gastric bypass. Annals of Medicine, accepted.

  2. Sun, W., Ji, S., Cambria, E., and Marttinen, P. (2021). Multitask recalibrated aggregation network for medical code prediction. The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2021), accepted. Preprint

  3. Ji, S., Pan, S., and Marttinen, P. (2021). Medical Code Assignment with Gated Convolution and Note-Code Interaction. Findings of ACL, accepted. Preprint

  4. Järvenpää, M., Gutmann, M.U., Vehtari, A., and Marttinen, P. (2021). Parallel Gaussian process surrogate Bayesian inference with noisy likelihood evaluations. Bayesian Analysis, 16(1):147-178. Available online

  5. Ji, S., Pan, S., Cambria, E., Marttinen, P., and Yu, P.S. (2021). A Survey on Knowledge Graphs: Representation, Acquisition and Applications. IEEE Transactions on Neural Networks and Learning Systems. Preprint

  6. Ji, S., Cambria, E., and Marttinen, P. (2020). Dilated Convolutional Attention Network for Medical Code Assignment from Clinical Text. Proceedings of the 3rd Clinical Natural Language Processing Workshop at EMNLP 2020. Available online

  7. Järvenpää, M., Vehtari, A., and Marttinen, P. (2020). Batch simulations and uncertainty quantification in Gaussian process surrogate approximate Bayesian computation. Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI 2020). Available online

  8. Zhang, G., Ashrafi, R.A., Juuti, A., Pietiläinen, K., and Marttinen, P. (2020). Errors-in-variables modeling of personalized treatment-response trajectories. IEEE Journal of Biomedical and Health Informatics, 25(1):201-208. Available online

  9. Cui, T., Marttinen, P.*, and Kaski, S.* (2020). Learning Global Pairwise Interactions with Bayesian Neural Networks. Proceedings of the 24th European Conference on Artificial Intelligence (ECAI 2020). (*equal contribution) Available online

  10. Kumar, Y., Salo, H., Nieminen, T., Vepsäläinen, K., Kulathinal, S., and Marttinen, P. (2020). Predicting utilization of healthcare services from individual disease trajectories using RNNs with multi-headed attention. Proceedings of Machine Learning Research: Machine Learning for Health (ML4H) at NeurIPS 2019, 116:93-111. Available online

  11. Arredondo-Alonso, S., Top, J., McNally, A., Puranen, S., Pesonen, M., Pensar, J., Marttinen, P., Braat, J., Rogers, M., Van Schaik, W., Kaski, S., Willems, R., Corander, J., and Schürch, A. (2020). Plasmids shaped the recent emergence of the major nosocomial pathogen Enterococcus faecium. mBio, 11(1). Available online

  12. Gillberg, J., Marttinen, P., Mamitsuka, H., and Kaski, S. (2019). Modelling GxE with historical weather information improves genomic prediction in new environments. Bioinformatics, 35(20):4045-4052. Available online

  13. Järvenpää, M., Abdul Sater, M.R., Lagoudas, G.K., Blainey, P.C., Miller, L.G., McKinnell, J.A., Huang, S.S., Grad, Y.H.*, and Marttinen, P.* (2019). A Bayesian model of acquisition and clearance of bacterial colonization incorporating within-host variation. PLoS Computational Biology, 15(4):e1006534. (*equal contribution) Available online

  14. Gladstone, R.A., Lo, S.W., Lees, J.A., Croucher, N.J., van Tonder, A.J., Corander, J., Page, A.J., Marttinen, P., Bentley, L.J., Ochoa, T.J., Ho, P.L., du Plessis, M., Cornick, J.E., Kwambana-Adams, B., Benisty, R., Nzenze, S.A., Madhi, S.A., Hawkins, P.A., Everett, D.B., Antonio, M., Dagan, R., Klugman, K.P., von Gottberg, A., McGee, L., Breiman, R.F., Bentley, S.D., and The Global Pneumococcal Sequencing Consortium (2019). International genomic definition of pneumococcal lineages, to contextualise disease, antibiotic resistance and vaccine impact. EBioMedicine, 43:338-346. Available online

  15. Järvenpää, M., Gutmann, M.U., Pleska, A., Vehtari, A., and Marttinen, P. (2019). Efficient acquisition rules for model-based approximate Bayesian computation. Bayesian Analysis, 14(2):595-622. Available online

  16. Sundin, I.*, Peltola, T.*, Micallef, L., Afrabandpey, H., Soare, M., Majumder, M.M., Daee, P., He, C., Serim, B., Havulinna, A., Heckman, C., Jacucci, G., Marttinen, P., and Kaski, S. (2018). Improving genomics-based predictions for precision medicine through active elicitation of expert knowledge. Bioinformatics, 34(13):i395-i403. (*equal contribution) Available online

  17. Lintusaari, J., Vuollekoski, H., Kangasrääsiö, A., Skytén, K., Järvenpää, M., Marttinen, P., Gutmann, M., Vehtari, A., Corander, J., and Kaski, S. (2018). ELFI: Engine for Likelihood Free Inference. Journal of Machine Learning Research, 19(16):1-7. Available online

  18. Sipola, A., Marttinen, P., and Corander, J. (2018). Bacmeta: simulator for genomic evolution in bacterial metapopulations. Bioinformatics, 1:3. Available online

  19. Järvenpää, M., Gutmann, M., Vehtari, A., and Marttinen, P. (2018). Gaussian process modeling in approximate Bayesian computation to estimate horizontal gene transfer in bacteria. Annals of Applied Statistics, 12(4):2228-2251. Available online

  20. Micallef, L.*, Sundin, I.*, Marttinen, P.*, Ammad-ud-din, M., Peltola, T., Soare, M., Jacucci, G., and Kaski, S. (2017). Interactive Elicitation of Knowledge on Feature Relevance Improves Predictions in Small Data Sets. Proceedings of the 22nd International Conference on Intelligent User Interfaces (IUI '17). (*equal contribution) Pre-print

  21. Marttinen, P. and Hanage, W.P. (2017). Speciation trajectories in recombining bacterial species. PLOS Computational Biology, 13(7):e1005640. Available online

  22. David, S., Sanchez-Buso, L., Harris, S.R., Marttinen, P., Rusniok, C., Buchrieser, C., Harrison, T.G., and Parkhill, J. (2017). Dynamics and impact of homologous recombination on the evolution of Legionella pneumophila. PLOS Genetics, 13(6):e1006855. Available online

  23. Pirinen, M., Benner, C., Marttinen, P., Järvelin, M.-R., Rivas, M.A., and Ripatti, S. (2017). biMM: Efficient estimation of genetic variances and covariances for cohorts with high-dimensional phenotype measurements. Bioinformatics, 33(15):2405-2407. Available online

  24. Villa, P.M., Marttinen, P., Gillberg, L., Lokki, A.I., Majander, K., Taipale, P., Pesonen, A., Räikkönen, K., Hämäläinen, E., Kajantie, E., and Laivuori, H. (2017). Cluster Analysis to Estimate the Risk of Preeclampsia in the High-Risk Prediction and Prevention of Preeclampsia and Intrauterine Growth Restriction (PREDO) Study. PLoS ONE, 12(3): e0174399. Available online

  25. Mostowy, R., Croucher, N.J., Andam, C.P., Corander, J., Hanage, W.P., and Marttinen, P. (2017). Efficient inference of recent and ancestral recombination within bacterial populations. Molecular Biology and Evolution, 34(5):1167-1182. Available online

  26. Harms, K., Lunnan, A., Hülter, N., Mourier, T., Vinner, L., Andam, C.P., Marttinen, P., Fridholm, H., Hansen, A.J., Hanage, W.P., Nielsen, K.M., Willerslev, E., and Johnsen, P.J. (2016). Substitutions of short heterologous DNA segments of intra- or extragenomic origins produce clustered genomic polymorphisms. Proceedings of the National Academy of Sciences of the United States of America, 113(52):15066-15071. doi:10.1073/pnas.1615819114. Available online

  27. Lees, J.A., Vehkala, M., Välimäki, N., Harris, S.R., Chewapreecha, C., Croucher, N.J., Marttinen, P., Davies, M.R., Steer, A.C., Tong, S.Y.C., Honkela, A., Parkhill, J., Bentley, S.D., and Corander, J. (2016). Sequence element enrichment analysis to determine the genetic basis of bacterial phenotypes. Nature Communications, 7:12797, doi:10.1038/ncomms12797. Available online

  28. Gillberg, J., Marttinen, P., Pirinen, M., Kangas, A.-J., Soininen, P., Ali, M., Havulinna, A. S., Järvelin, M.-R., Ala-Korpela, M., and Kaski, S. (2016). Multiple output regression with latent noise. Journal of Machine Learning Research, 17:1-35. Available online

  29. Sieberts, S., Zhu, F., García-García, J., Stahl, E., Pratap, A., Pandey, G., Pappas, D., Aguilar, D., Anton, B., Bonet, J., Eksi, R., Fornés, O., Guney, E., Li, H., Marín, M., Panwar, B., Planas-Iglesias, J., Poglayen, D., Cui, J., Falcao, A., Suver, C., Hoff, B., Balagurusamy, V., Dillenberger, D., Chaibub Neto, E., Norman, T., Aittokallio, T., Ammad-ud-din, M., Azencott, C.-A., Bellón, V., Boeva, V., Bunte, K., Chheda, H., Cheng, L., Corander, J., Dumontier, M., Goldenberg, A., Gopalacharyulu, P., Hajiloo, M., Hidru, D., Jaiswal, A., Kaski, S., Khalfaoui, B., Khan, S., Kramer, E., Marttinen, P., Mezlini, A., Molparia, B., Pirinen, M., Saarela, J., Samwald, M., Stoven, V., Tang, H., Tang, J., Torkamani, A., Vert, J.P., Wang, B., Wang, T., Wennerberg, K., Wineinger, N., Xiao, G., Xie, Y., Yeung, R., Zhan, X., Zhao, C., Greenberg, J., Kremer, J., Michaud, K., Barton, A., Coenen, M., Mariette, X., Miceli, C., Shadick, N., Weinblatt, M., de Vries, N., Tak, P., Gerlag, D., Huizinga, T.W.J., Kurreeman, F., Allaart, C., Bridges, S., Criswell, L., Moreland, L., Klareskog, L., Saevarsdottir, S., Padyukov, L., Gregersen, P., Friend, S., Plenge, R., Stolovitzky, G., Oliva, B., Guan, Y., and Mangravite, L. (2016). Crowdsourced assessment of common genetic contribution to predicting anti-TNF treatment response in rheumatoid arthritis. Nature Communications, 7:12460, doi:10.1038/ncomms12460. Available online

  30. Numminen, E., Gutmann, M., Shubin, M., Marttinen, P., Meric, G., van Schaik, W., Coque, T., Baquero, F., Willems, R., Sheppard, S., Feil, E., Hanage, W.P., and Corander, J. (2016). The impact of host metapopulation structure on the population genetics of colonizing bacteria. Journal of Theoretical Biology, 396:53-62. Pre-print

  31. Cichonska, A., Rousu, J., Marttinen, P., Kangas, A.J., Soininen, P., Lehtimäki, T., Raitakari, O.T., Järvelin, M.-R., Salomaa, V., Ala-Korpela, M., Ripatti, S. and Pirinen, M. (2016). metaCCA: Summary statistics-based multivariate meta-analysis of genome-wide association studies using canonical correlation analysis. Bioinformatics, 32(13):1981-1989, doi: 10.1093/bioinformatics, Available online

  32. Marttinen, P., Croucher, N.J., Gutmann, M.U., Corander, J. and Hanage, W.P. (2015). Recombination produces coherent bacterial species clusters in both core and accessory genomes. Microbial Genomics, 1, doi:10.1099/mgen.0.000038, Available online (Supplement)

  33. Chewapreecha, C., Marttinen, P., Croucher, N.J., Salter, S.J., Harris, S.R., Mather, A.E., Hanage, W.P., Goldblatt, D., Nosten, F.H., Turner, C., Turner, P., Bentley, S.D. and Parkhill, J. (2014). Comprehensive identification of single nucleotide polymorphisms associated with beta-lactam resistance within pneumococcal mosaic genes. PLoS Genetics, 10(8):e1004547. doi:10.1371/journal.pgen.1004547

  34. Sheppard, S.K., Cheng, L., Méric, G., de Haan, C.P.A., Llarena, A.-K., Marttinen, P., Vidal, A., Ridley, A., Clifton-Hadley, F., Connor, T.R., Strachan, N.J.C, Forbes, K., Colles, F.M., Jolley, K.A., Bentley, S.D., Maiden, M.C.J., Hänninen, M.-L., Parkhill, J., Hanage, W.P. and Corander, J. (2014). Cryptic ecology among host generalist Campylobacter jejuni in domestic animals. Molecular Ecology, 23(10):2442-51. doi: 10.1111/mec.12742

  35. Kashtan, N., Roggensack, S.E., Rodrigue, S., Thompson, J.W., Biller, S.J., Coe, A., Ding, H., Marttinen, P., Malmstrom, R.R., Stocker, R., Follows, M.J., Stepanauskas, R. and Chisholm, S.W. (2014). Single-cell genomics reveals hundreds of coexisting subpopulations in wild Prochlorococcus. Science, 344(6182): 416-420.

  36. Marttinen, P., Pirinen, M., Sarin, A.P., Gillberg, J., Kettunen, J., Surakka, I., Kangas, A.J., Soininen, P., O’Reilly, P.F., Kaakinen, M., Kähönen, M., Lehtimäki, T., Ala-Korpela, M., Raitakari, O.T., Salomaa, V., Järvelin, M.-R., Ripatti, S. and Kaski, S. (2014). Assessing multivariate gene-metabolome associations with rare variants using Bayesian reduced rank regression. Bioinformatics, 30(14):2026-34. doi: 10.1093/bioinformatics/btu140

  37. Chewapreecha, C., Harris, S.R., Croucher, N.J., Turner, C., Marttinen, P., Cheng, L., Pessia, A., Aanensen, D.M., Mather, A.E., Page, A.J., Salter, S.J., Harris, D., Nosten, F., Goldblatt, D., Corander, J., Parkhill, J., Turner, P. and Bentley, S.D. (2014). Dense genomic sampling identifies highways of pneumococcal recombination. Nature Genetics, 46: 305-309.

  38. Marttinen, M., Pajari, A.-M., Päivärinta, E., Storvik, M., Marttinen, P., Nurmi, T., Niku, M., Piironen, V. and Mutanen, M. (2014). Plant sterol feeding induces tumor formation and alters sterol metabolism in the intestine of ApcMin mice. Nutrition and Cancer: An International Journal, 66(2). doi: 10.1080/01635581.2014.865244

  39. Marttinen, P., Gillberg, J., Havulinna, A., Corander, J. and Kaski, S. (2013). Genome-wide association studies with high-dimensional phenotypes. Statistical Applications in Genetics and Molecular Biology, 12(4): 413-431.

  40. Castillo-Ramírez, S., Corander, J., Marttinen, P., Aldeljawi, M., Hanage, W.P., Westh, H., Boye, K., Gulay, Z., Bentley, S.D., Parkhill, J., Holden, M.T. and Feil, E.J. (2012). Phylogeographic variation in recombination rates within a global clone of Methicillin-Resistant Staphylococcus aureus (MRSA). Genome Biology, 13(12):R126. doi:10.1186/gb-2012-13-12-r126

  41. Peltola, T., Marttinen, P., and Vehtari, A. (2012). Finite Adaptation and Multistep Moves in the Metropolis-Hastings Algorithm for Variable Selection in Genome-Wide Association Analysis. PLoS ONE, 7(11): e49445. doi:10.1371/journal.pone.0049445

  42. Delezuch, W., Marttinen, P., Kokki, H., Heikkinen, M., Lintula, H., Vanamo, K., Pulkki, K. and Matinlauri, I. (2012). Serum and CSF soluble CD26 and CD30 concentrations in healthy pediatric surgical outpatients. Tissue Antigens, doi: 10.1111/j.1399-0039.2012.01938.x.

  43. Peltola, T., Marttinen, P., Jula, A., Salomaa, V., Perola, M. and Vehtari, A. (2012). Bayesian variable selection in searching for additive and dominant effects in genome-wide data. PLoS ONE, 7(1): e29115. doi:10.1371/journal.pone.0029115.

  44. Marttinen, P., Hanage, W.P., Croucher, N.J., Connor, T.R., Harris, S.R., Bentley, S.D. and Corander, J. (2012). Detection of recombination events in bacterial genomes from large population samples. Nucleic Acids Research, 40(1): e6. doi: 10.1093/nar/gkr928.

  45. Sirén, J., Marttinen, P. and Corander, J. (2010). Reconstructing population histories from single-nucleotide polymorphism data. Molecular Biology and Evolution, 28(1):673-683.

  46. Marttinen, P. and Corander, J. (2010). Efficient Bayesian approach for multilocus association mapping including gene-gene interactions. BMC Bioinformatics, 11:443.

  47. Törönen, P., Ojala, P.J., Marttinen, P. and Holm, L. (2009). Robust extraction of functional signals from gene set analysis using a generalized threshold free scoring function. BMC Bioinformatics, 10:307.

  48. Marttinen, P., Myllykangas, S. and Corander, J. (2009). Bayesian clustering and feature selection for cancer tissue samples. BMC Bioinformatics, 10:90.

  49. Marttinen, P. and Corander, J. (2009). Bayesian learning of graphical vector autoregressions with unequal lag-lengths. Machine Learning, 75:217-243.

  50. Marttinen, P., Tang, J., De Baets, B., Dawyndt, P. and Corander, J. (2009). Bayesian clustering of fuzzy feature vectors using a quasi-likelihood approach. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31:74-85.

  51. Corander, J., Marttinen, P., Sirén, J. and Tang, J. (2008). Enhanced Bayesian modelling in BAPS software for learning genetic structures of populations. BMC Bioinformatics, 9:539.

  52. Marttinen, P., Baldwin, A., Hanage, W.P., Dowson, C., Mahenthiralingam, E. and Corander, J. (2008). Bayesian modeling of recombination events in bacterial populations. BMC Bioinformatics, 9:421.

  53. Marttinen, P., Corander, J., Törönen, P. and Holm, L. (2006). Bayesian search of functionally divergent protein subgroups and their function specific residues. Bioinformatics, 22:2466-2474.

  54. Corander, J. and Marttinen, P. (2006). Bayesian identification of admixture events using multi-locus molecular markers. Molecular Ecology, 15:2833-2843.

  55. Corander, J., Marttinen, P. and Mäntyniemi, S. (2006). Bayesian identification of stock mixtures from molecular marker data. Fishery Bulletin, 104:550-558.

  56. Corander, J. and Marttinen, P. (2006). Bayesian model learning based on predictive entropy. Journal of Logic, Language and Information, 15:5-20.

  57. Corander, J., Waldmann, P., Marttinen, P. and Sillanpää, M.J. (2004). BAPS 2: enhanced possibilities for the analysis of genetic population structure. Bioinformatics, 20:2363-2369.

Extended Abstracts in Workshops

  1. Järvenpää, M. et al. (2019). Batch simulations and uncertainty quantification in Gaussian process surrogate-based approximate Bayesian computation. 2nd Symposium on Advances in Approximate Bayesian Inference.

  2. Cui, T. et al. (2019). Learning pairwise global interactions using Bayesian Neural Networks. Bayesian Deep Learning, Workshop at NeurIPS 2019.

  3. Zhang, G. et al. (2019). Errors-in-variables modeling of personalized treatment-response trajectories. ML4H: Machine Learning for Health, Workshop at NeurIPS 2019.

  4. Järvenpää, M. et al. (2018). A Bayesian model of acquisition and clearance of bacterial colonization. ML4H: Machine Learning for Health, Workshop at NeurIPS 2018.

  5. Sundin, I. et al. (2017). Ask the doctor - Improving drug sensitivity predictions through active expert knowledge elicitation. ML4H: Machine Learning for Health, Workshop at NIPS 2017.

  6. Järvenpää, M. et al. (2017). Efficient acquisition rules for model-based approximate Bayesian computation. Advances in Approximate Bayesian Inference, Workshop at NIPS 2017.

  7. Järvenpää, M. et al. (2017). Gaussian process modeling in approximate Bayesian computation to estimate horizontal gene transfer in bacteria. Machine Learning in Computational Biology (MLCB), Workshop at NIPS 2017.

  8. Gillberg, J. et al. (2016). Multiple output regression with latent noise. Machine Learning in Computational Biology (MLCB), Workshop at NIPS 2016.

  9. Marttinen, P. et al. (2015). Assessing multivariate gene-metabolome associations with the Bayesian reduced rank regression. Machine Learning in Computational Biology (MLCB), Workshop at NIPS 2015.

  10. Cichonska, A. et al. (2014). Meta-analysis of genome-wide association studies with multivariate traits. International Workshop on Machine Learning and Systems Biology, MLSB14.

  11. Marttinen, P. et al. (2012). Genome-wide association studies with high-dimensional phenotypes. Machine Learning in Computational Biology (MLCB), Workshop at NIPS 2012.



I have authored/co-authored the following software: