I am an Academy of Finland Postdoctoral Fellow (2016-2019) at the Department of Computer Science, Aalto University.

I am part of the research groups:

- *Computational Systems Biology (CSB)*
- *Kernel Methods, Pattern Analysis and Computational Metabolomics (KEPACO)*
- *Probabilistic Machine Learning (PML)*

Office: room B360, CS-building, Konemiehentie 2, FI-02150 Espoo

email: markus.o.heinonen@aalto.fi

mobile: +358 44 294 2600

Mailing address

Aalto University

Department of Information and Computer Science

PO Box 15400

FI-00076 Aalto

Finland

My research focuses on nonstationary, multi-task, large-scale and latent Gaussian process models, Bayesian modelling, metabolomics models, and kernel-based machine learning models for enzymes.

Keywords: machine learning, Gaussian processes, nonstationary regression, synthetic biology, enzyme design, graph kernels, kernel methods, differential testing, nonparametric ODEs, random Fourier features

Emmi Jokinen, Markus Heinonen, Harri Lähdesmäki

**mGPfusion: Predicting protein stability changes upon single and multiple mutations with Gaussian processes and data fusion**

Bioinformatics, submitted, 2017

[ github ]

We combine experimental and simulated data to learn a Gaussian process based protein stability predictor. We propose a Bayesian data transformation that calibrates the simulated data against the experimental data. Because it also uses simulated data, our method requires fewer experimental measurements.

Sami Remes, Markus Heinonen, Samuel Kaski

**A Mutually-Dependent Hadamard Kernel for Modelling Latent Variable Couplings**

ACML'17, to appear

[ arxiv | github ]

We introduce a new non-stationary kernel between inputs and signals, which allows non-stationary couplings between latent variables. The new kernel is based on the Gibbs kernel and the generalised Wishart process.

Sami Remes, Markus Heinonen, Samuel Kaski

**Non-Stationary Spectral Kernels**

NIPS'17, to appear

[ arxiv | github ]

We introduce non-stationary spectral kernels, which can learn covariances based on input-dependent frequencies (e.g. wavelets). We model the input-dependent frequencies as Gaussian process mixtures, and can learn signals with varying frequencies.

Romain Brault, Florence d'Alche-Buc, Markus Heinonen

**Random Fourier Features for operator-valued kernels**

ACML 2016, PMLR 63:110-125

[ abstract | PDF ]

We introduce random Fourier features for vector-valued function learning, i.e. RFFs for operator-valued kernels.

Markus Heinonen, Henrik Mannerström, Juho Rousu, Samuel Kaski, Harri Lähdesmäki

**Non-Stationary Gaussian Process Regression with Hamiltonian Monte Carlo**

AISTATS'16, JMLR 51:732-740, 2016

[ abstract | PDF | supplements | code ]

We model all kernel parameters and the noise as separate Gaussian processes, so that each is a smooth function of the input. HMC sampling reveals the full posteriors of these parameter functions.

Tiina Pakula, Heli Nygren, Dorothee Barth, Markus Heinonen, Sandra Castillo, Merja Penttilä, Mikko Arvas

**Genome wide analysis of protein production load in Trichoderma reesei**

Biotechnology for Biofuels, 9:132, 2016

[ abstract ]

Transcriptomic and metabolic analysis of Trichoderma reesei protein production.

Markus Heinonen, Olivier Guipaud, Fabien Milliat, Valerie Buard, Beatrice Micheau, Georges Tarlet, Marc Benderitter, Farida Zehraoui, Florence d'Alche-Buc

**Detecting time periods of differential gene expression using Gaussian processes: An application to endothelial cells exposed to radiotherapy dose fraction**

Bioinformatics, 31(5): 728-735, 2015

[ abstract | nsgp R package ]

We propose a two-sample differential testing model on Gaussian processes. We introduce a new two-sample test that is continuous along time and results in differential confidences along time.

Markus Heinonen, Florence d'Alche-Buc

**Learning nonparametric differential equations with operator-valued kernels and gradient matching**

arXiv, 2014

[ abstract | PDF ]

Markus Heinonen, Olivier Guipaud, Fabien Milliat, Valerie Buard, Beatrice Micheau, Florence d'Alche-Buc

**Time-dependent gaussian process regression and significance analysis for sparse time-series**

In MLSB'13

We propose non-stationary kernels for Gaussian processes and a new Gaussian process optimization criterion suitable for sparse data. We propose new likelihood ratio tests for significance analysis using GPs.

Huibin Shen, Nicola Zamboni, Markus Heinonen, Juho Rousu

**Metabolite Identification through Machine Learning -- Tackling CASMI Challenge using FingerID**

Metabolites, 3:484-505, 2013

[ abstract ]

Our experiences in the CASMI metabolite identification challenge.

Markus Heinonen, Huibin Shen, Nicola Zamboni and Juho Rousu

**Metabolite identification and fingerprint prediction via machine learning**

Bioinformatics, 28(18):2333-41, 2012

[ abstract | preprint PDF ]

First application of machine learning to identify metabolites based on MS/MS data. We use a probability product kernel over mass spectral features to learn a mapping between the mass spectrum and binary structural properties of the unknown metabolite. We show that the predicted properties can be used to query the unknown structure from a database such as PubChem.

Markus Heinonen, Niko Välimäki, Veli Mäkinen and Juho Rousu

**Efficient path kernels for reaction function prediction**

In Proceedings of the 3rd International Conference on Bioinformatics Models, Methods and Algorithms (BIOINFORMATICS), 2012, pages 202-207

[ abstract | preprint PDF ]

We introduce the first feasible path-based graph kernel. The main contribution is to apply a compressed string index to store millions of paths efficiently. We use the path kernel to predict chemical reaction function (EC class) over reaction graphs.

Markus Heinonen, Sampsa Lappalainen, Taneli Mielikäinen and Juho Rousu

**Computing atom mappings for biochemical reactions without subgraph isomorphism**

Journal of Computational Biology 18(1):43-58, 2011

[ abstract | preprint PDF ]

[ KEGG 01/2009 atommappings | bin + src ]

We study the problem of mapping the atoms between reactants and products in a chemical reaction. We introduce the first definition of optimality of such mappings through graph edit distance. An A* algorithm is applied to compute the optimal mappings of KEGG reactions. We also introduce atom level descriptors through a message passing algorithm.

Hongyu Su, Markus Heinonen and Juho Rousu

**Structured output prediction of anti-cancer drug activity**

Proceedings of PRIB 2010

[ abstract | PDF ]

We use MMCRF for structured output prediction of the effectiveness of small molecules against 59 cancer cell lines. Structured prediction clearly outperforms individual SVMs. However, the structure of the outputs seems to have little effect on performance.

Hongyu Su, Markus Heinonen and Juho Rousu

**Multilabel Classification of Drug-like Molecules via Max-margin Conditional Random Fields**

Proceedings of PGM 2010

[ PDF ]

Markus Heinonen, Ari Rantanen, Taneli Mielikäinen, Juha Kokkonen, Jari Kiuru, Raimo Ketola and Juho Rousu

**FiD: a software for ab initio structural identification of product ions from tandem mass spectrometric data**

Rapid Communications in Mass Spectrometry 22:3043-3052, 2008

[ abstract | PDF ]

We introduce software for identifying product ions from MS/MS data. The method outperforms rule-based methods on our dataset of amino acids and sugar phosphates.

Markus Heinonen, Ari Rantanen, Taneli Mielikäinen, Esa Pitkänen, Juha Kokkonen and Juho Rousu

***Ab initio* prediction of molecular fragments from tandem mass spectrometry data**

Proceedings of GCB 2006, Vol P-83:40-53

[ PDF ]

We present a combinatorial algorithm that searches for plausible fragment structures for product ion peaks, based on a bond energy scoring function. We also introduce a mixed integer linear programming algorithm for choosing an optimal fragmentation tree.

Suvi Heinonen, Markus Heinonen and Emilia Koivisto

**Full waveform forward seismic modeling of geologically complex environment: Comparison of simulated and field seismic data**

European Geosciences Union (EGU) General Assembly, Vienna, 2012

[ abstract ]

We experiment with full seismic forward simulation modeling as a method to find approximations for seismic models.

**Ph.D. Thesis: Computational methods for small molecules**

University of Helsinki, Department of Computer Science, 2012

[ e-thesis | PDF ]

**M.Sc. Thesis: (in Finnish) Algoritminen tytärionien tunnistus massaspektrometriadatasta (Algorithmic identification of daughter ions in mass spectrometry data)**

University of Helsinki, Department of Computer Science, 2007

[ PDF ]

Editors/organizers: Masanori Arita, Markus Heinonen and Juho Rousu

**Mass Spectrometry Informatics in Systems Biology (MSiB 2010)**

Abstracts of the Workshop, October 28-29, 2010, Helsinki, Finland

[ abstracts ]

nsgp is an R package for non-stationary Gaussian process regression and for differential testing of time series in the two-sample (case/control) setting.

To install, execute within R: `install.packages('nsgp', type='source')`

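As an example, let's load some toy data, learn a model and visualise it:

    library(nsgp)
    data(toydata)
    # control time series: time points and measurements
    x = toydata$ctrl$x
    y = toydata$ctrl$y
    # fit the model and predict at 201 target time points from 0 to 20
    gp = gpr1sample(x,y,seq(0,20,0.1))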

This returns:

    > gp
    Gaussian process model for 201 timepoints: (0, 0.1, 0.2, ..., 19.9, 20)
                MLL  EMLL  Avg.posterior.std  Avg.noise.std
    GP model  -5.83  51.9              0.155          0.195
    Parameters: sigma.f = 0.61  sigma.n = 1.00  l = 9.73  lmin = 0.62  c = 0.02

To plot the result, we execute `plot(gp,plotnoise=T)`


For two-sample data we run:

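    library(nsgp)
    data(toydata)
    # control and case time series: time points and measurements
    x.ctrl = toydata$ctrl$x
    x.case = toydata$case$x
    y.ctrl = toydata$ctrl$y
    y.case = toydata$case$y
    # fit control, case and shared null models on the same target grid
    gps = gpr2sample(x.ctrl,y.ctrl,x.case,y.case,seq(0,20,0.1))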

This returns:

    > gps
    Gaussian process models for case/control and shared null model
                         MLL  Avg.posterior.std  Avg.noise.std
    Control model      -5.83              0.155          0.195
    Case model         -4.79              0.223          0.278
    Shared null model  -9.49              0.179          0.308
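As a rough worked example, the three `MLL` values can be combined into a single global log likelihood ratio. This assumes that `MLL` reports log marginal likelihoods and that the case and control models are independent; both are assumptions made here for illustration, not statements about how nsgp computes its test.

$$
\log \Lambda = \mathrm{MLL}_{\text{ctrl}} + \mathrm{MLL}_{\text{case}} - \mathrm{MLL}_{\text{null}} = (-5.83) + (-4.79) - (-9.49) = -1.13
$$

Under these assumptions a positive value would favour separate case and control models over the shared null model. This single number summarises the whole time course only; it says nothing about when the two conditions differ.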

To plot the models, we again call `plot(gps,plotnoise=T)`

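For intuition about what non-stationarity means in a Gaussian process kernel, the following base-R sketch evaluates a Gibbs kernel with an input-dependent lengthscale. It does not call nsgp and is not its internal implementation: the lengthscale function `ell_fn`, its functional form, and the function name `gibbs_kernel` are hypothetical, and the default values are borrowed from the printed output above only as convenient numbers.

```r
# Hypothetical lengthscale function: grows from lmin at x = 0 towards lmax.
# The functional form is an assumption made for illustration only.
ell_fn <- function(x, lmin = 0.62, lmax = 9.73, c = 0.02) {
  lmax - (lmax - lmin) * exp(-c * x)
}

# Gibbs kernel for one-dimensional inputs:
#   k(x, x') = sigma_f^2 * sqrt(2 l(x) l(x') / (l(x)^2 + l(x')^2))
#                        * exp(-(x - x')^2 / (l(x)^2 + l(x')^2))
gibbs_kernel <- function(x1, x2, ell = ell_fn, sigma_f = 1) {
  l1 <- ell(x1)
  l2 <- ell(x2)
  lsq <- outer(l1^2, l2^2, "+")               # l(x)^2 + l(x')^2 for every pair
  prefactor <- sqrt(2 * outer(l1, l2) / lsq)  # normalising term of the Gibbs kernel
  sqdist <- outer(x1, x2, "-")^2              # squared distances (x - x')^2
  sigma_f^2 * prefactor * exp(-sqdist / lsq)
}

# Covariance matrix on a coarse grid of equally spaced inputs
x <- seq(0, 20, by = 5)
K <- gibbs_kernel(x, x)
print(round(K, 3))
```

Because the assumed lengthscale grows with the input, two points a fixed distance apart are more strongly correlated late in the series than early on, which a stationary kernel cannot express.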