
Markus Heinonen, postdoctoral fellow

I am an Academy of Finland Postdoctoral Fellow (2016-2019) at the Department of Computer Science, Aalto University.

I am part of the research groups

Office: room B360, CS-building, Konemiehentie 2, FI-02150 Espoo
email: markus.o.heinonen@aalto.fi
mobile: +358 44 294 2600

Mailing address

Aalto University
Department of Information and Computer Science
PO Box 15400
FI-00076 Aalto
Finland


My research focuses on nonstationary, multi-task, large-scale or latent Gaussian process based models, on Bayesian modelling, on metabolomics models and on kernel-based machine learning models for enzymes.

Keywords: machine learning, Gaussian processes, nonstationary regression, synthetic biology, enzyme design, graph kernels, kernel methods, differential testing, nonparametric ODEs, random Fourier features


Research

Refereed publications

Emmi Jokinen, Markus Heinonen, Harri Lähdesmäki
mGPfusion: Predicting protein stability changes upon single and multiple mutations with Gaussian processes and data fusion
Bioinformatics, submitted, 2017
[ github ]

We combine experimental and simulated data to learn a Gaussian process based protein stability predictor. We propose a Bayesian data transformation that calibrates the simulated data against the experimental data. Our method requires fewer experimental measurements thanks to the inclusion of simulated data.

Sami Remes, Markus Heinonen, Samuel Kaski
A Mutually-Dependent Hadamard Kernel for Modelling Latent Variable Couplings
ACML'17, to appear
[ arxiv | github ]

We introduce a new non-stationary kernel between inputs and signals, which allows non-stationary couplings between latent variables. The new kernel is based on the Gibbs kernel and the generalised Wishart process.

Sami Remes, Markus Heinonen, Samuel Kaski
Non-Stationary Spectral Kernels
NIPS'17, to appear
[ arxiv | github ]

We introduce non-stationary spectral kernels, which can learn covariances based on input-dependent frequencies (e.g. wavelets). We model the input-dependent frequencies as Gaussian process mixtures, and can learn signals with varying frequencies.

Romain Brault, Florence d'Alche-Buc, Markus Heinonen
Random Fourier Features for operator-valued kernels
ACML 2016, PMLR 63:110-125
[ abstract | PDF ]

We introduce random Fourier features for vector-valued function learning, i.e. RFFs for operator-valued kernels.

Markus Heinonen, Henrik Mannerström, Juho Rousu, Samuel Kaski, Harri Lähdesmäki
Non-Stationary Gaussian Process Regression with Hamiltonian Monte Carlo
AISTATS'16, JMLR 51:732-740, 2016
[ abstract | PDF | supplements | code ]

We model all kernel parameters and the noise as separate Gaussian processes which are smoothly input-dependent. HMC sampling reveals the full parameter function posteriors.

Tiina Pakula, Heli Nygren, Dorothee Barth, Markus Heinonen, Sandra Castillo, Merja Penttilä, Mikko Arvas
Genome wide analysis of protein production load in Trichoderma reesei
Biotechnology for Biofuels, 9:132, 2016
[ abstract ]

Transcriptomics and metabolic analysis of Trichoderma reesei protein production.

Markus Heinonen, Olivier Guipaud, Fabien Milliat, Valerie Buard, Beatrice Micheau, Georges Tarlet, Marc Benderitter, Farida Zehraoui, Florence d'Alche-Buc
Detecting time periods of differential gene expression using Gaussian processes: An application to endothelial cells exposed to radiotherapy dose fraction
Bioinformatics, 31(5): 728-735, 2015
[ abstract | nsgp R package ]

We propose a two-sample differential testing model on Gaussian processes. We introduce a new two-sample test that is continuous along time and results in differential confidences along time.

Markus Heinonen, Florence d'Alche-Buc
Learning nonparametric differential equations with operator-valued kernels and gradient matching
arXiv, 2014
[ abstract | PDF ]

Markus Heinonen, Olivier Guipaud, Fabien Milliat, Valerie Buard, Beatrice Micheau, Florence d'Alche-Buc
Time-dependent gaussian process regression and significance analysis for sparse time-series
In MLSB'13

We propose non-stationary kernels for Gaussian processes and a new Gaussian process optimization criterion suitable for sparse data. We propose new likelihood ratio tests for significance analysis using GPs.

Huibin Shen, Nicola Zamboni, Markus Heinonen, Juho Rousu
Metabolite Identification through Machine Learning -- Tackling CASMI Challenge using FingerID
Metabolites, 3:484-505, 2013
[ abstract ]

Our experiences in the CASMI metabolite identification challenge.

Markus Heinonen, Huibin Shen, Nicola Zamboni and Juho Rousu
Metabolite identification and fingerprint prediction via machine learning
Bioinformatics, 28(18):2333-41, 2012
[ abstract | preprint PDF ]

First application of machine learning to identify metabolites based on MS/MS data. We use a probability product kernel over mass spectral features to learn a mapping between the mass spectrum and binary structural properties of the unknown metabolite. We show that the predicted properties can be used to query the unknown structure from e.g. PubChem.

Markus Heinonen, Niko Välimäki, Veli Mäkinen and Juho Rousu
Efficient path kernels for reaction function prediction
In Proceedings of 3rd International Conference on Bioinformatics Models, Methods and Algorithms (BIOINFORMATICS), 2012, pages 202-207
[ abstract | preprint PDF ]

We introduce the first feasible path-based graph kernel. The main contribution is to apply a compressed string index to store millions of paths efficiently. We utilize the path kernel to predict chemical reaction function (EC class) over reaction graphs.

Markus Heinonen, Sampsa Lappalainen, Taneli Mielikäinen and Juho Rousu
Computing atom mappings for biochemical reactions without subgraph isomorphism
Journal of Computational Biology 18(1):43-58, 2011
[ abstract | preprint PDF ]
[ KEGG 01/2009 atommappings | bin + src ]

We study the problem of mapping the atoms between reactants and products in a chemical reaction. We introduce the first definition of optimality of such mappings through graph edit distance. An A* algorithm is applied to compute the optimal mappings of KEGG reactions. We also introduce atom level descriptors through a message passing algorithm.

Hongyu Su, Markus Heinonen and Juho Rousu
Structured output prediction of anti-cancer drug activity
Proceedings of PRIB 2010
[ abstract | PDF ]

We use MMCRF for structured output prediction of the effectiveness of small molecules against 59 cancer cell lines. Structured prediction clearly outperforms individual SVMs. However, the structure of the outputs seems to have little effect on performance.

Hongyu Su, Markus Heinonen and Juho Rousu
Multilabel Classification of Drug-like Molecules via Max-margin Conditional Random Fields
Proceedings of PGM 2010
[ PDF ]

Markus Heinonen, Ari Rantanen, Taneli Mielikäinen, Juha Kokkonen, Jari Kiuru, Raimo Ketola and Juho Rousu
FiD: a software for ab initio structural identification of product ions from tandem mass spectrometric data
Rapid Communications in Mass Spectrometry 22:3043-3052, 2008
[ abstract | PDF ]

We introduce software for identifying product ions from MS/MS data. The method outperforms rule-based methods in our dataset of amino acids and sugar phosphates.

Markus Heinonen, Ari Rantanen, Taneli Mielikäinen, Esa Pitkänen, Juha Kokkonen and Juho Rousu
Ab initio prediction of molecular fragments from tandem mass spectrometry data
Proceedings of GCB 2006, Vol P-83:40-53
[ PDF ]

We present a combinatorial algorithm for finding plausible fragment structures for product ion peaks, based on a bond energy scoring function. We also introduce a mixed integer linear programming algorithm for choosing an optimal fragmentation tree.

Posters

Suvi Heinonen, Markus Heinonen and Emilia Koivisto
Full waveform forward seismic modeling of geologically complex environment: Comparison of simulated and field seismic data
European Geosciences Union (EGU) General Assembly, Vienna, 2012
[ abstract ]

We experiment with full seismic forward simulation modeling as a method to find approximations for seismic models.

Theses

Ph.D. Thesis: Computational methods for small molecules
University of Helsinki, Department of Computer Science, 2012
[ e-thesis | PDF ]

M.Sc. Thesis: (in Finnish) Algoritminen tytärionien tunnistus massaspektrometriadatasta (Algorithmic identification of daughter ions in mass spectrometry data)
University of Helsinki, Department of Computer Science, 2007
[ PDF ]

Proceedings

Editors/organizers: Masanori Arita, Markus Heinonen and Juho Rousu
Mass Spectrometry Informatics in Systems Biology (MSiB 2010)
Abstracts of the Workshop, October 28-29, 2010, Helsinki, Finland
[ abstracts ]


Software

nsgp

nsgp is an R package for non-stationary Gaussian process regression and for differential testing of time series in the two-sample case.

install

To install, execute within R: install.packages('nsgp', type='source')

one-sample example


As an example, let's load some example data, fit a model and visualise it:

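library(nsgp)
data(toydata)
# time points and measurements of the control sample
x <- toydata$ctrl$x
y <- toydata$ctrl$y
# fit the model on a grid of 201 time points from 0 to 20
gp <- gpr1sample(x, y, seq(0, 20, 0.1))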

This returns:

> gp
Gaussian process model for 201 timepoints: (0, 0.1, 0.2, ..., 19.9, 20)

           MLL EMLL Avg.posterior.std Avg.noise.std
GP model -5.83 51.9             0.155         0.195

Parameters:
 sigma.f = 0.61 
 sigma.n = 1.00 
       l = 9.73 
    lmin = 0.62 
       c = 0.02

To plot the result, we execute plot(gp, plotnoise=TRUE).

two-sample example


For two-sample data we run:

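library(nsgp)
data(toydata)
# time points and measurements of the control and case samples
x.ctrl <- toydata$ctrl$x
x.case <- toydata$case$x
y.ctrl <- toydata$ctrl$y
y.case <- toydata$case$y

# fit the control, case and shared null models on the same time grid
gps <- gpr2sample(x.ctrl, y.ctrl, x.case, y.case, seq(0, 20, 0.1))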

This returns:

> gps
Gaussian process models for case/control and shared null model

                    MLL Avg.posterior.std Avg.noise.std
Control model     -5.83             0.155         0.195
Case model        -4.79             0.223         0.278
Shared null model -9.49             0.179         0.308

To plot the result, we again call plot(gps, plotnoise=TRUE).