Linear State-Space Model with Time-Varying Dynamics

Jaakko Luttinen, Tapani Raiko, Alexander Ilin

Author-created version: PDF

Poster: PDF

Slides: PDF

Extra material for the paper to appear in ECML/PKDD 2014. This page provides videos of the SPDE processes used in the experiments and instructions on reproducing the experiments.

Video visualizations of the SPDE processes

Five videos (open WebM format) showing the simulated stochastic advection-diffusion processes used in the experimental section:
Video of SPDE 1
Video of SPDE 2
Video of SPDE 3
Video of SPDE 4
Video of SPDE 5
The crosses denote the locations of the observations.

Setting up the environment

The experiment scripts use BayesPy, a variational Bayesian package for Python, so BayesPy must be installed before the scripts can be run. The instructions below are written for Linux users but may be applicable to other systems if modified appropriately.

Installing BayesPy

The paper uses the latest version of BayesPy, which is available under the GPLv3 license. It requires Python 3, NumPy (>=1.8), SciPy (>=0.13), Matplotlib (>=1.2) and h5py. Installing these requirements can be tricky, so refer to the installation guide of each package if needed. If your system does not provide recent enough versions of the packages and you do not have permission to upgrade them, it is recommended to use a virtual environment:

virtualenv -p python3 --system-site-packages ENV
source ENV/bin/activate

The required packages can be installed using pip:

pip install numpy --upgrade
pip install scipy --upgrade
pip install setuptools --upgrade
pip install matplotlib --upgrade
pip install h5py

If you have problems installing these packages, refer to the installation instructions in the BayesPy documentation or to the documentation of the corresponding package. You may need to install some system libraries in order to compile and install these packages.
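
To see which of the requirements are already satisfied in the active environment, a small check along the following lines may help. This is only a sketch: the function names are made up for illustration, the minimum versions are the ones listed above, and the script relies on importlib.metadata, which assumes Python 3.8 or newer.

```python
# Sketch: report whether the minimum versions listed above are met.
from importlib import metadata

# Minimum versions as stated in the installation notes.
# No minimum is stated for h5py, so only its presence is checked.
REQUIREMENTS = {
    "numpy": (1, 8),
    "scipy": (0, 13),
    "matplotlib": (1, 2),
    "h5py": None,
}


def parse_version(text):
    """Turn '1.8.2' or '1.10.0rc1' into a tuple of leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check(name, minimum, installed=None):
    """Return (ok, message) for one package.

    `installed` can be passed explicitly; otherwise the version is
    looked up in the active environment.
    """
    if installed is None:
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            return False, f"{name}: not installed"
    if minimum is not None and parse_version(installed) < minimum:
        wanted = ".".join(map(str, minimum))
        return False, f"{name}: {installed} is older than {wanted}"
    return True, f"{name}: {installed} OK"


if __name__ == "__main__":
    for name, minimum in REQUIREMENTS.items():
        print(check(name, minimum)[1])
```

Versions are compared as integer tuples rather than strings, so that for example 1.10 is correctly treated as newer than 1.8.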

After you have installed the requirements, installing the latest BayesPy is simple:

pip install https://github.com/bayespy/bayespy/archive/master.zip

In order to verify that BayesPy is working properly, it is recommended to install Nose and run the unit tests:

pip install nose
nosetests bayespy

If the unit tests are successful, congratulations, you have a working installation of BayesPy and you can run the experiments presented in the paper. The documentation for BayesPy can be found at bayespy.org.

Downloading the experiment scripts

The Python scripts for reproducing the experiments are packaged in experiments.tar.gz, which can be downloaded and extracted as follows:

wget http://users.ics.aalto.fi/jluttine/ecml2014/experiments.tar.gz
tar -xf experiments.tar.gz

These files are available under the GPLv3 license.

Obtaining GSOD data from NCDC

Note that the artificial data in the first and the second experiments are generated by the scripts, so only the third experiment needs external data. The GSOD data by NCDC is available for download. The whole database is large, however, and only the years 2000-2009 are used in the experiment. Our Python script gsod.py contains functions for parsing the downloaded data into an HDF5 data file, which is then used by the experiment script gsod_experiment.py.

First, download the data used in the real-world air temperature experiment:

wget ftp://ftp.ncdc.noaa.gov/pub/data/gsod/ish-history.csv
wget ftp://ftp.ncdc.noaa.gov/pub/data/gsod/2000/gsod_2000.tar
wget ftp://ftp.ncdc.noaa.gov/pub/data/gsod/2001/gsod_2001.tar
wget ftp://ftp.ncdc.noaa.gov/pub/data/gsod/2002/gsod_2002.tar
wget ftp://ftp.ncdc.noaa.gov/pub/data/gsod/2003/gsod_2003.tar
wget ftp://ftp.ncdc.noaa.gov/pub/data/gsod/2004/gsod_2004.tar
wget ftp://ftp.ncdc.noaa.gov/pub/data/gsod/2005/gsod_2005.tar
wget ftp://ftp.ncdc.noaa.gov/pub/data/gsod/2006/gsod_2006.tar
wget ftp://ftp.ncdc.noaa.gov/pub/data/gsod/2007/gsod_2007.tar
wget ftp://ftp.ncdc.noaa.gov/pub/data/gsod/2008/gsod_2008.tar
wget ftp://ftp.ncdc.noaa.gov/pub/data/gsod/2009/gsod_2009.tar
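
The per-year archive URLs above follow a regular pattern, so the list can also be generated instead of typed out. The following is a minimal sketch; the function name gsod_urls and the default year range are choices made here for illustration, and the URL pattern is simply the one visible in the commands above.

```python
# Sketch: build the GSOD download URLs for a range of years.
BASE = "ftp://ftp.ncdc.noaa.gov/pub/data/gsod"


def gsod_urls(first=2000, last=2009):
    """Return the station history file followed by one archive per year."""
    urls = [f"{BASE}/ish-history.csv"]
    for year in range(first, last + 1):
        urls.append(f"{BASE}/{year}/gsod_{year}.tar")
    return urls


if __name__ == "__main__":
    # Print one URL per line so the output can be fed to a downloader.
    for url in gsod_urls():
        print(url)
```

If the sketch is saved under a name such as gsod_urls.py (a hypothetical file name), its output can be piped to wget with `python gsod_urls.py | wget -i -`, since `wget -i -` reads the URLs from standard input.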

Second, parse the data into HDF5 using the experiment scripts:

python gsod.py

Note that the parsing may take several hours because the parser is implemented in a simple way. Fortunately, the data needs to be parsed into HDF5 format only once. By default, the GSOD data files are assumed to be in the current directory and the time period to parse is 2000-2009. Both can be changed with command line arguments:

python gsod.py --path=/path/to/gsod/data/ --first=2001 --last=2005

GSOD data is available for "free and unrestricted use in research, education, and other non-commercial activities".

Running the experiments

It is assumed that you have successfully installed BayesPy and downloaded GSOD data (if you want to run that experiment).

In the first experiment, the data were observations from a one-dimensional oscillating signal with changing frequency. The experiment can be run as follows:

python toy_experiment.py --dynamics=constant
python toy_experiment.py --dynamics=switching
python toy_experiment.py --dynamics=varying
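
For intuition about the kind of data involved, here is a minimal sketch of a noisy one-dimensional oscillating signal whose frequency changes over time. It is not the generator used in toy_experiment.py: the function name, the frequency range, the noise level and the linear frequency sweep are all assumptions made purely for illustration.

```python
# Sketch: a noisy sine wave whose frequency changes over time.
import math
import random


def chirp(n=200, f_start=0.01, f_end=0.05, noise_std=0.1, seed=1):
    """Return n samples of a sine wave with linearly changing frequency."""
    rng = random.Random(seed)  # local generator, so results are repeatable
    phase = 0.0
    samples = []
    for t in range(n):
        # Frequency (cycles per sample) moves linearly from f_start to f_end.
        freq = f_start + (f_end - f_start) * t / max(n - 1, 1)
        # Accumulate the phase so the signal stays continuous as freq changes.
        phase += 2.0 * math.pi * freq
        samples.append(math.sin(phase) + rng.gauss(0.0, noise_std))
    return samples


if __name__ == "__main__":
    for value in chirp(n=10):
        print(f"{value:+.3f}")
```

A model with constant dynamics has to describe such a signal with a single oscillation frequency, which is the kind of limitation the switching and time-varying variants are meant to address.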

In the second experiment, the data were observations from a simulated stochastic advection-diffusion process. The experiment can be run as follows:

python spde_experiment.py --dynamics=constant --seed=1
python spde_experiment.py --dynamics=constant --seed=1 --d=60
python spde_experiment.py --dynamics=switching --seed=1
python spde_experiment.py --dynamics=varying --seed=1

All five sets of experiments can be run by changing the seed (1, 2, 3, 4, 5).

In the third experiment, the data were daily mean temperature measurements in Europe. The experiment can be run as follows:

python gsod_experiment.py --dynamics=constant --seed=1
python gsod_experiment.py --dynamics=switching --seed=1
python gsod_experiment.py --dynamics=varying --seed=1

Again, all five sets of experiments can be run by changing the seed (1, 2, 3, 4, 5). The experiments plot the approximate VB posterior distributions of some variables during learning. If you want to disable this, use the command line argument --no-monitor.
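
Running every combination of dynamics and seed by hand is tedious, so the commands can be generated in a loop. The sketch below is an assumption about how one might do this, not part of the experiment scripts: the helper build_commands and the --run switch are hypothetical, while the script names and the --dynamics, --seed and --no-monitor arguments are the ones shown on this page.

```python
# Sketch: generate (and optionally run) all dynamics/seed combinations.
import subprocess
import sys

DYNAMICS = ("constant", "switching", "varying")
SEEDS = (1, 2, 3, 4, 5)


def build_commands(script, no_monitor=False):
    """Return one command (as an argument list) per seed and dynamics type."""
    commands = []
    for seed in SEEDS:
        for dynamics in DYNAMICS:
            # sys.executable keeps the runs inside the active environment.
            cmd = [sys.executable, script,
                   f"--dynamics={dynamics}", f"--seed={seed}"]
            if no_monitor:
                cmd.append("--no-monitor")
            commands.append(cmd)
    return commands


if __name__ == "__main__":
    # Dry run by default: only print the commands.
    # Pass "--run" to execute them one after another.
    execute = "--run" in sys.argv[1:]
    for cmd in build_commands("gsod_experiment.py", no_monitor=True):
        print(" ".join(cmd))
        if execute:
            subprocess.run(cmd, check=True)
```

The same helper works for spde_experiment.py by changing the script name. The additional spde_experiment.py run with --d=60 is not covered by this sketch and would have to be added separately.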

All the experiment scripts accept further command line arguments in case you want to play with the experiments (list them with --help). Note that the second and the third experiments in particular are computationally demanding and may take a long time to run.