Author-created version: PDF
Poster: PDF
Slides: PDF
Extra material for the paper to appear in ECML/PKDD 2014. This page provides videos of the SPDE processes used in the experiments and instructions on reproducing the experiments.
Five videos (open WebM format) showing the simulated stochastic advection-diffusion processes used in the experimental section:
Video of SPDE 1
Video of SPDE 2
Video of SPDE 3
Video of SPDE 4
Video of SPDE 5
The crosses denote the locations of the observations.
The experiment scripts use a variational Bayesian package for Python called BayesPy. You must install BayesPy in order to use the experiment scripts. The instructions below are written for Linux users but may be applicable to other systems if modified appropriately.
The paper uses the latest version of BayesPy, which is available under the GPLv3 license. It requires Python 3, NumPy (>=1.8), SciPy (>=0.13), Matplotlib (>=1.2) and h5py. Installation of these requirements can be a bit tricky but you can refer to the installation guides of these packages. If your system does not contain recent enough versions of the packages and you do not have permission to upgrade them, it is recommended to use a virtual environment.
virtualenv -p python3 --system-site-packages ENV
source ENV/bin/activate
The required packages can be installed using pip:
pip install numpy --upgrade
pip install scipy --upgrade
pip install distribute --upgrade
pip install matplotlib --upgrade
pip install h5py
If you have problems installing these packages, refer to the installation instructions in the BayesPy documentation or to the documentation of the corresponding package. You may need to install some system libraries in order to compile and install these packages.
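To confirm that the installed packages meet the minimum versions listed above, a small check along the following lines may help. This is only a sketch: the helper names (`version_tuple`, `meets_minimum`, `check_requirements`) are hypothetical and not part of BayesPy, and the comparison looks only at the leading numeric components of each version string.

```python
import importlib
import re

# Minimum versions as listed in the requirements above.
# h5py has no stated minimum, so any installed version is accepted.
MINIMUMS = {"numpy": "1.8", "scipy": "0.13", "matplotlib": "1.2", "h5py": None}


def version_tuple(version):
    """Turn a string such as '1.8.0rc1' into (1, 8, 0)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, required):
    """Return True if the installed version is at least the required one."""
    if required is None:
        return True
    a, b = version_tuple(installed), version_tuple(required)
    width = max(len(a), len(b))
    # Pad with zeros so that '1.8' and '1.8.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements(minimums=MINIMUMS):
    """Map each package name to 'ok', 'too old (<version>)' or 'missing'."""
    report = {}
    for name, required in minimums.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = "missing"
            continue
        installed = getattr(module, "__version__", "0")
        if meets_minimum(installed, required):
            report[name] = "ok"
        else:
            report[name] = "too old ({})".format(installed)
    return report


if __name__ == "__main__":
    for name, status in check_requirements().items():
        print(name, status)
```

Run it inside the activated virtual environment so that it inspects the same packages the experiment scripts will use.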
After you have installed the requirements, installing the latest BayesPy is simple:
pip install https://github.com/bayespy/bayespy/archive/master.zip
In order to verify that BayesPy is working properly, it is recommended to install Nose and run the unit tests:
pip install nose
nosetests bayespy
If the unit tests are successful, congratulations, you have a working installation of BayesPy and you can run the experiments presented in the paper. The documentation for BayesPy can be found at bayespy.org.
The Python scripts experiments.tar.gz for reproducing the experiments can be downloaded and extracted as follows:
wget http://users.ics.aalto.fi/jluttine/ecml2014/experiments.tar.gz
tar -xf experiments.tar.gz
These files are available under the GPLv3 license.
Note that the artificial data in the first and the second experiments are generated by the scripts. The GSOD data by NCDC is available for download. However, note that the whole database is large and only the years 2000-2009 are used in the experiment. Our Python script gsod.py contains functions for parsing the downloaded data into a nice HDF5 data file which can be used by the experiment script gsod_experiment.py.
First, download the data which was used in the real-world air temperature experiment:
wget ftp://ftp.ncdc.noaa.gov/pub/data/gsod/ish-history.csv
wget ftp://ftp.ncdc.noaa.gov/pub/data/gsod/2000/gsod_2000.tar
wget ftp://ftp.ncdc.noaa.gov/pub/data/gsod/2001/gsod_2001.tar
wget ftp://ftp.ncdc.noaa.gov/pub/data/gsod/2002/gsod_2002.tar
wget ftp://ftp.ncdc.noaa.gov/pub/data/gsod/2003/gsod_2003.tar
wget ftp://ftp.ncdc.noaa.gov/pub/data/gsod/2004/gsod_2004.tar
wget ftp://ftp.ncdc.noaa.gov/pub/data/gsod/2005/gsod_2005.tar
wget ftp://ftp.ncdc.noaa.gov/pub/data/gsod/2006/gsod_2006.tar
wget ftp://ftp.ncdc.noaa.gov/pub/data/gsod/2007/gsod_2007.tar
wget ftp://ftp.ncdc.noaa.gov/pub/data/gsod/2008/gsod_2008.tar
wget ftp://ftp.ncdc.noaa.gov/pub/data/gsod/2009/gsod_2009.tar
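The download commands above follow a regular pattern, so the URL list for any range of years can be generated rather than typed out. The following is a minimal sketch; the function name `gsod_urls` is hypothetical, and the URL layout is simply copied from the commands above.

```python
# Base location of the GSOD files, as used in the download commands above.
GSOD_BASE = "ftp://ftp.ncdc.noaa.gov/pub/data/gsod"


def gsod_urls(first=2000, last=2009):
    """Return the station history URL followed by one tar URL per year."""
    urls = ["{}/ish-history.csv".format(GSOD_BASE)]
    for year in range(first, last + 1):
        urls.append("{}/{}/gsod_{}.tar".format(GSOD_BASE, year, year))
    return urls


if __name__ == "__main__":
    # Print one URL per line.
    print("\n".join(gsod_urls()))
```

The printed list can be saved to a file and passed to wget with its -i option.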
Second, parse the data into HDF5 using the experiment scripts:
python gsod.py
Note that this parsing may take several hours because of a simple implementation. Fortunately, the data needs to be parsed only once into HDF5 format. By default, the GSOD data files are assumed to be in the current directory and the time period to parse is 2000-2009. These can be changed with arguments:
python gsod.py --path=/path/to/gsod/data/ --first=2001 --last=2005
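Because the parsing step is slow, it may be worth confirming that all downloaded files are in place before starting it. The sketch below is not part of the experiment scripts: the function name `missing_gsod_files` is hypothetical, and it assumes that gsod.py expects the files under the same names as in the download commands (ish-history.csv and gsod_YEAR.tar).

```python
import os


def missing_gsod_files(path=".", first=2000, last=2009):
    """Return the expected GSOD file names that are not present in `path`.

    Assumption: the files keep their download names, that is
    ish-history.csv plus one gsod_<year>.tar per year.
    """
    expected = ["ish-history.csv"]
    expected += ["gsod_{}.tar".format(year) for year in range(first, last + 1)]
    return [
        name for name in expected
        if not os.path.isfile(os.path.join(path, name))
    ]


if __name__ == "__main__":
    missing = missing_gsod_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected GSOD files found.")
```

The arguments mirror the --path, --first and --last options of gsod.py, so the same values can be passed to both.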
GSOD data is available for "free and unrestricted use in research, education, and other non-commercial activities".
It is assumed that you have successfully installed BayesPy and downloaded GSOD data (if you want to run that experiment).
In the first experiment, the data were observations from a one-dimensional oscillating signal with changing frequency. The experiment can be run as follows:
python toy_experiment.py --dynamics=constant
python toy_experiment.py --dynamics=switching
python toy_experiment.py --dynamics=varying
In the second experiment, the data were observations from a simulated stochastic advection-diffusion process. The experiment can be run as follows:
python spde_experiment.py --dynamics=constant --seed=1
python spde_experiment.py --dynamics=constant --seed=1 --d=60
python spde_experiment.py --dynamics=switching --seed=1
python spde_experiment.py --dynamics=varying --seed=1
All five sets of experiments can be run by changing the seed (1, 2, 3, 4, 5).
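Running every model with every seed by hand is tedious, so the command lines can be generated in a loop. The following is a rough sketch under stated assumptions: `experiment_commands` and `run_all` are hypothetical helpers, not part of the experiment scripts, and only the --dynamics and --seed arguments shown above are used. The additional --d=60 run is not covered.

```python
import subprocess
import sys

DYNAMICS = ("constant", "switching", "varying")
SEEDS = (1, 2, 3, 4, 5)


def experiment_commands(script, dynamics=DYNAMICS, seeds=SEEDS):
    """Build one command (as an argument list) per seed and dynamics model."""
    return [
        [sys.executable, script,
         "--dynamics={}".format(model), "--seed={}".format(seed)]
        for seed in seeds
        for model in dynamics
    ]


def run_all(script, dry_run=True):
    """Print every command; execute it as well unless dry_run is True."""
    for command in experiment_commands(script):
        print(" ".join(command))
        if not dry_run:
            # check=True stops the loop if one experiment fails.
            subprocess.run(command, check=True)
```

Calling run_all("spde_experiment.py") only prints the fifteen commands; pass dry_run=False to execute them one after another. The same helper works for gsod_experiment.py, which takes the same two arguments.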
In the third experiment, the data were daily mean temperature measurements in Europe. The experiment can be run as follows:
python gsod_experiment.py --dynamics=constant --seed=1
python gsod_experiment.py --dynamics=switching --seed=1
python gsod_experiment.py --dynamics=varying --seed=1
Again, all five sets of experiments can be run by changing the seed (1, 2, 3, 4, 5). The experiments plot the approximate VB posterior distributions for some variables during the learning. If you want to disable this, use the command line argument --no-monitor.
All the experiment scripts have more command line arguments in case the user wants to play with the experiments (see them with --help). Note that especially the second and the third experiments require computational resources and may take a long time to run.