Publications of the thesis


  1. T. Raiko, H. Valpola, M. Harva, and J. Karhunen. Building Blocks for Variational Bayesian Learning of Latent Variable Models. Report E4 in the Electronic Report Series of CIS, April 2006; accepted for publication in the Journal of Machine Learning Research, conditional on minor revisions.

  2. T. Raiko, H. Valpola, T. Östman, and J. Karhunen. Missing Values in Hierarchical Nonlinear Factor Analysis. In the Proceedings of the International Conference on Artificial Neural Networks and Neural Information Processing (ICANN/ICONIP 2003), pp. 185-189, Istanbul, Turkey, June 26-29, 2003.

  3. T. Raiko. Partially Observed Values. In the Proceedings of the International Joint Conference on Neural Networks (IJCNN 2004), pp. 2825-2830, Budapest, Hungary, July 25-29, 2004.

  4. T. Raiko and M. Tornio. Learning Nonlinear State-Space Models for Control. In the Proceedings of the International Joint Conference on Neural Networks (IJCNN 2005), pp. 815-820, Montreal, Canada, July 31-August 4, 2005.

  5. T. Raiko, M. Tornio, A. Honkela, and J. Karhunen. State Inference in Variational Bayesian Nonlinear State-Space Models. In the Proceedings of the 6th International Conference on Independent Component Analysis and Blind Source Separation (ICA 2006), pp. 222-229, Charleston, South Carolina, USA, March 5-8, 2006.

  6. T. Raiko. Nonlinear Relational Markov Networks with an Application to the Game of Go. In the Proceedings of the International Conference on Artificial Neural Networks (ICANN 2005), pp. 989-996, Warsaw, Poland, September 11-15, 2005.

  7. K. Kersting, L. De Raedt, and T. Raiko. Logical Hidden Markov Models. Journal of Artificial Intelligence Research, Volume 25, pp. 425-456, April 2006.

  8. K. Kersting, T. Raiko, S. Kramer, and L. De Raedt. Towards Discovering Structural Signatures of Protein Folds based on Logical Hidden Markov Models. In the Proceedings of the Pacific Symposium on Biocomputing (PSB-2003), pp. 192-203, Kauai, Hawaii, January 3-7, 2003.

  9. K. Kersting and T. Raiko. 'Say EM' for Selecting Probabilistic Models for Logical Sequences. In the Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence (UAI 2005), pp. 300-307, Edinburgh, Scotland, July 26-29, 2005.

List of abbreviations

AI Artificial intelligence
BIC Bayesian information criterion
BLP Bayesian logic program
BP Belief propagation (algorithm)
EM Expectation maximisation
FA Factor analysis
HMM Hidden Markov model
HNFA Hierarchical nonlinear factor analysis
ICA Independent component analysis
ILP Inductive logic programming
KL Kullback-Leibler (divergence)
LOHMM Logical hidden Markov model
MAP Maximum a posteriori (estimate)
MCMC Markov chain Monte Carlo
ML Maximum likelihood (estimate)
MLP Multilayer perceptron (network)
NDFA Nonlinear dynamic factor analysis
NMN Nonlinear Markov network
NRMN Nonlinear relational Markov network
NSSM Nonlinear state-space model
PCA Principal component analysis
pdf Probability density function
PoE Product of experts
PRM Probabilistic relational model
RMN Relational Markov network
SRL Statistical relational learning
VB Variational Bayesian

List of symbols
$ \land$ Logical conjunction (and)
$ \lnot$ Logical negation (not)
$ A,B,C$ Variables, events, or actions
$ x,y,z$ Scalar variables
$ P(A\mid B)$ Probability of $ A$ given $ B$
$ p(A\mid B)$ Probability density of $ A$ given $ B$
$ \boldsymbol{X}$ Observations (or data)
$ \boldsymbol{\Theta}$ Unknown variables $ \boldsymbol{\Theta}=(\boldsymbol{\theta},\boldsymbol{S})$
$ \boldsymbol{\theta}$ Model parameters
$ \boldsymbol{S}$ Latent variables
$ U(A)$ Utility of $ A$
$ \mathcal{H}$ Model structure and prior belief
$ \mathcal{N}\left(x;y,z\right)$ Gaussian distribution over $ x$ with mean $ y$ and variance $ z$
$ \propto$ Proportional to (or equals after normalisation)
$ \pi$ Message sent away from the root (belief propagation algorithm)
$ \lambda$ Message sent towards the root (belief propagation algorithm)
$ \psi$ Potential in a Markov network
$ q(\boldsymbol{\Theta})$ Approximation of the posterior distribution $ p(\boldsymbol{\Theta}\mid \boldsymbol{X})$
$ D(q \parallel p)$ Kullback-Leibler divergence between $ q$ and $ p$
$ {\bf x}(t)$ Observation (or data) vector for (time) index $ t$
$ {\bf s}(t)$ Source (or factor) vector for (time) index $ t$
$ {\bf u}(t)$ Auxiliary vector (either for control or variance modelling)
$ \mathbf{f}$ Mapping from the source space to the observation space
$ \mathbf{g}$ Mapping for modelling dynamics in the source space
$ {\bf A},{\bf B},{\bf C},{\bf D}$ Matrices belonging to parameters $ \boldsymbol{\theta}$
$ \overline{\theta}$ Mean of the parameter $ \theta$ in the approximating posterior distribution $ q$
$ \widetilde{\theta}$ Variance of the parameter $ \theta$ in the approximating posterior distribution $ q$
$ \langle \cdot \rangle$ Expectation over the distribution $ q$
$ X,Y,Z$ Logical variables
$ \leftarrow$ Follows from (in logic programming)
$ \mathsf{X}$ Observed sequence of logical atoms
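
For orientation, several of the symbols above combine in two standard definitions used throughout the thesis. The following is only an illustration of how the listed notation fits together (textbook definitions, not results specific to this work):

$ D(q \parallel p) = \left\langle \ln \frac{q(\boldsymbol{\Theta})}{p(\boldsymbol{\Theta}\mid \boldsymbol{X})} \right\rangle, \qquad \mathcal{N}\left(x;y,z\right) = \frac{1}{\sqrt{2\pi z}} \exp\left( -\frac{(x-y)^2}{2z} \right), $

where the expectation $ \langle \cdot \rangle$ in the first expression is taken over the approximating distribution $ q(\boldsymbol{\Theta})$.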

