Variational Bayesian method

Nonlinear dynamical factor analysis (NDFA) [1] is a variational Bayesian method for learning nonlinear state-space models. The mappings $\mathbf{f}$ and $\mathbf{g}$ in Eqs. (1) and (2) are modelled with multilayer perceptron (MLP) networks whose parameters are learned from the data. The parameter vector $\boldsymbol{\theta}$ includes the network weights, the noise levels, and hierarchical priors for them. The posterior distribution over the sources $\mathbf{S}=\left[\mathbf{s}(1),\dots,\mathbf{s}(T)\right]$ and the parameters $\boldsymbol{\theta}$ is approximated by a Gaussian distribution $q(\mathbf{S},\boldsymbol{\theta})$ with some further independence assumptions. Both learning and inference are based on minimising a cost function ${\cal C}_{\mathrm{KL}}$

$\displaystyle {\cal C}_{\mathrm{KL}}= \int_{\boldsymbol{\theta}}\int_{\mathbf{S}} q(\mathbf{S},\boldsymbol{\theta}) \log\frac{q(\mathbf{S},\boldsymbol{\theta})}{p(\mathbf{X},\mathbf{S},\boldsymbol{\theta})}d\mathbf{S} d\boldsymbol{\theta},$

(5)

where $p(\mathbf{X},\mathbf{S},\boldsymbol{\theta})$ is the joint probability density over the data $\mathbf{X}=\left[\mathbf{x}(1),\dots,\mathbf{x}(T)\right]$, the sources $\mathbf{S}$, and the parameters $\boldsymbol{\theta}$. The cost function equals the Kullback-Leibler divergence between the approximation and the true posterior, up to the additive constant $-\log p(\mathbf{X})$, which does not depend on $q$. It splits into a sum of terms, which makes it possible to study only a part of the model at a time. The variational approach is less prone to overfitting than maximum a posteriori estimation and is still fast compared to Monte Carlo methods. See [1] for details.
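The relation between the cost and the Kullback-Leibler divergence can be checked on a toy problem. The following minimal sketch is not the NDFA model: it assumes a single scalar parameter with prior $\theta \sim N(0,1)$, one observation $x \mid \theta \sim N(\theta, \sigma^2)$, and a Gaussian approximation $q(\theta)=N(m,v)$. All function names and the value of `noise_var` are hypothetical choices made for illustration. In this conjugate case each term of the cost has a closed form, and the cost evaluated at the exact posterior reduces to $-\log p(x)$.

```python
import math

def cost_terms(x, m, v, noise_var=0.5):
    """Three terms of C_KL for the toy model (an illustrative assumption):
    theta ~ N(0, 1), x | theta ~ N(theta, noise_var), q(theta) = N(m, v)."""
    # E_q[log q(theta)]: negative entropy of the Gaussian approximation
    neg_entropy = -0.5 * math.log(2 * math.pi * math.e * v)
    # E_q[-log p(theta)]: contribution of the prior
    prior_term = 0.5 * math.log(2 * math.pi) + 0.5 * (m * m + v)
    # E_q[-log p(x | theta)]: contribution of the observation
    lik_term = (0.5 * math.log(2 * math.pi * noise_var)
                + ((x - m) ** 2 + v) / (2 * noise_var))
    return neg_entropy, prior_term, lik_term

def cost(x, m, v, noise_var=0.5):
    """C_KL as the sum of its terms."""
    return sum(cost_terms(x, m, v, noise_var))

def neg_log_evidence(x, noise_var=0.5):
    """-log p(x); marginally x ~ N(0, 1 + noise_var)."""
    s = 1.0 + noise_var
    return 0.5 * math.log(2 * math.pi * s) + x * x / (2 * s)

def exact_posterior(x, noise_var=0.5):
    """Mean and variance of the exact Gaussian posterior p(theta | x)."""
    return x / (1.0 + noise_var), noise_var / (1.0 + noise_var)

if __name__ == "__main__":
    x = 1.2
    m, v = exact_posterior(x)
    print("cost at exact posterior:", cost(x, m, v))
    print("-log p(x):              ", neg_log_evidence(x))
    print("cost with shifted mean: ", cost(x, m + 0.3, v))
```

Any other choice of $m$ and $v$ gives a larger cost, and the excess is exactly the Kullback-Leibler divergence from $q$ to the true posterior. Because the terms are computed separately, the prior and observation contributions can also be inspected one at a time.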

The variational Bayesian inference algorithm in [1] uses the gradient of the cost function with respect to the states in a heuristic manner. We propose an algorithm that differs from it in three ways. Firstly, the heuristic updates are replaced by a standard conjugate gradient algorithm [11]. Secondly, the linearisation method from [7] is applied. Thirdly, the gradient is replaced by a vector of approximated total derivatives, as described in the following section.
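To illustrate what a conjugate gradient update over a state sequence looks like, here is a minimal sketch under stated assumptions. It uses the Polak-Ribière variant with a backtracking line search, which may differ from the variant described in [11]. The cost is a hypothetical quadratic stand-in for ${\cal C}_{\mathrm{KL}}$: a scalar chain with a linear dynamics coefficient `a`, not the MLP-based model. It also uses the plain gradient rather than the approximated total derivatives. The names `make_cost` and `conjugate_gradient` are illustrative only.

```python
def make_cost(x, a=0.9):
    """Toy cost over states s: squared data misfit plus squared dynamics misfit.
    The model s(t+1) ~ a * s(t), x(t) ~ s(t) is an assumption for illustration."""
    T = len(x)

    def f(s):
        fit = sum((xt - st) ** 2 for xt, st in zip(x, s))
        dyn = sum((s[t + 1] - a * s[t]) ** 2 for t in range(T - 1))
        return fit + dyn

    def grad(s):
        g = [-2.0 * (x[t] - s[t]) for t in range(T)]
        for t in range(T - 1):
            r = s[t + 1] - a * s[t]     # dynamics residual couples s(t) and s(t+1)
            g[t + 1] += 2.0 * r
            g[t] += -2.0 * a * r
        return g

    return f, grad

def conjugate_gradient(f, grad, s0, iters=500, tol=1e-10):
    """Nonlinear conjugate gradient (Polak-Ribiere, with restarts)."""
    s = list(s0)
    g = grad(s)
    d = [-gi for gi in g]               # first direction: steepest descent
    for _ in range(iters):
        gg = sum(gi * gi for gi in g)
        if gg < tol:
            break
        slope = sum(gi * di for gi, di in zip(g, d))
        if slope >= 0.0:                # not a descent direction: restart
            d = [-gi for gi in g]
            slope = -gg
        # Backtracking line search with the Armijo sufficient-decrease test
        step, fs = 1.0, f(s)
        while True:
            s_new = [si + step * di for si, di in zip(s, d)]
            if f(s_new) <= fs + 1e-4 * step * slope or step < 1e-12:
                break
            step *= 0.5
        g_new = grad(s_new)
        # Polak-Ribiere coefficient, clipped at zero
        beta = max(0.0, sum(gn * (gn - go) for gn, go in zip(g_new, g)) / gg)
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        s, g = s_new, g_new
    return s

if __name__ == "__main__":
    x = [0.0, 1.0, 0.5, -0.3, 0.8]
    f, grad = make_cost(x)
    s_hat = conjugate_gradient(f, grad, [0.0] * len(x))
    print("cost before:", f([0.0] * len(x)), "cost after:", f(s_hat))
```

The dynamics term couples neighbouring states, so all states are updated jointly along each search direction instead of one time step at a time.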

Tapani Raiko 2005-12-08