Improved approximation of the posterior probability

In [4], the posterior probability of the unknown variables was approximated as a Gaussian distribution with a diagonal covariance matrix. This means that the unknown variables were approximated to be independent given the observations. Notice that this assumption is false even if the unknown variables are assumed to be independent a priori. For instance, the factors are assumed to be independent a priori, but the observations induce dependencies between them. These dependencies are strongest between factors at the same time instant, that is, $s_i(t)$ and $s_j(t)$ have a posterior dependence, but $s_i(t_1)$ and $s_j(t_2)$ can be nearly independent when $t_1$ is far from $t_2$.

The NLFA model has an indeterminacy of the rotation of the factors, and the model can utilise this by choosing the rotation which makes the factors independent not only a priori but also a posteriori. However, the inclusion of the dynamic model causes posterior dependencies between factors at different time steps, and these dependencies do not vanish under any rotation or any other mapping of the factor space. The smaller the process noise, the stronger the dependence.

Taking into account the full posterior covariance between $\mathbf{s}(t-1)$ and $\mathbf{s}(t)$ is computationally costly if the dimension of the factor space is large, since the number of cross-covariance terms grows quadratically with the number of factors. In practice, the most significant posterior correlations are the posterior autocorrelations of the factors, that is, the correlations between $s_i(t-1)$ and $s_i(t)$. They can be taken into account without increasing the computational complexity significantly.

In [4], the approximation of the posterior probability of the factors had the factorial form

\begin{displaymath}q({\mathbf{S}} \vert {\mathbf{X}}) = \prod_{i,t} q(s_i(t) \vert {\mathbf{X}}) \, ,
\end{displaymath} (5)

where ${\mathbf{S}}$ denotes the factors and ${\mathbf{X}}$ the observations. The approximations $q(s_i(t) \vert {\mathbf{X}})$ were Gaussian with mean $\bar{s}_i(t)$ and variance $\tilde{s}_i(t)$. For the dynamical model introduced in this report, the approximation has the form

\begin{displaymath}q({\mathbf{S}} \vert {\mathbf{X}}) = \prod_{i,t} q(s_i(t) \vert s_i(t-1), {\mathbf{X}}) \, ,
\end{displaymath} (6)

where $q(s_i(t) \vert s_i(t-1), {\mathbf{X}})$ is Gaussian and depends linearly on $s_i(t-1)$:

\begin{displaymath}q(s_i(t) \vert s_i(t-1), {\mathbf{X}}) = N\left(s_i(t) \,\vert\, \bar{s}_i(t) + \breve{s}_i(t,t-1)\,[s_i(t-1) - \bar{s}_i(t-1)],\; \mathring{s}_i(t)\right) \, .
\end{displaymath} (7)

Here $N(\xi \vert \mu, \sigma^2)$ denotes a Gaussian distribution over $\xi$ with mean $\mu $ and variance $\sigma^2$.

Given $s_i(t-1)$, the posterior variance of $s_i(t)$ is $\mathring{s}_i(t)$ and the posterior mean is $\bar{s}_i(t) + \breve{s}_i(t,t-1) [s_i(t-1) - \bar{s}_i(t-1)]$. The approximate posterior $q(s_i(t) \vert s_i(t-1), {\mathbf{X}})$ is thus parametrised by the mean $\bar{s}_i(t)$, the linear dependence $\breve{s}_i(t,t-1)$ and the variance $\mathring{s}_i(t)$ as defined by (7), whereas in NLFA the posterior of a factor was parametrised by its mean and variance alone.
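
To make the parametrisation concrete, the following is a minimal Python sketch of evaluating the conditional mean and variance in (7) for a single factor. The function and argument names are hypothetical and not part of the original implementation.

def conditional_posterior(s_prev, bar_s_t, bar_s_prev, breve_s_t, ring_s_t):
    r"""Mean and variance of q(s_i(t) | s_i(t-1), X) as in Eq. (7).

    s_prev     -- value of s_i(t-1)
    bar_s_t    -- mean parameter \bar{s}_i(t)
    bar_s_prev -- mean parameter \bar{s}_i(t-1)
    breve_s_t  -- linear dependence \breve{s}_i(t, t-1)
    ring_s_t   -- conditional variance \mathring{s}_i(t)
    """
    mean = bar_s_t + breve_s_t * (s_prev - bar_s_prev)
    var = ring_s_t
    return mean, var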

It is easy to see by induction that, if the past values of the factors are marginalised out, the posterior mean of $s_i(t)$ is $\bar{s}_i(t)$ and the posterior variance $\tilde{s}_i(t)$ is

\begin{displaymath}\tilde{s}_i(t) = \mathring{s}_i(t) + \breve{s}^2_i(t,t-1)\,\tilde{s}_i(t-1) \, .
\end{displaymath} (8)
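
This is just the law of total variance: the conditional variance in (7) does not depend on $s_i(t-1)$ and the conditional mean depends on it linearly, so

\begin{displaymath}\tilde{s}_i(t) = \mathrm{E}\left[ \mathrm{Var}(s_i(t) \vert s_i(t-1)) \right] + \mathrm{Var}\left( \mathrm{E}[s_i(t) \vert s_i(t-1)] \right) = \mathring{s}_i(t) + \breve{s}^2_i(t,t-1)\,\tilde{s}_i(t-1) \, ,
\end{displaymath}

and the marginal mean is $\bar{s}_i(t)$ because, by the induction assumption, the marginal mean of $s_i(t-1)$ is $\bar{s}_i(t-1)$, so the expectation of the conditional mean reduces to $\bar{s}_i(t)$.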

Notice that the marginalised variances $\tilde{s}_i(t)$ are computed recursively in a sweep forward in time.
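
As an illustration of this sweep, here is a minimal NumPy sketch for a single factor; the array names ring_s, breve_s and the initial variance tilde_s0 are hypothetical and simply stand for $\mathring{s}_i(t)$, $\breve{s}_i(t,t-1)$ and the marginal variance of the first time step.

import numpy as np

def marginal_variances(ring_s, breve_s, tilde_s0):
    r"""Forward sweep of Eq. (8) for one factor.

    ring_s   -- conditional variances \mathring{s}_i(t), t = 1..T
    breve_s  -- linear dependences \breve{s}_i(t, t-1), t = 1..T
    tilde_s0 -- marginal variance \tilde{s}_i(0) of the initial time step
    """
    tilde_s = np.empty(len(ring_s))
    prev = tilde_s0
    for t in range(len(ring_s)):
        # Eq. (8): tilde_s(t) = ring_s(t) + breve_s(t, t-1)^2 * tilde_s(t-1)
        prev = ring_s[t] + breve_s[t] ** 2 * prev
        tilde_s[t] = prev
    return tilde_s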

