
Main Structure

Figure [*] shows the structure of hierarchical nonlinear factor analysis with variance modelling (HNFA+VM). It utilises variance neurons and nonlinearities to build a hierarchical model for both the means and the variances. Without the variance neurons, the model would correspond to a multi-layer perceptron with latent variables as hidden neurons. Note that using computational nodes as hidden neurons would result in multiple paths from the upper-layer latent variables to the observations. This type of structure was used in [43], and it has quadratic, as opposed to linear, computational complexity.


  
Figure: The HNFA+VM model can be built up in stages. Left: a variance neuron is attached to each Gaussian observation node; the nodes represent vectors. Middle: a layer of sources with variance neurons attached to them is added; the nodes next to the weight matrices A1 and B1 represent affine transformations that include a bias term. Right: another layer is added. The sizes of the layers may vary, and more layers can be added in the same manner. Note that some parameters are left out of the picture for clarity.
[Image: pics/exper_set.eps]

The exact formulation of HNFA+VM is as follows. The observed data matrix $\boldsymbol{X}$ contains $T$ observations of dimension $n_1$. For notational simplicity, the columns of $\boldsymbol{X}$ are denoted $\mathbf{s}_{1}(t)$:

\begin{displaymath}
\boldsymbol{X} = \left[\mathbf{s}_{1}(1),\mathbf{s}_{1}(2),\dots,\mathbf{s}_{1}(T)\right],
\end{displaymath} (5.1)

where $t \in \{1,2,\dots,T\}$ indexes the observations and the subscript $1$ refers to the first layer.

On each layer $i$ there are $n_i$ sources, assembled into a vector $\mathbf{s}_i$. Its components are denoted $s_{i,k}$, $k \in \{1,2,\dots,n_i\}$. The sources on the upper layers $i > 1$ are mapped through a Gaussian nonlinearity

 
\begin{displaymath}
\mathbf{f}(\mathbf{s}_{i}(t)) = \left[ \begin{array}{c}
\exp(-s_{i,1}(t)^2) \\
\exp(-s_{i,2}(t)^2) \\
\vdots \\
\exp(-s_{i,n_i}(t)^2)
\end{array} \right] .
\end{displaymath} (5.2)
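
As a small illustration (not part of the original text), the nonlinearity of Eq. (5.2) can be evaluated componentwise. The following NumPy sketch assumes a source vector is stored as a one-dimensional array:

import numpy as np

def f(s):
    # Componentwise Gaussian nonlinearity of Eq. (5.2): exp(-s_{i,k}(t)^2)
    return np.exp(-s ** 2)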

After the nonlinearity, the connection downwards is made with the affine mappings

  
\begin{displaymath}
\mathbf{m}_{i}^{s}(t) = \left\{ \begin{array}{ll}
\mathbf{A}_{i}\mathbf{f}(\mathbf{s}_{i+1}(t)) + \mathbf{a}_{i} & \mbox{if $i<n$} \\
\boldsymbol{\mu}_{s,i} & \mbox{if $i=n$}
\end{array} \right.
\end{displaymath} (5.3)

\begin{displaymath}
\mathbf{m}_{i}^{u}(t) = \left\{ \begin{array}{ll}
\mathbf{B}_{i}\mathbf{f}(\mathbf{s}_{i+1}(t)) + \mathbf{b}_{i} & \mbox{if $i<n$} \\
\boldsymbol{\mu}_{u,i} & \mbox{if $i=n$}
\end{array} \right. ,
\end{displaymath} (5.4)

where $n$ is the number of layers.
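
To make the mappings concrete, here is a sketch of Eqs. (5.3)-(5.4) for a single layer $i < n$, reusing the function f defined above. The names are assumptions of mine: A_i and B_i stand for the weight matrices of the figure, and a_i and b_i for the bias terms of the affine transformations.

def prior_mean_signals(A_i, a_i, B_i, b_i, s_next):
    # Eqs. (5.3)-(5.4) for i < n: the sources s_{i+1}(t) of the layer
    # above are passed through the nonlinearity and two affine mappings.
    m_s = A_i @ f(s_next) + a_i  # prior mean signal for the sources s_i(t)
    m_u = B_i @ f(s_next) + b_i  # prior mean signal for the variance neurons u_i(t)
    return m_s, m_u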

Each source $s_{i,k}$ has a corresponding variance neuron $u_{i,k}$. The signals $m_{i}^{s}(t)$ and $m_{i}^{u}(t)$ are used as their prior means:

  
\begin{displaymath}
p(s_{i,k}(t) \mid \mathbf{s}_{i+1}(t), u_{i,k}(t), \dots) = \operatorname{N}\left(s_{i,k}(t);\, m_{i,k}^{s}(t), \exp(-u_{i,k}(t))\right)
\end{displaymath} (5.5)

\begin{displaymath}
p(u_{i,k}(t) \mid \mathbf{s}_{i+1}(t), \dots) = \operatorname{N}\left(u_{i,k}(t);\, m_{i,k}^{u}(t), \exp(-\sigma_{u,i,k})\right) .
\end{displaymath} (5.6)

The prior variance of the source $s_{i,k}(t)$ is thus $\exp(-u_{i,k}(t))$, set by the corresponding variance neuron, and the prior variance of the variance neurons is set by the parameter vector $\boldsymbol{\sigma}_{u,i}$.
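
Putting Eqs. (5.2)-(5.6) together, the generative structure can be illustrated by ancestral sampling from the top layer down to the observations. The sketch below is my own illustration under the assumptions above (NumPy, a single time instant $t$, bias vectors a_i and b_i, the function f defined earlier); it only demonstrates the prior structure, not how the model is learned.

import numpy as np

def sample_hnfa_vm(layers, mu_s_top, mu_u_top, sigma_u, rng=None):
    # layers   : list of (A_i, a_i, B_i, b_i) for i = 1, ..., n-1
    # mu_s_top : prior mean of the top-layer sources s_n(t), Eq. (5.3)
    # mu_u_top : prior mean of the top-layer variance neurons u_n(t), Eq. (5.4)
    # sigma_u  : list of the parameter vectors sigma_{u,i} for i = 1, ..., n
    rng = rng or np.random.default_rng()
    n = len(layers) + 1
    m_s, m_u = mu_s_top, mu_u_top  # top layer: constant prior means
    for i in range(n, 0, -1):
        # Eq. (5.6): u_{i,k}(t) ~ N(m_{i,k}^u(t), exp(-sigma_{u,i,k}));
        # the standard deviation is exp(-sigma/2).
        u = rng.normal(m_u, np.exp(-sigma_u[i - 1] / 2))
        # Eq. (5.5): s_{i,k}(t) ~ N(m_{i,k}^s(t), exp(-u_{i,k}(t)))
        s = rng.normal(m_s, np.exp(-u / 2))
        if i > 1:
            A_i, a_i, B_i, b_i = layers[i - 2]
            m_s = A_i @ f(s) + a_i  # Eq. (5.3), mean signal for layer i-1
            m_u = B_i @ f(s) + b_i  # Eq. (5.4), mean signal for layer i-1
    return s  # one sampled observation vector s_1(t)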

