Definition of the Model

**Figure 3:** The mapping from sources to observations is modelled by the familiar MLP network. The sources are on the top layer and observations in the bottom layer. The middle layer consists of hidden neurons, each of which computes a nonlinear function of the inputs.

The schematic structure of the mapping is shown in Fig. 3. The nonlinearity of each hidden neuron is the hyperbolic tangent, which is the same as the usual logistic sigmoid $\sigma(z) = 1/(1 + e^{-z})$ except for a linear rescaling of its input and output: $\tanh(z) = 2\sigma(2z) - 1$. The equation defining the mapping is

$$\vec{x}(t) = f(\vec{s}(t)) + \vec{n}(t) = \mathrm{B} \tanh( \mathrm{A} \vec{s}(t) + \vec{a}) + \vec{b} + \vec{n}(t) \, . \tag{3}$$

The matrices $\mathrm{A}$ and $\mathrm{B}$ are the weights of the first and second layers, and $\vec{a}$ and $\vec{b}$ are the corresponding biases.
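As an illustration of the mapping in Eq. (3), the noise-free part $f(\vec{s}(t))$ could be sketched as follows. This is only a minimal sketch using the Python standard library; the function names `mlp_mapping` and `logistic`, the layer sizes and the randomly drawn weights are assumptions made for the example and are not taken from the original text.

```python
import math
import random


def mlp_mapping(s, A, a, B, b):
    """Noise-free part of Eq. (3): f(s) = B tanh(A s + a) + b."""
    # Hidden layer: one tanh unit per row of A.
    hidden = [
        math.tanh(sum(A_ij * s_j for A_ij, s_j in zip(row, s)) + a_i)
        for row, a_i in zip(A, a)
    ]
    # Output layer: linear combination of the hidden activations plus bias b.
    return [
        sum(B_ij * h_j for B_ij, h_j in zip(row, hidden)) + b_i
        for row, b_i in zip(B, b)
    ]


def logistic(z):
    """Standard logistic sigmoid, used only to compare against tanh."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical sizes: 2 sources, 3 hidden neurons, 4 observations.
rng = random.Random(0)
n_sources, n_hidden, n_obs = 2, 3, 4
A = [[rng.gauss(0.0, 1.0) for _ in range(n_sources)] for _ in range(n_hidden)]
B = [[rng.gauss(0.0, 1.0) for _ in range(n_hidden)] for _ in range(n_obs)]
a = [0.0] * n_hidden
b = [0.0] * n_obs

# Mean of the observation vector for one (arbitrary) source vector.
x_mean = mlp_mapping([0.5, -1.0], A, a, B, b)
```

The helper `logistic` is included only so that the relation between the two nonlinearities can be checked numerically: `math.tanh(z)` and `2 * logistic(2 * z) - 1` agree up to rounding error.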

The noise $\vec{n}(t)$ is assumed to be independent and Gaussian, and therefore the probability distribution of $\vec{x}(t)$ is

$$\vec{x}(t) \sim N(f(\vec{s}(t)), e^{2\vec{v}_x}) \tag{4}$$

Each component of the vector $\vec{v}_x$ gives the log-std (the logarithm of the standard deviation) of the corresponding component of $\vec{x}(t)$, so that the variance of that component is $e^{2 v}$.
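To make the log-std parametrisation concrete, the log-density implied by Eq. (4) could be evaluated as in the sketch below. The function names `gaussian_logpdf` and `observation_loglik` are hypothetical, and the sketch assumes that the noise components are mutually independent, so that the log-densities of the components simply add up.

```python
import math


def gaussian_logpdf(x, mean, log_std):
    """Log-density of N(mean, exp(2 * log_std)) evaluated at a scalar x."""
    var = math.exp(2.0 * log_std)
    # log N = -0.5 log(2 pi) - log(std) - (x - mean)^2 / (2 var), with log(std) = log_std.
    return -0.5 * math.log(2.0 * math.pi) - log_std - 0.5 * (x - mean) ** 2 / var


def observation_loglik(x, f_s, v_x):
    """Sum over components i of log N(x_i; f_i(s), exp(2 * v_x[i])), as in Eq. (4)."""
    return sum(
        gaussian_logpdf(x_i, f_i, v_i) for x_i, f_i, v_i in zip(x, f_s, v_x)
    )
```

One convenient property of this parametrisation is that the log-std can take any real value, so no positivity constraint has to be enforced on the variance.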

The sources are assumed to have zero-mean Gaussian distributions, and again the variances are parametrised by the log-std $\vec{v}_s$.

$$\vec{s}(t) \sim N(0, e^{2\vec{v}_s}) \tag{5}$$

Since the variance of the sources can vary, the variance of the first-layer weights $\mathrm{A}$ can be fixed to a constant, which we choose to be one, without losing any generality from the model. This is not the case for the second-layer weights. Due to the nonlinearity, the variances of the outputs of the hidden neurons are bounded from above, and therefore the variance of the second-layer weights cannot be fixed. In order to enable the network to shut off extra hidden neurons, the weights leaving one hidden neuron share the same variance parameter.

$$\mathrm{B} \sim N(0, e^{2\vec{v}_B}) \tag{6}$$

The elements of the matrix $\mathrm{B}$ are assumed to have a zero-mean Gaussian distribution with an individual variance for each column, and thus the dimension of the vector $\vec{v}_B$ is the number of hidden neurons. Both biases $\vec{a}$ and $\vec{b}$ have Gaussian distributions parametrised by mean and log-std.
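The column-wise variance of the second-layer weights can be illustrated by drawing a matrix from the prior in Eq. (6). The following is a sketch under stated assumptions: the function name `sample_B`, the matrix size and the values in `v_B` are hypothetical, chosen so that the third hidden neuron has a very small log-std.

```python
import math
import random


def sample_B(v_B, n_obs, rng):
    """Draw B with B[i][j] ~ N(0, exp(2 * v_B[j])): one log-std per column."""
    stds = [math.exp(v) for v in v_B]
    return [[rng.gauss(0.0, std) for std in stds] for _ in range(n_obs)]


rng = random.Random(1)
# Hypothetical log-stds: the third hidden neuron is effectively switched off.
v_B = [0.0, 0.5, -20.0]
B = sample_B(v_B, n_obs=4, rng=rng)

# Largest outgoing weight (in absolute value) of the third hidden neuron.
max_off = max(abs(row[2]) for row in B)
```

Because all weights leaving hidden neuron `j` share `v_B[j]`, a single strongly negative value makes the whole column negligible, which is how an unnecessary hidden neuron can be shut off.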

The distributions are summarised in (7)-(12).

$$\vec{x}(t) \sim N(f(\vec{s}(t)), e^{2\vec{v}_x}) \tag{7}$$
$$\vec{s}(t) \sim N(0, e^{2\vec{v}_s}) \tag{8}$$
$$\mathrm{A} \sim N(0, 1) \tag{9}$$
$$\mathrm{B} \sim N(0, e^{2\vec{v}_B}) \tag{10}$$
$$\vec{a} \sim N(m_a, e^{2v_a}) \tag{11}$$
$$\vec{b} \sim N(m_b, e^{2v_b}) \tag{12}$$

The distributions of each set of log-std parameters are modelled by Gaussian distributions whose parameters are usually called hyperparameters.

$$\vec{v}_x \sim N(m_{v_x}, e^{2v_{v_x}}) \tag{13}$$
$$\vec{v}_s \sim N(m_{v_s}, e^{2v_{v_s}}) \tag{14}$$
$$\vec{v}_B \sim N(m_{v_B}, e^{2v_{v_B}}) \tag{15}$$
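The two-level structure for the sources, Eq. (14) followed by Eq. (8), can be read as an ancestral sampling recipe: first draw the log-stds, then draw the sources given them. The sketch below assumes this reading; the function name `sample_sources` and the hyperparameter values passed to it are hypothetical and were picked only to keep the numbers moderate.

```python
import math
import random


def sample_sources(m_vs, v_vs, n_sources, n_samples, rng):
    """Draw v_s as in Eq. (14), then s(t) as in Eq. (8) for each time index."""
    # One log-std per source, drawn from N(m_vs, exp(2 * v_vs)).
    v_s = [rng.gauss(m_vs, math.exp(v_vs)) for _ in range(n_sources)]
    # Zero-mean sources; source i has standard deviation exp(v_s[i]).
    s = [[rng.gauss(0.0, math.exp(v)) for v in v_s] for _ in range(n_samples)]
    return v_s, s


rng = random.Random(2)
# Hypothetical hyperparameter values, not taken from the text.
v_s, s = sample_sources(m_vs=0.0, v_vs=-1.0, n_sources=3, n_samples=5, rng=rng)
```

The same pattern would apply to $\vec{v}_x$ and $\vec{v}_B$ with their own hyperparameters.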

The prior distributions of $m_a$, $v_a$, $m_b$, $v_b$ and the six hyperparameters $m_{v_s}, \ldots, v_{v_B}$ are assumed to be Gaussian with zero mean and standard deviation 100, i.e., the priors are assumed to be very flat.
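How flat a Gaussian prior with standard deviation 100 is can be checked with a short calculation. The sketch below compares the drop in log-density between 0 and 10 under that prior with the drop under a unit-variance Gaussian. The names `gaussian_logpdf_std`, `PRIOR_STD`, `drop_flat` and `drop_unit`, and the test point 10, are assumptions made for this illustration.

```python
import math


def gaussian_logpdf_std(x, mean, std):
    """Log-density of N(mean, std^2) evaluated at a scalar x."""
    return (
        -0.5 * math.log(2.0 * math.pi)
        - math.log(std)
        - 0.5 * ((x - mean) / std) ** 2
    )


PRIOR_STD = 100.0  # standard deviation of the priors stated in the text

# Drop in log-density when moving from 0 to 10 under the flat prior ...
drop_flat = gaussian_logpdf_std(0.0, 0.0, PRIOR_STD) - gaussian_logpdf_std(
    10.0, 0.0, PRIOR_STD
)
# ... and under a unit-variance Gaussian, for comparison.
drop_unit = gaussian_logpdf_std(0.0, 0.0, 1.0) - gaussian_logpdf_std(10.0, 0.0, 1.0)
```

Here `drop_flat` is $0.5 \cdot (10/100)^2 = 0.005$ nats, whereas `drop_unit` is $0.5 \cdot 10^2 = 50$ nats, so over this range the prior with standard deviation 100 hardly favours one value over another.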

Harri Lappalainen
2000-03-03