Next: Cost Function
Up: Nonlinear Factor Analysis
Previous: Nonlinear Factor Analysis
Figure 3:
The mapping from sources to observations is modelled by the
familiar MLP network. The sources are on the top layer and the
observations on the bottom layer. The middle layer consists of
hidden neurons, each of which computes a nonlinear function of its
inputs.

The schematic structure of the mapping is shown in Fig. 3.
The nonlinearity of each hidden neuron is the hyperbolic tangent,
which is the same as the usual logistic sigmoid up to a scaling and
shift, since tanh(x) = 2 sigmoid(2x) - 1.
The equation defining the mapping is

    f(s) = B tanh(A s + a) + b                                   (3)

The matrices A and B are the weights of the first and second layer, and
a and b are the corresponding biases.
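As a concrete sketch, the two-layer mapping f(s) = B tanh(A s + a) + b can be written in a few lines of NumPy. The dimensions below (3 sources, 5 hidden neurons, 4 observations) are arbitrary, chosen only for illustration; they are not taken from the text.

```python
import numpy as np

def f(s, A, B, a, b):
    """MLP mapping from sources to observations: f(s) = B tanh(A s + a) + b."""
    return B @ np.tanh(A @ s + a) + b

# Illustrative dimensions: 3 sources, 5 hidden neurons, 4 observations.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))   # first-layer weights
B = rng.standard_normal((4, 5))   # second-layer weights
a = rng.standard_normal(5)        # first-layer biases
b = rng.standard_normal(4)        # second-layer biases

s = rng.standard_normal(3)        # one source vector s(t)
x = f(s, A, B, a, b)              # noiseless observation f(s(t))
```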
The noise is assumed to be independent and Gaussian, and therefore the
probability distribution of x(t) is

    x(t) ~ N(f(s(t)), e^{2 v_x})                                 (4)

Each component of the vector v_x gives the logstd of the
corresponding component of x(t).
The sources are assumed to have zero-mean Gaussian distributions, and
again the variances are parametrised by the logstd v_s:

    s(t) ~ N(0, e^{2 v_s})                                       (5)
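The generative model defined by the mapping together with the noise and source distributions can be illustrated by drawing data from it: sample sources from zero-mean Gaussians with logstd v_s, map them through the network, and add Gaussian noise with logstd v_x. This is a sketch under assumed toy dimensions and parameter values, not code from the original work.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative sizes: 3 sources, 5 hidden neurons, 4 observed components.
A = rng.standard_normal((5, 3))   # first-layer weights
B = rng.standard_normal((4, 5))   # second-layer weights
a = rng.standard_normal(5)        # first-layer biases
b = rng.standard_normal(4)        # second-layer biases
v_s = np.full(3, -0.5)            # logstd of the sources: std = exp(v_s)
v_x = np.full(4, -2.0)            # logstd of the observation noise

T = 100                           # number of observation vectors
s = rng.standard_normal((T, 3)) * np.exp(v_s)        # s(t) ~ N(0, e^{2 v_s})
f_s = np.tanh(s @ A.T + a) @ B.T + b                 # f(s(t)) = B tanh(A s + a) + b
x = f_s + rng.standard_normal((T, 4)) * np.exp(v_x)  # x(t) ~ N(f(s(t)), e^{2 v_x})
```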
Since the variance of the sources can vary, the variance of the weights
A on the first layer can be fixed to a constant, which we
choose to be one, without losing any generality from the model. This
is not the case for the second-layer weights. Due to the nonlinearity,
the variances of the outputs of the hidden neurons are bounded from
above, and therefore the variance of the second-layer weights cannot be
fixed. In order to enable the network to shut off extra hidden
neurons, the weights leaving one hidden neuron share the same variance
parameter:

    B_ij ~ N(0, e^{2 v_Bj})                                      (6)
The elements of the matrix B are assumed to have a zero-mean
Gaussian distribution with an individual variance for each column, and
thus the dimension of the vector v_B is the number of hidden
neurons. Both biases a and b have Gaussian
distributions parametrised by a mean and a logstd.
The distributions are summarised in (7)–(12).



    x(t) ~ N(f(s(t)), e^{2 v_x})                                 (7)
    s(t) ~ N(0, e^{2 v_s})                                       (8)
    A    ~ N(0, 1)                                               (9)
    B    ~ N(0, e^{2 v_B})                                       (10)
    a    ~ N(m_a, e^{2 v_a})                                     (11)
    b    ~ N(m_b, e^{2 v_b})                                     (12)
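A recurring element in the distributions above is the parametrisation of a variance by a logstd parameter v, i.e. N(m, e^{2v}). A small helper function (a sketch, not from the original text) makes this convention explicit:

```python
import numpy as np

def gauss_logpdf(z, m, v):
    """Log density of N(m, e^{2v}) at z, where v is the logstd (std = e^v)."""
    return -0.5 * np.log(2 * np.pi) - v - 0.5 * (z - m) ** 2 * np.exp(-2 * v)

# With m = 0 and v = 0 this is the standard normal log density.
lp = gauss_logpdf(0.0, 0.0, 0.0)
```

Working with the logstd keeps the variance positive without constraints and gives the scale parameter an unbounded, roughly symmetric range, which is convenient when it is itself given a Gaussian prior as in (13)–(15) below.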
The distributions of each set of logstd parameters are modelled by
Gaussian distributions whose parameters are usually called
hyperparameters.
    v_x ~ N(m_vx, e^{2 v_vx})                                    (13)
    v_s ~ N(m_vs, e^{2 v_vs})                                    (14)
    v_B ~ N(m_vB, e^{2 v_vB})                                    (15)
The prior distributions of m_a, v_a, m_b, v_b and the six
hyperparameters m_vx, v_vx, m_vs, v_vs, m_vB and v_vB
are assumed to be Gaussian
with zero mean and standard deviation 100, i.e., the priors are
assumed to be very flat.
Harri Lappalainen
2000-03-03