The feedforward computations start with the parameters of the
posterior approximation of the unknown variables of the model. For
the factors, the parameters of the posterior approximation are the
posterior mean
,
the posterior variance
and the dependence
.
The end result of the
feedforward computations is the value of the cost function *C*.

The first stage of the computations is the iteration of
(8) to obtain the marginalised posterior mean
and variance
of the factors. Thereafter
the computations proceed like in the NLFA algorithm: the means and
variances are propagated through the MLP networks. The final stage,
the computation of the cost function, differs only in the terms
and
.
In the NLFA algorithm, the former
had the form

(9) |

but now they have the form

The latter terms can be shown to yield

where the

In the feedbackward phase, the gradient of the cost function *C*w.r.t. the parameters of the posterior approximation is computed by
the back-propagation algorithm, that is, the steps of the feedforward
computations are reversed and the gradient of the cost function is
propagated backwards to the parameters of the posterior approximation.
Since the essential modification to the feedforward phase of NLFA
algorithm is (8), this is also the essential
modification in the backward computations.

The cost function is a function of parameters of the posterior
approximation. In the computation of the cost function, the
marginalised posterior variances
of the factors are used
as intermediate variables and hence the gradient is also computed
through these variables. Let us use the notation
to
mean that *C* is considered to be a function of the intermediate
variables
,
,
in addition to the
parameters of the posterior approximation. The gradient computations
resulting from (8) by the chain rule are then as
follows:

(12) |

(13) |

The terms and can be computed from (10) and (11) while also includes terms originating from the mappings

In the adaptation, the posterior means
of the factors
are treated as in the NLFA algorithm except for the correction in the
step size which is discussed in section 3.3. The
variances
are adapted like
in the NLFA.
The posterior dependence
is adapted by solving
which yields

Equation (15) shows that depends on which in turn depends on as (14) shows. This means that the update of the dependencies and the computation of the gradient w.r.t. the marginalised variance are done recursively backward in time which is the counterpart of (8) where the marginalised variances are computed recursively forward in time.