This section describes how to compute the posterior mean and variance of the outputs *f*_{k}(**s**(*t*)) of the MLP network. Ordinarily the inputs, weights and biases of an MLP network have fixed values. Here the inputs **s**(*t*), the weights **A**, **B** and the biases **a**, **b** have posterior distributions, which means that we also have a posterior distribution of the outputs. One way to evaluate the posterior mean and variance is to propagate distributions instead of fixed values through the network. Whole distributions would be quite tricky to deal with, and therefore we are going to characterise the distributions by their mean and variance only.

The sources have mixture-of-Gaussians posterior distributions, for which it is easy to compute the mean and variance. Denoting the mixing proportions by $\pi_{ij}(t)$ and the means and variances of the Gaussian components by $\bar{s}_{ij}(t)$ and $\tilde{s}_{ij}(t)$ (throughout this section, a bar denotes a posterior mean and a tilde a posterior variance), we have

$$\bar{s}_i(t) = \sum_j \pi_{ij}(t)\, \bar{s}_{ij}(t) \tag{19}$$

$$\tilde{s}_i(t) = \sum_j \pi_{ij}(t) \left[ \tilde{s}_{ij}(t) + \left( \bar{s}_{ij}(t) - \bar{s}_i(t) \right)^2 \right] \tag{20}$$
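The mixture moments above can be checked numerically. The following sketch uses NumPy with hypothetical mixture parameters (the weights, means and variances below are illustrative, not taken from the model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior of one source: a 3-component mixture of Gaussians
# with mixing proportions pi, component means m and component variances v.
pi = np.array([0.5, 0.3, 0.2])
m = np.array([-1.0, 0.5, 2.0])
v = np.array([0.2, 0.1, 0.3])

# Posterior mean, cf. (19): weighted sum of the component means.
s_mean = np.sum(pi * m)

# Posterior variance, cf. (20): within-component variance plus the
# spread of the component means around the overall mean.
s_var = np.sum(pi * (v + (m - s_mean) ** 2))

# Monte Carlo check: draw a component index, then a Gaussian sample.
comp = rng.choice(3, size=200_000, p=pi)
samples = rng.normal(m[comp], np.sqrt(v[comp]))
print(s_mean, s_var)
print(samples.mean(), samples.var())
```

Equation (20) is the law of total variance for the mixture, so the sample statistics agree with the closed-form values up to Monte Carlo error.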

Then the sources are multiplied by the first layer weight matrix **A** and the bias **a** is added: $y_j(t) = \sum_i A_{ji} s_i(t) + a_j$. Since the weights, biases and sources are posteriorly independent, the mean and variance of $y_j(t)$ are

$$\bar{y}_j(t) = \sum_i \bar{A}_{ji}\, \bar{s}_i(t) + \bar{a}_j \tag{21}$$

$$\tilde{y}_j(t) = \sum_i \left[ \bar{A}_{ji}^2\, \tilde{s}_i(t) + \tilde{A}_{ji}\, \bar{s}_i^2(t) + \tilde{A}_{ji}\, \tilde{s}_i(t) \right] + \tilde{a}_j \tag{22}$$

Equation (22) follows from the identity, valid for independent $x$ and $y$,

$$\operatorname{Var}(xy) = \bar{x}^2 \tilde{y} + \tilde{x}\, \bar{y}^2 + \tilde{x}\, \tilde{y}. \tag{23}$$
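The product-variance identity for independent variables, $\operatorname{Var}(xy) = \bar{x}^2 \tilde{y} + \tilde{x} \bar{y}^2 + \tilde{x} \tilde{y}$, can be verified by simulation; a minimal sketch with arbitrary (Gaussian) test distributions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Independent x and y; the Gaussian choice is arbitrary, the identity
# only requires independence and finite second moments.
x = rng.normal(2.0, 1.5, size=500_000)   # mean 2.0, variance 2.25
y = rng.normal(-1.0, 0.5, size=500_000)  # mean -1.0, variance 0.25

xm, xv = x.mean(), x.var()
ym, yv = y.mean(), y.var()

# Var(xy) = xbar^2 * yvar + xvar * ybar^2 + xvar * yvar.
predicted = xm**2 * yv + xv * ym**2 + xv * yv
empirical = (x * y).var()
print(predicted, empirical)
```

With the moments above the exact value is $4 \cdot 0.25 + 2.25 \cdot 1 + 2.25 \cdot 0.25 = 3.8125$, which both estimates approach.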

For computing the posterior mean of the output $g_j(y_j(t))$ of a hidden neuron, we use the second order Taylor series expansion of $g_j$ about the posterior mean $\bar{y}_j(t)$. Since the posterior mean of $y_j(t) - \bar{y}_j(t)$ is zero and the posterior mean of $\left( y_j(t) - \bar{y}_j(t) \right)^2$ is $\tilde{y}_j(t)$, the expansion yields

$$\bar{g}_j(t) \approx g_j\!\left( \bar{y}_j(t) \right) + \tfrac{1}{2}\, g_j''\!\left( \bar{y}_j(t) \right) \tilde{y}_j(t). \tag{24}$$

The second order expansion was chosen because its terms are exactly those whose posterior means can be expressed using only the posterior mean and variance of the input. Higher order terms would have required higher order cumulants of the input, which would have increased the computational complexity with little extra benefit.
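As an illustration, take the nonlinearity to be the hyperbolic tangent (a common MLP choice; the specific $g$ is an assumption here, not stated in this section) and compare the second order approximation of the posterior mean against Monte Carlo sampling of a Gaussian input:

```python
import numpy as np

rng = np.random.default_rng(2)

def g(y):
    # Hidden-layer nonlinearity, assumed to be tanh for this sketch.
    return np.tanh(y)

def g2(y):
    # Second derivative of tanh: -2 tanh(y) (1 - tanh(y)^2).
    t = np.tanh(y)
    return -2.0 * t * (1.0 - t * t)

y_mean, y_var = 0.8, 0.3   # hypothetical posterior statistics of y_j(t)

# Second order approximation of the posterior mean of g(y), cf. (24).
g_mean_approx = g(y_mean) + 0.5 * g2(y_mean) * y_var

# Monte Carlo reference with Gaussian y.
samples = g(rng.normal(y_mean, np.sqrt(y_var), size=400_000))
print(g_mean_approx, samples.mean())
```

The two values agree to a few percent for this input variance; the residual gap is due to the neglected higher order terms.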

For the posterior variance of *g*_{j}(*y*_{j}(*t*)), the second order expansion would result in terms which require higher than second order knowledge about the inputs. Therefore we shall use the first order Taylor series expansion, which yields the following approximation for the posterior variance of *g*_{j}(*y*_{j}(*t*)):

$$\tilde{g}_j(t) \approx \left[ g_j'\!\left( \bar{y}_j(t) \right) \right]^2 \tilde{y}_j(t). \tag{25}$$
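The first order variance approximation can be illustrated the same way, again assuming $g = \tanh$ (an assumption for this sketch) and a Gaussian input:

```python
import numpy as np

rng = np.random.default_rng(3)

y_mean, y_var = 0.8, 0.3   # hypothetical posterior statistics of y_j(t)

def g_prime(y):
    # Derivative of tanh: 1 - tanh(y)^2.
    t = np.tanh(y)
    return 1.0 - t * t

# First order approximation of the posterior variance of g(y), cf. (25).
g_var_approx = g_prime(y_mean) ** 2 * y_var

# Monte Carlo reference with Gaussian y.
samples = np.tanh(rng.normal(y_mean, np.sqrt(y_var), size=400_000))
print(g_var_approx, samples.var())
```

Being only first order, the approximation is rougher than (24); for this input variance it somewhat underestimates the true variance, since the curvature terms it drops are not negligible.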

The next step is to compute the mean and variance of the outputs after the second layer mapping. The outputs are given by

$$f_k(t) = \sum_j B_{kj}\, g_j\!\left( y_j(t) \right) + b_k. \tag{26}$$

The equation for the posterior mean is similar to that of the first layer:

$$\bar{f}_k(t) = \sum_j \bar{B}_{kj}\, \bar{g}_j(t) + \bar{b}_k \tag{27}$$

The equation for the posterior variance is more complicated than its first layer counterpart, however, since the outputs $g_j(y_j(t))$ of the hidden neurons are not posteriorly independent: part of the variance of each $y_j(t)$ originates from the same sources $s_i(t)$, and these contributions interfere with each other in the second layer mapping.

We shall use a first order approximation of the mapping **f**(**s**(*t*)) with respect to the sources for measuring the interference. This is consistent with the first order approximation of the nonlinearities *g*_{j} and yields the following equation for the posterior variance of *f*_{k}(*t*):

$$\tilde{f}_k(t) \approx \sum_i \left[ \frac{\partial f_k(t)}{\partial s_i(t)} \right]^2 \tilde{s}_i(t) + \sum_j \left[ \bar{B}_{kj}^2\, \tilde{g}_j^*(t) + \tilde{B}_{kj}\, \bar{g}_j^2(t) + \tilde{B}_{kj}\, \tilde{g}_j(t) \right] + \tilde{b}_k, \tag{28}$$

where the posterior means of the partial derivatives are obtained by the chain rule

$$\frac{\partial f_k(t)}{\partial s_i(t)} = \sum_j \bar{B}_{kj}\, g_j'\!\left( \bar{y}_j(t) \right) \bar{A}_{ji} \tag{29}$$

and $\tilde{g}_j^*(t)$ denotes the posterior variance of $g_j(y_j(t))$ without the contribution of the sources, i.e. the first order variance approximation applied to the part of $\tilde{y}_j(t)$ which originates from the weights and the bias:

$$\tilde{g}_j^*(t) \approx \left[ g_j'\!\left( \bar{y}_j(t) \right) \right]^2 \tilde{y}_j^*(t), \qquad \tilde{y}_j^*(t) = \sum_i \left[ \tilde{A}_{ji}\, \bar{s}_i^2(t) + \tilde{A}_{ji}\, \tilde{s}_i(t) \right] + \tilde{a}_j. \tag{30}$$

Notice that $\tilde{A}_{ji}\, \tilde{s}_i(t)$ appears in (30) and $\tilde{B}_{kj}\, \tilde{g}_j(t)$ appears in (28). These terms do not contribute to the interference, however, because they are the parts of the variance which are randomised by the multiplication with the weights $A_{ji}$ and $B_{kj}$.
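The chain rule for the partial derivatives of the output with respect to the sources can be sketched and verified against finite differences. The network sizes and parameter values below are illustrative, and $g = \tanh$ is assumed:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical posterior means of the parameters for a small network:
# 3 sources, 4 hidden neurons, 2 outputs.
A = rng.normal(size=(4, 3))   # first layer weights
a = rng.normal(size=4)        # first layer biases
B = rng.normal(size=(2, 4))   # second layer weights
b = rng.normal(size=2)        # second layer biases
s = rng.normal(size=3)        # posterior means of the sources at time t

def f(s):
    """Mean mapping f(s) = B tanh(A s + a) + b, with tanh assumed for g."""
    return B @ np.tanh(A @ s + a) + b

# Chain rule: df_k/ds_i = sum_j B_kj * g'(y_j) * A_ji, where g' = 1 - tanh^2.
y = A @ s + a
J = B @ (np.diag(1.0 - np.tanh(y) ** 2) @ A)   # Jacobian, shape (2, 3)

# Check against central finite differences of the mean mapping.
eps = 1e-6
J_num = np.empty_like(J)
for i in range(3):
    d = np.zeros(3)
    d[i] = eps
    J_num[:, i] = (f(s + d) - f(s - d)) / (2 * eps)
print(np.max(np.abs(J - J_num)))
```

Squaring these derivatives and weighting them by the source variances gives the first, source-originated sum in the output variance equation; the remaining terms come from the weight and bias variances.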