Almost all the distributions appearing in our model are assumed to be
Gaussian, and consequently almost all the terms appearing in the cost
function are expectations of logarithms of Gaussian distributions. We
shall use the second layer biases
**a** as an example. For each
element *a*_{i}, there is one term in the part of the cost function that
arises from *q* and one in the part that arises from *p*,
namely the terms *q*(*a*_{i}) and
*p*(*a*_{i} | *m*_{a}, *v*_{a}).
The cost function therefore includes the terms
E{ln *q*(*a*_{i})} and -E{ln *p*(*a*_{i} | *m*_{a}, *v*_{a})}.
In the first expectation the term *q*(*a*_{i}) depends only on *a*_{i}, which means that we can integrate out the other
variables, and we have

$$E\{\ln q(a_i)\} = \int q(a_i) \ln q(a_i)\, da_i \tag{17}$$

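The same happens for the other integral:

$$-E\{\ln p(a_i \mid m_a, v_a)\} = -\int q(a_i, m_a, v_a) \ln p(a_i \mid m_a, v_a)\, da_i\, dm_a\, dv_a \tag{18}$$
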
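Recall that *q*(*a*_{i}) is Gaussian. Writing its variance as $\tilde{a}_i$, the first integral is the negative entropy of a Gaussian:

$$\int q(a_i) \ln q(a_i)\, da_i = -\frac{1}{2} \ln\left(2 \pi e\, \tilde{a}_i\right) \tag{19}$$
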
The integral in (18) is also fairly easy, and it can be shown that the result is

Again, the result is based on the fact that *p*(*a*_{i} | *m*_{a}, *v*_{a}) is Gaussian.
The following terms are the only ones whose expectations in
(16) give results different from (19) or
(20):
*q*(*M*_{i}(*t*), *s*_{i}(*t*)),
*p*(*M*_{i}(*t*) | **c**_{i}),
*p*(*s*_{i}(*t*) | *M*_{i}(*t*), **m**_{si}, **v**_{si}) and
the likelihood term of the observations *x*_{k}(*t*).
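
Before turning to those exceptions, the Gaussian expectations referred to as (19) and (20) can be sanity-checked numerically. The sketch below is not part of the original derivation. It assumes that the posterior approximation factorizes into independent Gaussians for *a*_{i}, *m*_{a} and *v*_{a}, and that *v*_{a} parameterizes the variance as exp(2 *v*_{a}); neither the parameterization nor the function names come from this section. Under those assumptions, and writing posterior means with bars and posterior variances with tildes, the expectation of -ln *p* would be

$$\frac{1}{2}\left[(\bar{a}_i - \bar{m}_a)^2 + \tilde{a}_i + \tilde{m}_a\right] e^{2\tilde{v}_a - 2\bar{v}_a} + \bar{v}_a + \frac{1}{2}\ln 2\pi ,$$

while the expectation of ln *q* is the negative entropy of a Gaussian. The code compares both closed forms with plain Monte Carlo averages.

```python
import math
import random


def expected_log_q(var_a):
    """E{ln q(a)} for a Gaussian q with variance var_a (negative entropy)."""
    return -0.5 * math.log(2.0 * math.pi * math.e * var_a)


def expected_neg_log_p(mean_a, var_a, mean_m, var_m, mean_v, var_v):
    """E{-ln p(a | m, v)} with p = N(a; m, exp(2 v)).

    Assumes independent Gaussian posteriors for a, m and v
    (an assumption of this sketch, not stated in the text).
    """
    second_moment = (mean_a - mean_m) ** 2 + var_a + var_m  # E{(a - m)^2}
    inv_var = math.exp(2.0 * var_v - 2.0 * mean_v)          # E{exp(-2 v)}
    return 0.5 * second_moment * inv_var + mean_v + 0.5 * math.log(2.0 * math.pi)


def monte_carlo_check(mean_a, var_a, mean_m, var_m, mean_v, var_v,
                      n_samples=200_000, seed=0):
    """Sample-based estimates of the same two expectations."""
    rng = random.Random(seed)
    sum_log_q = 0.0
    sum_neg_log_p = 0.0
    for _ in range(n_samples):
        a = rng.gauss(mean_a, math.sqrt(var_a))
        m = rng.gauss(mean_m, math.sqrt(var_m))
        v = rng.gauss(mean_v, math.sqrt(var_v))
        # ln q(a) for the Gaussian posterior of a
        sum_log_q += (-0.5 * math.log(2.0 * math.pi * var_a)
                      - 0.5 * (a - mean_a) ** 2 / var_a)
        # -ln N(a; m, exp(2 v))
        sum_neg_log_p += (0.5 * math.log(2.0 * math.pi) + v
                          + 0.5 * (a - m) ** 2 * math.exp(-2.0 * v))
    return sum_log_q / n_samples, sum_neg_log_p / n_samples


if __name__ == "__main__":
    # Arbitrary illustrative values, not taken from the text.
    params = dict(mean_a=0.5, var_a=0.2, mean_m=0.1, var_m=0.1,
                  mean_v=-0.3, var_v=0.05)
    mc_q, mc_p = monte_carlo_check(**params)
    print("E{ln q}:  closed form", expected_log_q(params["var_a"]), "sampled", mc_q)
    print("E{-ln p}: closed form", expected_neg_log_p(**params), "sampled", mc_p)
```

The sampled and closed-form values should agree to within the Monte Carlo error, which shrinks as `n_samples` grows.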

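The index *M*_{i}(*t*) is discrete and therefore we have a summation
instead of integration in the cost function. Let us denote the posterior
probability *q*(*M*_{i}(*t*) = *l*) by $\dot{s}_{il}(t)$,
and denote the mean and variance of
the Gaussian
*q*(*s*_{i}(*t*) | *M*_{i}(*t*) = *l*) by
$\bar{s}_{il}(t)$ and $\tilde{s}_{il}(t)$.
Then the expectations of ln *q*(*M*_{i}(*t*), *s*_{i}(*t*))
in
(16) are given by

$$E\{\ln q(M_i(t), s_i(t))\} = \sum_l \dot{s}_{il}(t) \left[ \ln \dot{s}_{il}(t) - \frac{1}{2} \ln\left(2 \pi e\, \tilde{s}_{il}(t)\right) \right]$$
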
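Because the index is discrete and each conditional posterior is Gaussian, this expectation is a weighted sum: for every value *l* of the index, the logarithm of its posterior probability plus the negative entropy of the corresponding Gaussian. A minimal sketch of that computation follows; the function and argument names are hypothetical and chosen only for illustration.

```python
import math


def expected_log_q_mixture(probs, variances):
    """E{ln q(M, s)} for q(M, s) = q(M) q(s | M) with Gaussian q(s | M = l).

    probs[l]     -- posterior probability q(M = l), summing to one
    variances[l] -- variance of the Gaussian q(s | M = l)
    """
    total = 0.0
    for w, var in zip(probs, variances):
        if w > 0.0:  # a component with zero probability contributes nothing
            total += w * (math.log(w)
                          - 0.5 * math.log(2.0 * math.pi * math.e * var))
    return total
```

With a single component of probability one, the value reduces to the plain Gaussian negative entropy, which is a convenient consistency check.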
For the expectation of ln *p*(*M*_{i}(*t*) | **c**_{i})
we shall first
evaluate the following integral:

The resulting integral can be approximated by applying a second-order
Taylor series expansion of the function being averaged
with respect
to *c*_{il'} around the posterior mean of **c**_{i}.
This yields
the following approximation for the integral:

where
.
Now we see that the expectation of ln *p*(*M*_{i}(*t*) | **c**_{i})
is

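The steps described here can be sketched concretely under two assumptions that this section does not state explicitly: that the prior of the index is a softmax of the parameters, *p*(*M*_{i}(*t*) = *l* | **c**_{i}) = exp(*c*_{il}) / Σ_{l'} exp(*c*_{il'}), and that the posterior of **c**_{i} is Gaussian with diagonal covariance and independent of the posterior of *M*_{i}(*t*). The symbols $\bar{c}_{il}$ and $\tilde{c}_{il}$ (posterior mean and variance of *c*_{il}), $\dot{s}_{il}(t)$ (posterior probability of *M*_{i}(*t*) = *l*) and $\xi_{il}$ are notation chosen for this sketch only.

```latex
% Assumed softmax prior:
%   ln p(M_i(t) = l | c_i) = c_{il} - ln sum_{l'} exp(c_{il'})
\begin{align*}
\int q(\mathbf{c}_i) \ln p(M_i(t) = l \mid \mathbf{c}_i)\, d\mathbf{c}_i
  &= \bar{c}_{il}
     - \int q(\mathbf{c}_i) \ln \sum_{l'} e^{c_{il'}}\, d\mathbf{c}_i \\
% Second-order expansion of ln sum exp around the posterior mean
\int q(\mathbf{c}_i) \ln \sum_{l'} e^{c_{il'}}\, d\mathbf{c}_i
  &\approx \ln \sum_{l'} e^{\bar{c}_{il'}}
     + \frac{1}{2} \sum_{l'} \xi_{il'} (1 - \xi_{il'})\, \tilde{c}_{il'},
  \qquad
  \xi_{il'} = \frac{e^{\bar{c}_{il'}}}{\sum_{l''} e^{\bar{c}_{il''}}} \\
% Averaging over the posterior of the index as well
E\{\ln p(M_i(t) \mid \mathbf{c}_i)\}
  &\approx \sum_l \dot{s}_{il}(t)\, \bar{c}_{il}
     - \ln \sum_{l'} e^{\bar{c}_{il'}}
     - \frac{1}{2} \sum_{l'} \xi_{il'} (1 - \xi_{il'})\, \tilde{c}_{il'}
\end{align*}
```

In this sketch the first-order term of the expansion vanishes because the expansion point is the posterior mean, and the cross terms of the second derivative drop out because the covariance is assumed diagonal.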
Since both
*q*(*s*_{i}(*t*) | *M*_{i}(*t*)) and
*p*(*s*_{i}(*t*) | *M*_{i}(*t*), **m**_{si},
**v**_{si}) are Gaussian, the terms
ln *p*(*s*_{i}(*t*) | *M*_{i}(*t*), **m**_{si}, **v**_{si})
have expectations which are similar to
(20):

which equals the sum of the terms

weighted by the posterior probabilities *q*(*M*_{i}(*t*) = *l*).
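
A sketch of what such a weighted sum could look like is given below. It assumes, beyond what this section states, that component *l* of source *i* has mean *m*_{sil} and variance exp(2 *v*_{sil}), and that the posteriors of these parameters are independent Gaussians. Bars and tildes denote posterior means and variances, and $\dot{s}_{il}(t)$ denotes the posterior probability of *M*_{i}(*t*) = *l*; all of this notation is chosen for the sketch.

```latex
% Each bracketed term has the same structure as the single-Gaussian case,
% evaluated for component l and weighted by its posterior probability.
\begin{align*}
-E\{\ln p(s_i(t) \mid M_i(t), \mathbf{m}_{si}, \mathbf{v}_{si})\}
  = \sum_l \dot{s}_{il}(t) \Bigl\{
      & \frac{1}{2} \bigl[ (\bar{s}_{il}(t) - \bar{m}_{sil})^2
          + \tilde{s}_{il}(t) + \tilde{m}_{sil} \bigr]
        e^{2 \tilde{v}_{sil} - 2 \bar{v}_{sil}} \\
      & + \bar{v}_{sil} + \frac{1}{2} \ln 2\pi \Bigr\}
\end{align*}
```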

The observations *x*_{k}(*t*) are known, unless there are missing
values, which means that there are no terms of the form
*q*(*x*_{k}(*t*)).
The expectations of the likelihood terms of *x*_{k}(*t*)
are
the most difficult terms in the cost function. If the posterior mean
and variance of the function
*f*_{k}(**s**(*t*)) are known (let us
denote them by
$\bar{f}_k(t)$
and
$\tilde{f}_k(t)$
for short), then the
expectation has a form similar to (20):