As before, the general cost function of ensemble learning is the one given in Equation (3.11).
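In outline, and with notation that is assumed here rather than copied from Equation (3.11), the cost is the expected logarithm of the ratio of the approximating posterior to the joint density of the data and the unknowns,
\[
C = C_q + C_p = \mathrm{E}\{\ln q(\mathbf{S}, \boldsymbol{\theta})\} - \mathrm{E}\{\ln p(\mathbf{X}, \mathbf{S}, \boldsymbol{\theta})\},
\]
where the expectations are taken over the approximating posterior $q(\mathbf{S}, \boldsymbol{\theta})$.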
In the NSSM, all the probability distributions involved are Gaussian, so most of the terms resemble the corresponding ones of the CDHMM. In particular, the terms for the parameters have the same Gaussian form as before.
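For a single Gaussian parameter these terms typically take the following form; the notation (posterior mean $\bar{\theta}$ and variance $\tilde{\theta}$, prior mean $m$ and log-standard-deviation $v$) is assumed here for illustration and is not taken from the original equations:
\[
C_q(\theta) = \mathrm{E}\{\ln q(\theta)\} = -\tfrac{1}{2}\ln\bigl(2\pi e\,\tilde{\theta}\bigr),
\]
\[
C_p(\theta) = \mathrm{E}\{-\ln p(\theta \mid m, v)\}
= \tfrac{1}{2}\bigl[(\bar{\theta}-\bar{m})^2 + \tilde{\theta} + \tilde{m}\bigr]
  \exp\bigl(2\tilde{v} - 2\bar{v}\bigr) + \bar{v} + \tfrac{1}{2}\ln 2\pi,
\]
assuming the prior is $\mathcal{N}(\theta;\, m,\, e^{2v})$ and that all the variables are posteriorly independent.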
The term for the sources is a little more complicated: its first part reduces to Equation (6.26), but its second part is slightly different.
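To indicate what such a term looks like, the expectation of the logarithm of a Gaussian posterior factor has the standard negentropy form; the conditional-variance notation $\mathring{s}_k(t)$ below is an assumption, not the notation of the original equation:
\[
\mathrm{E}\{\ln q(s_k(t) \mid s_k(t-1))\} = -\tfrac{1}{2}\bigl[1 + \ln\bigl(2\pi\,\mathring{s}_k(t)\bigr)\bigr].
\]
Here the variance that appears is the variance of $s_k(t)$ conditional on $s_k(t-1)$ rather than its marginal posterior variance, which is presumably what makes the second part differ from the earlier result.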
The expectation in question has been evaluated in Equation (6.5), so the only remaining terms are the likelihood term and the source term. They both involve the nonlinear mappings $\mathbf{f}$ and $\mathbf{g}$, so they cannot be evaluated exactly.
The formulas for approximating the distribution of the outputs of an MLP network are presented in Appendix B. As a result, we get the posterior mean of the outputs and the posterior variance, decomposed into separate contributions.
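As an illustration of what such a decomposition typically looks like in this framework (the symbols below, in particular the partial derivatives and the weight-uncertainty part $f_k^{*}$, are assumed notation and not necessarily those of Appendix B):
\[
\widetilde{f}_k(\mathbf{s}(t)) \approx
  \sum_j \left[\frac{\partial f_k}{\partial s_j}\right]^2 \widetilde{s}_j(t)
  + f_k^{*}(\mathbf{s}(t)),
\]
where the first part propagates the posterior variance of the inputs through a linearization of the network and the second part collects the contribution of the uncertainty of the network weights. Keeping the two parts separate matters when the inputs of the mapping are not posteriorly independent of the other variables in a term, as happens in the source term below.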
With these results the remaining terms of the cost function are relatively easy to evaluate. The likelihood term is a standard Gaussian expectation and can be evaluated in closed form.
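A sketch of the closed form such a term takes for a Gaussian observation model with log-standard-deviation parameter $v_k$ (the notation here, with posterior means denoted by bars and posterior variances by tildes, is assumed for illustration):
\[
\mathrm{E}\{-\ln p(x_k(t) \mid \mathbf{s}(t), \boldsymbol{\theta})\}
\approx \tfrac{1}{2}\Bigl[\bigl(x_k(t) - \bar{f}_k(\mathbf{s}(t))\bigr)^2 + \widetilde{f}_k(\mathbf{s}(t))\Bigr]
  \exp\bigl(2\widetilde{v}_k - 2\bar{v}_k\bigr) + \bar{v}_k + \tfrac{1}{2}\ln 2\pi,
\]
assuming $p(x_k(t) \mid \mathbf{s}(t), \boldsymbol{\theta}) = \mathcal{N}\bigl(x_k(t);\, f_k(\mathbf{s}(t)),\, e^{2 v_k}\bigr)$ and that $x_k(t)$ is observed.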
The source term is more difficult because of one problematic expectation.
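The difficulty presumably lies in a term of the form (written here in assumed notation; the expansion below is only the algebraic identity obtained by expanding the square):
\[
\mathrm{E}\bigl\{\bigl(s_k(t) - g_k(\mathbf{s}(t-1))\bigr)^2\bigr\}
= \bigl(\bar{s}_k(t) - \bar{g}_k\bigr)^2 + \widetilde{s}_k(t) + \widetilde{g}_k
  - 2\,\mathrm{Cov}\bigl\{s_k(t),\, g_k(\mathbf{s}(t-1))\bigr\},
\]
where the covariance does not vanish because $s_k(t)$ is not posteriorly independent of $\mathbf{s}(t-1)$.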
Using Equation (6.31), the remaining term of the cost function can then also be written in closed form.