As before, the general cost function of ensemble learning, as given in Equation (3.11), is

where the expectations are taken over the approximating posterior distribution. This will be the case for the rest of the section unless stated otherwise.
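For reference, the general ensemble-learning cost function takes the standard free-energy form; the notation here (with $q$ the approximating posterior over the sources $\mathbf{S}$ and parameters $\boldsymbol{\theta}$, and $\mathbf{X}$ the data) is assumed and may differ in detail from Equation (3.11):
\[
C = E\left\{ \ln \frac{q(\mathbf{S}, \boldsymbol{\theta})}{p(\mathbf{X}, \mathbf{S}, \boldsymbol{\theta})} \right\}
  = D\!\left( q(\mathbf{S}, \boldsymbol{\theta}) \,\Vert\, p(\mathbf{S}, \boldsymbol{\theta} \mid \mathbf{X}) \right) - \ln p(\mathbf{X}),
\]
so minimizing $C$ simultaneously tightens a lower bound on the evidence $\ln p(\mathbf{X})$ and fits $q$ to the true posterior.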

In the NSSM, all the probability distributions involved are Gaussian, so most of the terms will resemble the corresponding ones of the CDHMM. For the parameters ,

The term is a little more complicated:

The first term reduces to Equation (6.26), but the second term is a little different:

The expectation of has been evaluated in Equation (6.5), so the only remaining terms are and . They both involve the nonlinear mappings and , so they cannot be evaluated exactly.

The formulas for approximating the distribution of the outputs of an MLP network are presented in Appendix B. As a result, we obtain the posterior mean of the outputs and the posterior variance, decomposed as

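To illustrate the kind of approximation derived in Appendix B, the sketch below propagates a diagonal-Gaussian input through a one-hidden-layer tanh MLP by first-order Taylor linearization about the posterior mean. The function name is hypothetical, and the restriction to fixed (non-random) weights is an assumption made for brevity; the full derivation also tracks the uncertainty of the network weights and higher-order correction terms.

```python
import numpy as np

def propagate_tanh_mlp(x_mean, x_var, W1, b1, W2, b2):
    """First-order (Taylor) propagation of a diagonal Gaussian input
    through f(x) = W2 tanh(W1 x + b1) + b2, with the weights treated
    as fixed. Returns approximate output mean and diagonal variance."""
    a_mean = W1 @ x_mean + b1      # pre-activation mean
    a_var = (W1 ** 2) @ x_var      # pre-activation variance (independent inputs)
    h_mean = np.tanh(a_mean)       # hidden-unit mean (evaluated at the mean)
    dh = 1.0 - h_mean ** 2         # derivative of tanh at the mean
    h_var = dh ** 2 * a_var        # linearized hidden-unit variance
    f_mean = W2 @ h_mean + b2      # output mean
    f_var = (W2 ** 2) @ h_var      # output variance
    return f_mean, f_var
```

When the input variance is zero, the variance propagated this way vanishes and the mean reduces to a plain forward pass through the network.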
With these results, the remaining terms of the cost function are relatively easy to evaluate. The likelihood term is a standard Gaussian term and yields

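The general shape of such a Gaussian likelihood term can be sketched as follows. Writing $\bar f$ and $\tilde f$ for the posterior mean and variance of the network output and assuming, for illustration only, a fixed noise variance $\sigma^2$:
\[
E\{ -\ln p(x \mid f, \sigma^2) \}
  = \tfrac{1}{2}\ln(2\pi\sigma^2) + \frac{(x - \bar f)^2 + \tilde f}{2\sigma^2},
\]
which follows from $E\{(x - f)^2\} = (x - \bar f)^2 + \tilde f$ for fixed $x$. In the actual model the noise variance is itself a parameter with a posterior, so its expectation contributes additional terms of the same form as for the other Gaussian parameters.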
The source term is more difficult. The problematic expectation is

where we have used the additional approximation

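Although the exact approximation used here is not reproduced above, derivations of this kind rest on the second-moment decomposition of the squared difference between a source and a nonlinear prediction. Assuming, for illustration, that the two arguments are uncorrelated under the posterior approximation:
\[
E\{ (s - g)^2 \} = (\bar s - \bar g)^2 + \tilde s + \tilde g,
\]
where bars denote posterior means and tildes posterior variances. In general there is also a cross-covariance term $-2\operatorname{Cov}(s, g)$; neglecting or approximating it is what makes the source term tractable.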
Using Equation (6.31), the remaining term of the cost function can be written as