As before, the general cost function of ensemble learning is the one given in Equation (3.11).
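A sketch of this cost, assuming the standard ensemble-learning form in which it measures the misfit between the approximating posterior \(q(S, \theta)\) and the joint density of the data, the sources and the parameters (the split into \(C_q\) and \(C_p\) is notation assumed here):
\[
C = \mathrm{E}_q\{\ln q(S, \theta)\} - \mathrm{E}_q\{\ln p(X, S, \theta)\} = C_q + C_p .
\]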
In the NSSM, all the probability distributions involved are Gaussian, so most of the terms resemble the corresponding ones of the CDHMM. In particular, the terms for the parameters θ have the same form as before.
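A sketch of those terms, assuming the notation of the earlier derivation with \(\bar{\theta}_i\) and \(\tilde{\theta}_i\) denoting the posterior mean and variance of a parameter: each Gaussian factor \(q(\theta_i)\) contributes to \(C_q\) only its negative entropy,
\[
\mathrm{E}_q\{\ln q(\theta_i)\} = -\tfrac{1}{2} \ln\!\left(2 \pi e\, \tilde{\theta}_i\right).
\]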
The term for the sources is a little more complicated. Its first part reduces to Equation (6.26), but its second part is somewhat different.
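The difference can be sketched by assuming the usual NSSM parameterisation of the source posterior, in which \(q(s_i(t))\) is conditionally Gaussian given \(s_i(t-1)\), with mean \(\bar{s}_i(t) + \breve{s}_i(t, t-1)\,[s_i(t-1) - \bar{s}_i(t-1)]\) and variance \(\mathring{s}_i(t)\); this notation is assumed here rather than taken from the text. The marginal variance is then
\[
\tilde{s}_i(t) = \mathring{s}_i(t) + \breve{s}_i^{\,2}(t, t-1)\, \tilde{s}_i(t-1),
\]
and the negative entropy of the conditional distribution keeps the familiar \(-\tfrac{1}{2}\ln(2\pi e\, \mathring{s}_i(t))\) form, while the explicit dependence on \(s_i(t-1)\) is what distinguishes this term from its CDHMM counterpart.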
The necessary expectation has already been evaluated in Equation (6.5), so the only remaining terms are those arising from the observations and from the source dynamics. Both involve the nonlinear mappings f and g, so they cannot be evaluated exactly.
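A sketch of the two remaining expectations, assuming the standard NSSM model equations \(x(t) = f(s(t)) + n(t)\) and \(s(t) = g(s(t-1)) + m(t)\) with MLP networks \(f\) and \(g\):
\[
\mathrm{E}\big\{[x_k(t) - f_k(s(t))]^2\big\}
\qquad \text{and} \qquad
\mathrm{E}\big\{[s_i(t) - g_i(s(t-1))]^2\big\}.
\]
Both require the posterior statistics of the nonlinear network outputs.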
The formulas for approximating the distribution of the outputs of an MLP network are presented in Appendix B. As a result we obtain the posterior mean of the outputs and a decomposition of the posterior variance.
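A sketch of that decomposition, assuming the Taylor-series style approximation typically used for such networks and writing \(\bar{f}_k(t)\) for the posterior mean of the \(k\)-th output:
\[
\tilde{f}_k(t) \approx \tilde{f}^{*}_k(t) + \sum_i \left[ \frac{\partial \bar{f}_k(t)}{\partial \bar{s}_i(t)} \right]^2 \tilde{s}_i(t),
\]
where \(\tilde{f}^{*}_k(t)\) collects the variance due to the network weights and the sum propagates the source variances through the linearised mapping. The same form applies to the outputs of \(g\).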
With these results, the remaining terms of the cost function are relatively easy to evaluate. The likelihood term is a standard Gaussian expectation.
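A sketch of the resulting term, assuming the observation noise variance is parameterised as \(\exp(2 v_k)\) with a Gaussian posterior \(q(v_k) = N(\bar{v}_k, \tilde{v}_k)\) as in the corresponding CDHMM term:
\[
-\mathrm{E}\{\ln p(x_k(t) \mid s(t), \theta)\}
= \tfrac{1}{2}\Big\{ \big[(x_k(t) - \bar{f}_k(t))^2 + \tilde{f}_k(t)\big]
\exp\!\big(2\tilde{v}_k - 2\bar{v}_k\big) + 2\bar{v}_k + \ln 2\pi \Big\}.
\]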
The source term is more difficult; the problematic expectation is evaluated in Equation (6.31).
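A sketch of this expectation, assuming the conditional source posterior described above: besides the usual mean and variance contributions, the posterior correlation between \(s_i(t)\) and \(s_i(t-1)\) produces a cross-covariance term,
\[
\mathrm{E}\big\{[s_i(t) - g_i(s(t-1))]^2\big\}
\approx [\bar{s}_i(t) - \bar{g}_i(t)]^2 + \tilde{s}_i(t) + \tilde{g}_i(t)
- 2\, \breve{s}_i(t, t-1)\, \frac{\partial \bar{g}_i(t)}{\partial \bar{s}_i(t-1)}\, \tilde{s}_i(t-1),
\]
where the last term approximates \(2\,\mathrm{Cov}[s_i(t), g_i(s(t-1))]\) through the linearisation of \(g\).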
Using Equation (6.31), the remaining term of the cost function can then be written in closed form.
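A sketch of its form, assuming the process noise variance is parameterised as \(\exp(2 v_{m_i})\) with Gaussian posterior \(N(\bar{v}_{m_i}, \tilde{v}_{m_i})\); the structure matches the likelihood term, with the expectation above in place of the quadratic error:
\[
-\mathrm{E}\{\ln p(s_i(t) \mid s(t-1), \theta)\}
= \tfrac{1}{2}\Big\{ \mathrm{E}\big\{[s_i(t) - g_i(s(t-1))]^2\big\}
\exp\!\big(2\tilde{v}_{m_i} - 2\bar{v}_{m_i}\big) + 2\bar{v}_{m_i} + \ln 2\pi \Big\}.
\]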