The most difficult part of optimising the cost function for the NSSM is updating the hidden states and the weights of the MLP networks. All the hyperparameters can be handled in exactly the same way as in the CDHMM case presented in Section 6.1.2, only ignoring the additional weights caused by the HMM state probabilities.
Updating the states and the weights is carried out in two steps. First, the value of the cost function is evaluated using the current estimates of all the variables. This is called the forward computation because it consists of a forward pass through the MLP networks.
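To illustrate the idea, the following is a minimal sketch of such a forward computation, not the actual NSSM cost function: each weight is described by a posterior mean and variance, the forward pass uses the current means to produce the network outputs, and a simplified variational-style cost combines a data term with a penalty on the posterior variances. The network architecture, the cost terms and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-hidden-layer MLP: a posterior mean and variance for each weight.
W1_mean, W1_var = rng.normal(size=(4, 3)), np.full((4, 3), 0.1)
W2_mean, W2_var = rng.normal(size=(2, 4)), np.full((2, 4), 0.1)

def forward_cost(x, target):
    """Forward pass through the MLP using the current posterior means,
    followed by evaluation of a simplified variational-style cost."""
    h = np.tanh(W1_mean @ x)           # hidden-layer activation
    y = W2_mean @ h                    # network output
    data_term = 0.5 * np.sum((y - target) ** 2)
    # Toy penalty resembling a KL term: pulls the variances toward one.
    var_term = 0.5 * (np.sum(W1_var - np.log(W1_var) - 1.0)
                      + np.sum(W2_var - np.log(W2_var) - 1.0))
    return data_term + var_term

x = rng.normal(size=3)
t = rng.normal(size=2)
print("cost with current estimates:", forward_cost(x, t))
```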
The second step, the backward computation, consists of evaluating the partial derivatives of the cost function with respect to the different parameters. This can be done by moving backward through the network, starting from the outputs and proceeding toward the inputs, just as in standard back-propagation. In our case, however, every parameter is described by its own posterior distribution, which is characterised by its mean and variance. Moreover, the cost function is very different from the one used in standard back-propagation and the learning is unsupervised. This means that all the actual calculation formulas are different.
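The following sketch continues the toy example above. The derivatives of the data term with respect to the posterior means are obtained by moving backward through the network from the outputs toward the inputs, while the derivatives with respect to the posterior variances follow directly from the toy variance penalty. These formulas match only the toy cost; the actual NSSM formulas are derived from its full cost function and differ from these.

```python
import numpy as np

rng = np.random.default_rng(0)
W1_mean, W1_var = rng.normal(size=(4, 3)), np.full((4, 3), 0.1)
W2_mean, W2_var = rng.normal(size=(2, 4)), np.full((2, 4), 0.1)

def backward_gradients(x, target):
    # Forward pass, keeping the intermediate values needed later.
    a1 = W1_mean @ x
    h = np.tanh(a1)
    y = W2_mean @ h

    # Backward pass: proceed from the outputs toward the inputs.
    dy = y - target                # derivative of the data term w.r.t. y
    dW2_mean = np.outer(dy, h)     # gradient for the output-layer means
    dh = W2_mean.T @ dy            # error propagated to the hidden layer
    da1 = dh * (1.0 - h ** 2)      # through the tanh nonlinearity
    dW1_mean = np.outer(da1, x)    # gradient for the hidden-layer means

    # The toy variance penalty does not depend on the network pass,
    # so its derivatives come straight from the penalty term.
    dW1_var = 0.5 * (1.0 - 1.0 / W1_var)
    dW2_var = 0.5 * (1.0 - 1.0 / W2_var)
    return dW1_mean, dW2_mean, dW1_var, dW2_var

x = rng.normal(size=3)
t = rng.normal(size=2)
grads = backward_gradients(x, t)
print("gradient shapes:", [g.shape for g in grads])
```

Note that in the actual algorithm the posterior variances of the weights also affect the expectation of the data term, so their derivatives involve the network pass as well; the toy penalty above sidesteps this for brevity.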