As a specific example, let us study the nonlinear state-space model (NSSM) introduced by Valpola and Karhunen (2002). The model is specified by the generative model
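(sketched here in the standard NSSM notation, which is assumed rather than quoted; see Valpola and Karhunen, 2002, for the exact parameterization)

\[
\mathbf{x}(t) = \mathbf{f}(\mathbf{s}(t), \boldsymbol{\theta}_f) + \mathbf{n}(t), \qquad
\mathbf{s}(t) = \mathbf{g}(\mathbf{s}(t-1), \boldsymbol{\theta}_g) + \mathbf{m}(t),
\]

where \(\mathbf{x}(t)\) are the observations, \(\mathbf{s}(t)\) the hidden states, \(\mathbf{n}(t)\) and \(\mathbf{m}(t)\) Gaussian noise terms, and the nonlinear observation and dynamical mappings \(\mathbf{f}\) and \(\mathbf{g}\) are modelled with multilayer perceptron (MLP) networks whose weights are collected in \(\boldsymbol{\theta}_f\) and \(\boldsymbol{\theta}_g\).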
Because of the nonlinearities the model is not in the conjugate-exponential family, and the standard VB learning methods are not directly applicable. The bound (22) can nevertheless be evaluated by linearizing the MLP networks using the technique of Honkela and Valpola (2005). This allows evaluating the gradient with respect to the variational parameters of the hidden states and of the two MLP weight sets, and using a gradient-based optimizer to adapt them. These variables are updated jointly rather than with an EM-like split because the same heavy gradient computations are needed for all of them.
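As an illustration of the kind of linearization involved, the following sketch propagates the mean and a diagonal variance of a Gaussian input through a small MLP using a first-order Taylor expansion around the input mean. It is a simplified stand-in under assumed names and shapes, not the scheme of Honkela and Valpola (2005) itself, which in particular also accounts for the uncertainty of the network weights.

    import numpy as np

    # Minimal sketch: linearization-based propagation of a Gaussian through an MLP.
    # Only the uncertainty of the network input is treated here; all names and
    # dimensions are illustrative assumptions.

    def mlp(s, W1, b1, W2, b2):
        # One-hidden-layer MLP with tanh activations.
        return W2 @ np.tanh(W1 @ s + b1) + b2

    def mlp_jacobian(s, W1, b1, W2, b2):
        # Jacobian of the MLP output with respect to its input, evaluated at s.
        h = np.tanh(W1 @ s + b1)
        return W2 @ np.diag(1.0 - h ** 2) @ W1

    def propagate_gaussian(mean_s, var_s, W1, b1, W2, b2):
        # First-order Taylor expansion around the input mean: the output mean is
        # the network applied to the input mean, and the (diagonal) output
        # variance is the squared Jacobian applied to the input variances.
        J = mlp_jacobian(mean_s, W1, b1, W2, b2)
        return mlp(mean_s, W1, b1, W2, b2), (J ** 2) @ var_s

    # Example with random weights; the shapes are arbitrary.
    rng = np.random.default_rng(0)
    W1, b1 = rng.standard_normal((8, 3)), rng.standard_normal(8)
    W2, b2 = rng.standard_normal((2, 8)), rng.standard_normal(2)
    mean_out, var_out = propagate_gaussian(np.zeros(3), 0.1 * np.ones(3), W1, b1, W2, b2)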
The natural gradient with respect to the parameters of these three distributions was simplified by using the gradient-based updates only for the mean elements. For the hidden-state parameters, the fully diagonal approximation of the inverse of the metric tensor given by Eqs. (6) and (11) was used. Since the MLP weight parameters had diagonal covariances, no further approximations were necessary for them. Under these assumptions the natural gradient for the mean elements reduces to the plain gradient scaled elementwise by the corresponding posterior variances.
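In the notation assumed here, with \(\bar{\theta}_i\) and \(\tilde{\theta}_i\) denoting the posterior mean and (marginal) variance of the \(i\)-th parameter and \(\mathcal{F}\) the bound being optimized, this can be sketched as

\[
\tilde{\nabla}_{\bar{\theta}_i} \mathcal{F} = \tilde{\theta}_i \, \frac{\partial \mathcal{F}}{\partial \bar{\theta}_i},
\]

which follows from the fact that the Fisher information of a Gaussian with respect to its mean is the inverse covariance, so that under a diagonal approximation each gradient component is multiplied by the corresponding variance.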
Variances were updated separately using a fixed-point update rule as described by Valpola and Karhunen (2002). The correlation parameters of the hidden-state posterior were updated in EM style, keeping all other parameters fixed. The remaining hyperparameters were updated by VBEM.
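For concreteness, the type of fixed-point rule referred to can be sketched as follows; the notation is assumed here rather than quoted from the cited paper. Writing the objective as a cost \(C = C_p + C_q\) to be minimized, where \(C_q\) is the negative entropy of the Gaussian approximation so that \(\partial C_q / \partial \tilde{\theta}_i = -1/(2\tilde{\theta}_i)\), setting the derivative of \(C\) with respect to the variance \(\tilde{\theta}_i\) to zero yields

\[
\tilde{\theta}_i = \left( 2\, \frac{\partial C_p}{\partial \tilde{\theta}_i} \right)^{-1},
\]

which is iterated as a fixed-point update because the right-hand side itself depends on the variances; the precise form used is given by Valpola and Karhunen (2002).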