As a specific example, let us study the nonlinear state-space model (NSSM) introduced in (Valpola and Karhunen, 2002). The model is specified by the generative model
\begin{align*}
\mathbf{x}(t) &= \mathbf{f}(\mathbf{s}(t), \boldsymbol{\theta}_{\mathbf{f}}) + \mathbf{n}(t), \\
\mathbf{s}(t) &= \mathbf{s}(t-1) + \mathbf{g}(\mathbf{s}(t-1), \boldsymbol{\theta}_{\mathbf{g}}) + \mathbf{m}(t),
\end{align*}
where $\mathbf{x}(t)$ are the observations, $\mathbf{s}(t)$ are the latent states, $\mathbf{n}(t)$ and $\mathbf{m}(t)$ are Gaussian observation and process noise terms, and the nonlinear mappings $\mathbf{f}$ and $\mathbf{g}$ are modeled by multilayer perceptron (MLP) networks with weights $\boldsymbol{\theta}_{\mathbf{f}}$ and $\boldsymbol{\theta}_{\mathbf{g}}$.
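To make the setting concrete, the following sketch (a minimal illustration, not part of the original experiments) simulates data from a model of this form in Python/NumPy; the network sizes, weight scales, and noise levels are arbitrary choices.
\begin{verbatim}
import numpy as np

def mlp(x, W1, b1, W2, b2):
    """One-hidden-layer MLP with tanh activation."""
    return W2 @ np.tanh(W1 @ x + b1) + b2

rng = np.random.default_rng(0)
state_dim, obs_dim, hidden, T = 3, 5, 10, 200

# Illustrative random weights for f (observation mapping) and g (dynamics).
Wf1, bf1 = 0.5 * rng.standard_normal((hidden, state_dim)), np.zeros(hidden)
Wf2, bf2 = 0.5 * rng.standard_normal((obs_dim, hidden)), np.zeros(obs_dim)
Wg1, bg1 = 0.5 * rng.standard_normal((hidden, state_dim)), np.zeros(hidden)
Wg2, bg2 = 0.1 * rng.standard_normal((state_dim, hidden)), np.zeros(state_dim)

s = np.zeros(state_dim)
states, observations = [], []
for t in range(T):
    # s(t) = s(t-1) + g(s(t-1)) + m(t)   (process noise m)
    s = s + mlp(s, Wg1, bg1, Wg2, bg2) + 0.01 * rng.standard_normal(state_dim)
    # x(t) = f(s(t)) + n(t)              (observation noise n)
    x = mlp(s, Wf1, bf1, Wf2, bf2) + 0.1 * rng.standard_normal(obs_dim)
    states.append(s)
    observations.append(x)
states, observations = np.array(states), np.array(observations)
\end{verbatim}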
Because of the nonlinearities, the model is not in the conjugate-exponential family, and standard VB learning methods are not directly applicable. The bound (22) can nevertheless be evaluated by linearizing the MLP networks $\mathbf{f}$ and $\mathbf{g}$ using the technique of Honkela and Valpola (2005). This allows evaluating the gradient with respect to the variational parameters of $q(\mathbf{S})$, $q(\boldsymbol{\theta}_{\mathbf{f}})$, and $q(\boldsymbol{\theta}_{\mathbf{g}})$, and using a gradient-based optimizer to adapt them. These variables are updated jointly rather than with an EM-like split, because the same heavy gradient computations are needed for all of them.
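As a rough illustration of the kind of computation the linearization involves, the sketch below pushes a Gaussian posterior over the input of a one-hidden-layer tanh MLP through a local linearization around the posterior mean, giving an approximate output mean and variance from which a bound of this form can be evaluated and differentiated. It is a simplified stand-in that tracks only input uncertainty; the actual technique of Honkela and Valpola (2005) also accounts for the posterior uncertainty of the MLP weights and chooses the linearization more carefully.
\begin{verbatim}
import numpy as np

def propagate_gaussian(mu, var, W1, b1, W2, b2):
    """Approximate mean and (diagonal) variance of a one-hidden-layer tanh
    MLP output when its input s ~ N(mu, diag(var)), by linearizing the
    network around the input mean mu."""
    h = np.tanh(W1 @ mu + b1)          # hidden activations at the mean
    out_mean = W2 @ h + b2             # network output at the mean
    J = (W2 * (1.0 - h**2)) @ W1       # Jacobian  W2 diag(1 - h^2) W1
    out_var = (J**2) @ var             # diag(J diag(var) J^T)
    return out_mean, out_var
\end{verbatim}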
The natural gradient with respect to the parameters of $q(\mathbf{S})$, $q(\boldsymbol{\theta}_{\mathbf{f}})$, and $q(\boldsymbol{\theta}_{\mathbf{g}})$ was simplified by using the gradient-based updates only for the mean elements. For the parameters of $q(\mathbf{S})$, the fully diagonal approximation of the inverse of the metric tensor given by Eqs. (6) and (11) was used. Since the parameters $\boldsymbol{\theta}_{\mathbf{f}}$ and $\boldsymbol{\theta}_{\mathbf{g}}$ had a diagonal covariance, no further approximations were necessary. Under these assumptions the natural gradient for the mean elements is given by
\[
\tilde{\nabla}_{\bar{\theta}_i} \mathcal{F} = \tilde{\theta}_i \, \nabla_{\bar{\theta}_i} \mathcal{F},
\]
that is, each component of the ordinary gradient is rescaled by the corresponding posterior variance, where $\bar{\theta}_i$ and $\tilde{\theta}_i$ denote the posterior mean and variance of the $i$th parameter and $\mathcal{F}$ denotes the bound (22).
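In code, this diagonal natural gradient is simply an elementwise product of the ordinary gradient with the posterior variances, for example:
\begin{verbatim}
import numpy as np

def natural_gradient_means(grad_mean, var):
    """Natural gradient for the mean parameters of a factorized Gaussian
    posterior: the Fisher information for each mean is 1/variance, so its
    inverse rescales the gradient elementwise by the posterior variance."""
    return np.asarray(var) * np.asarray(grad_mean)

# Example: a gradient of [0.4, -1.0] with posterior variances [0.01, 2.0]
# becomes [0.004, -2.0]; well-determined parameters take small steps while
# uncertain ones take large steps.
\end{verbatim}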
Variances were updated separately using a fixed-point update rule as described
in (Valpola and Karhunen, 2002).
The correlation parameters of $q(\mathbf{S})$ were updated in EM style, with all other parameters held fixed.
The remaining hyperparameters were
updated by VBEM.
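Putting the pieces together, the overall procedure alternates the updates described above. The outline below is only a schematic sketch: the four callables (mean_grad_fn, variance_update_fn, correlation_update_fn, hyperparam_update_fn) are hypothetical placeholders for the corresponding steps, and a plain gradient step with a fixed step size stands in for the gradient-based optimizer.
\begin{verbatim}
def fit_nssm(params, mean_grad_fn, variance_update_fn,
             correlation_update_fn, hyperparam_update_fn,
             step_size=0.01, n_iter=100):
    """Schematic alternating update loop; the callables are placeholders
    for the steps described in the text, not an actual implementation."""
    for _ in range(n_iter):
        # Joint gradient step on the posterior means of S, theta_f, theta_g,
        # rescaled elementwise by the variances (diagonal natural gradient).
        for name, grad in mean_grad_fn(params).items():
            params[name]["mean"] += step_size * params[name]["var"] * grad
        params = variance_update_fn(params)     # fixed-point variance updates
        params = correlation_update_fn(params)  # EM-style update of q(S) correlations
        params = hyperparam_update_fn(params)   # VBEM updates of hyperparameters
    return params
\end{verbatim}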