Finding the optimal approximation can be seen as an optimization
problem, where the lower bound $\mathcal{L}(\boldsymbol{\xi})$ is maximized with
respect to the variational parameters $\boldsymbol{\xi}$. This is often solved
using a variational EM algorithm that updates one set of parameters
at a time while keeping the others fixed. Both VE and VM steps
can implicitly make optimal use of the Riemannian structure of $q(\boldsymbol{\theta} \,|\, \boldsymbol{\xi})$
for conjugate exponential family models (Sato, 2001).
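For concreteness, writing $\mathbf{X}$ for the data, $\boldsymbol{\theta}$ for the unknown variables, and $q(\boldsymbol{\theta} \,|\, \boldsymbol{\xi})$ for the fixed-form approximation (this notation follows the common VB convention and is assumed here), the bound takes the standard form
$$\mathcal{L}(\boldsymbol{\xi}) = \mathrm{E}_{q(\boldsymbol{\theta}|\boldsymbol{\xi})}\!\left[ \log \frac{p(\mathbf{X}, \boldsymbol{\theta})}{q(\boldsymbol{\theta} \,|\, \boldsymbol{\xi})} \right] = \log p(\mathbf{X}) - D_{\mathrm{KL}}\!\left( q(\boldsymbol{\theta} \,|\, \boldsymbol{\xi}) \,\big\|\, p(\boldsymbol{\theta} \,|\, \mathbf{X}) \right),$$
so maximizing $\mathcal{L}$ is equivalent to minimizing the KL divergence from the approximation to the true posterior. With the parameters split into blocks, say $\boldsymbol{\xi} = (\boldsymbol{\xi}_1, \boldsymbol{\xi}_2)$, the variational EM iteration is the coordinate ascent
$$\boldsymbol{\xi}_1^{(t+1)} = \arg\max_{\boldsymbol{\xi}_1} \mathcal{L}(\boldsymbol{\xi}_1, \boldsymbol{\xi}_2^{(t)}), \qquad \boldsymbol{\xi}_2^{(t+1)} = \arg\max_{\boldsymbol{\xi}_2} \mathcal{L}(\boldsymbol{\xi}_1^{(t+1)}, \boldsymbol{\xi}_2).$$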
Nevertheless, EM-based methods are prone
to slow convergence, especially under low noise. A number of methods
exist to speed up the convergence of EM through more elaborate optimization
schemes (McLachlan and Krishnan, 1996; Salakhutdinov et al., 2003) while retaining the
alternating structure of the E and M steps, but none of these has gained
enough popularity to supplant the basic EM algorithm.
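To see why this slowness matters, recall that linear convergence means the error contracts by a roughly constant factor per iteration,
$$\| \boldsymbol{\xi}^{(t+1)} - \boldsymbol{\xi}^{\ast} \| \approx \rho \, \| \boldsymbol{\xi}^{(t)} - \boldsymbol{\xi}^{\ast} \|, \qquad 0 < \rho < 1,$$
and for EM-type algorithms the rate $\rho$ is governed by the fraction of missing information, which can approach one in low-noise regimes; this is a standard property of EM rather than a result specific to the methods above.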
The formulation of VB as an optimization
problem allows applying
generic optimization algorithms to maximize
$\mathcal{L}(\boldsymbol{\xi})$, but this
is rarely done in practice because the problems are quite
high-dimensional. Additionally, many of the parameters play different
roles, and generic optimization tools that lack this specific knowledge
of the geometry of the problem can be seriously hindered by it.
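As an illustration of this generic approach, the following sketch applies an off-the-shelf quasi-Newton optimizer (BFGS, via scipy.optimize.minimize) to the lower bound of a deliberately small toy model; the model, the parametrization $v = e^{2s}$, and all variable names are assumptions made for the example, not part of the methods discussed above.

```python
import numpy as np
from scipy.optimize import minimize

# Toy model (an illustrative assumption, not the paper's model):
#   theta ~ N(0, 1),  x_i | theta ~ N(theta, sigma2)
# Fixed-form approximation q(theta) = N(m, v), with v = exp(2*s)
# so that xi = (m, s) is unconstrained.

rng = np.random.default_rng(0)
sigma2 = 0.5
x = rng.normal(1.0, np.sqrt(sigma2), size=50)

def neg_elbo(xi):
    m, s = xi
    v = np.exp(2.0 * s)                      # variance of q
    # E_q[log p(x | theta)]: expected Gaussian log-likelihood
    exp_loglik = np.sum(-0.5 * np.log(2 * np.pi * sigma2)
                        - ((x - m) ** 2 + v) / (2 * sigma2))
    # E_q[log p(theta)] under the N(0, 1) prior
    exp_logprior = -0.5 * np.log(2 * np.pi) - (m ** 2 + v) / 2
    # Entropy of q(theta) = N(m, v)
    entropy = 0.5 * np.log(2 * np.pi * np.e * v)
    return -(exp_loglik + exp_logprior + entropy)

# Generic quasi-Newton optimizer, oblivious to the problem's geometry.
res = minimize(neg_elbo, x0=np.array([0.0, 0.0]), method="BFGS")
m_opt, s_opt = res.x
print(f"q(theta) = N({m_opt:.3f}, {np.exp(2 * s_opt):.3f})")
```

Even in this tiny example the optimizer treats the mean $m$ and the log-scale $s$ as interchangeable coordinates; in high-dimensional models whose parameters play very different roles, this indifference to the geometry is exactly what hinders generic tools.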
There exist step-lengthening methods that can be used to speed up variational EM algorithms, such as the pattern search method (Honkela et al., 2003) and adaptive overrelaxation (Salakhutdinov and Roweis, 2003). These methods are easy to implement, as they require very little beyond the EM algorithm itself. Their downside is the relatively modest speedup, typically only a small constant factor, while the underlying, sometimes linear, convergence behavior of variational EM is retained.
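A minimal sketch of the adaptive overrelaxation idea is given below, assuming only an EM update `em_step` and the `bound` it monotonically increases; the specific growth factor and reset rule here are illustrative stand-ins rather than the exact scheme of Salakhutdinov and Roweis (2003).

```python
import numpy as np

def overrelaxed_em(xi0, em_step, bound, n_iter=100, grow=1.1):
    """Adaptive overrelaxation around a given EM update (sketch).

    Extrapolates along the EM step direction with a step length eta
    that grows while the bound keeps improving, and falls back to a
    plain EM step (eta = 1) otherwise.
    """
    xi, eta = np.asarray(xi0, dtype=float), 1.0
    best = bound(xi)
    for _ in range(n_iter):
        proposal = xi + eta * (em_step(xi) - xi)   # eta = 1 is plain EM
        value = bound(proposal)
        if value >= best:                          # accept and lengthen step
            xi, best, eta = proposal, value, eta * grow
        else:                                      # reject, reset to plain EM
            xi, eta = em_step(xi), 1.0
            best = bound(xi)
    return xi
```

Because the wrapper needs nothing beyond the EM update and the bound itself, it is easy to bolt onto an existing implementation; but since each accepted step is still only a constant multiple of the EM step, the speedup is bounded by a constant factor, as noted above.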