Finding the optimal approximation can be cast as an optimization problem in which the lower bound is maximized with respect to the variational parameters. This is often solved using a variational EM algorithm that updates one set of parameters at a time while keeping the others fixed. For conjugate-exponential family models, both the VE and VM steps implicitly make optimal use of the Riemannian structure of the space of variational distributions (Sato, 2001). Nevertheless, EM-based methods are prone to slow convergence, especially under low noise. A number of methods exist to speed up the convergence of EM through more elaborate optimization schemes (McLachlan and Krishnan, 1996; Salakhutdinov et al., 2003) while retaining the alternating structure of the E and M steps, but none of these has gained enough popularity to supplant the basic EM algorithm.
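To make the alternating structure concrete, the following sketch spells out the coordinate-wise updates for a deliberately simple conjugate model: a Gaussian with unknown mean and precision under a Normal-Gamma prior, with a mean-field posterior q(mu)q(tau). The model, the parameter names, and the function vb_em_gaussian are illustrative assumptions chosen for brevity, not a construction taken from the text; each block is the exact maximizer of the lower bound given the other factor, which is what makes the iteration monotone.

```python
import numpy as np

def vb_em_gaussian(x, mu0=0.0, lam0=1.0, a0=1.0, b0=1.0,
                   n_iter=100, tol=1e-10):
    """Mean-field VB for a Gaussian with unknown mean and precision.

    Approximation q(mu, tau) = q(mu) q(tau) with q(mu) = N(mu_n, 1/lam_n)
    and q(tau) = Gamma(a_n, b_n).  The two blocks below play the roles of
    the alternating VE and VM steps: each is the closed-form optimum of
    the lower bound with the other factor held fixed (standard conjugate
    updates; this toy model is an assumption for illustration).
    """
    n, xbar = len(x), np.mean(x)
    e_tau = a0 / b0                       # initial E[tau] under the prior
    a_n = a0 + 0.5 * (n + 1)              # shape is fixed throughout
    for _ in range(n_iter):
        # Update q(mu) with q(tau) fixed
        mu_n = (lam0 * mu0 + n * xbar) / (lam0 + n)
        lam_n = (lam0 + n) * e_tau
        # Update q(tau) with q(mu) fixed
        e_sq = (np.sum((x - mu_n) ** 2) + n / lam_n
                + lam0 * ((mu_n - mu0) ** 2 + 1.0 / lam_n))
        b_n = b0 + 0.5 * e_sq
        new_e_tau = a_n / b_n
        if abs(new_e_tau - e_tau) < tol:  # alternation has converged
            break
        e_tau = new_e_tau
    return mu_n, lam_n, a_n, b_n

rng = np.random.default_rng(0)
x = rng.normal(1.5, 0.5, size=200)
print(vb_em_gaussian(x))
```

In this conjugate example the alternation converges quickly; the slow, nearly linear convergence discussed above shows up in models where the factors are strongly coupled, so that each coordinate-wise step makes only a small amount of progress.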
The formulation of VB as an optimization problem makes it possible to apply generic optimization algorithms to maximize the lower bound, but this is rarely done in practice because the problems are rather high dimensional. Moreover, the different parameters play very different roles, and ignoring this problem-specific geometry can seriously hinder generic optimization tools.
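As an illustration of this black-box alternative, the sketch below hands the negative lower bound of a tiny fixed-form problem to a generic quasi-Newton optimizer: a Gaussian q(theta) = N(m, v) is fitted to the unnormalized target exp(-theta**2/2 - theta**4/4), for which the bound happens to be available in closed form. The target density and all names are illustrative assumptions, not an example from the text.

```python
import numpy as np
from scipy.optimize import minimize

def neg_bound(params):
    """Negative lower bound for fitting q(theta) = N(m, v) to the
    unnormalized target p(theta) propto exp(-theta**2/2 - theta**4/4).
    For a Gaussian q, E_q[theta**2] = m**2 + v and
    E_q[theta**4] = m**4 + 6*m**2*v + 3*v**2, so the bound is exact."""
    m, log_v = params
    v = np.exp(log_v)   # optimize the log-variance so it stays unconstrained
    e_log_p = -(m**2 + v) / 2 - (m**4 + 6 * m**2 * v + 3 * v**2) / 4
    entropy = 0.5 * np.log(2 * np.pi * np.e * v)
    return -(e_log_p + entropy)

res = minimize(neg_bound, x0=np.array([1.0, 0.0]), method="L-BFGS-B")
print(res.x[0], np.exp(res.x[1]))   # optimal mean and variance of q
```

In two dimensions this works without trouble; the point of the paragraph above is that with thousands of heterogeneous variational parameters, a geometry-blind optimizer of this kind scales far less gracefully.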
There exist step-lengthening methods, such as the pattern search method (Honkela et al., 2003) and adaptive overrelaxation (Salakhutdinov and Roweis, 2003), that can be used to extend variational EM algorithms. These methods are easy to implement, as they require very little beyond the EM algorithm itself. Their downside is a relatively modest speedup, typically only by a small constant factor, while the underlying, sometimes merely linear, convergence behavior of variational EM is retained.
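To show how little machinery such an extension needs, the sketch below wraps an arbitrary EM update in adaptive overrelaxation in the spirit of Salakhutdinov and Roweis (2003): the lengthened step theta + eta * (M(theta) - theta) is kept while it improves the bound, and the method falls back to a plain EM step otherwise. The function names, the growth factor, and the exact accept/reset rule here are illustrative assumptions rather than the precise published scheme.

```python
def overrelaxed_em(em_step, bound, theta, n_iter=100, grow=1.1):
    """Adaptive overrelaxation wrapped around a plain EM update.

    em_step(theta) returns the ordinary EM update M(theta) and
    bound(theta) evaluates the lower bound; theta may be a float or a
    NumPy array.  Lengthened steps are kept while they improve the
    bound, with the step factor eta grown after each success; on
    failure, eta is reset and a plain EM step (eta = 1) is taken, so
    the monotone improvement of EM is preserved.
    """
    eta = 1.0
    current = bound(theta)
    for _ in range(n_iter):
        m = em_step(theta)                    # ordinary EM target
        proposal = theta + eta * (m - theta)  # lengthened step
        value = bound(proposal)
        if value >= current:                  # improvement: keep, lengthen
            theta, current = proposal, value
            eta *= grow
        else:                                 # failure: safe plain EM step
            theta, current = m, bound(m)
            eta = 1.0
    return theta
```

Because eta = 1 recovers the ordinary EM step exactly, the fallback branch never decreases the bound, which is precisely why such wrappers are safe to bolt onto an existing variational EM implementation; the modest, constant-factor nature of the speedup is also visible here, since eta is reset whenever the overrelaxed step overshoots.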