
Natural gradient learning for VB

Let $ \mathcal{F}(\boldsymbol{\xi})$ be a scalar function defined on the manifold $ S=\{ \boldsymbol{\xi}\in \mathbf{R}^n \}$. If $ S$ is a Euclidean space and the coordinate system $ \boldsymbol{\xi}$ is orthonormal, the direction of steepest ascent is given by the standard gradient $ \nabla \mathcal{F}(\boldsymbol{\xi})$.

If the space $ S$ is a curved Riemannian manifold, the direction of steepest ascent is given by the natural gradient [9]

$\displaystyle \tilde{\nabla} \mathcal{F}(\boldsymbol{\xi}) = \mathbf{G}^{-1}(\boldsymbol{\xi}) \nabla \mathcal{F}(\boldsymbol{\xi}).$ (3)

The $ n \times n$ matrix $ \mathbf{G}(\boldsymbol{\xi})=(g_{ij}(\boldsymbol{\xi}))$ is called the Riemannian metric tensor, and in general it depends on the point $ \boldsymbol{\xi}$.
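As a numerical sketch of Eq. (3) (an illustration not in the original text), the natural gradient is best computed by solving the linear system $ \mathbf{G}\tilde{\nabla}\mathcal{F} = \nabla\mathcal{F}$ rather than forming $ \mathbf{G}^{-1}$ explicitly:

```python
import numpy as np

def natural_gradient(grad, G):
    """Natural gradient: solve G x = grad instead of inverting G,
    which is cheaper and numerically more stable."""
    return np.linalg.solve(G, grad)

grad = np.array([1.0, 2.0])

# With a Euclidean (identity) metric, the natural gradient
# reduces to the standard gradient.
ng_euclid = natural_gradient(grad, np.eye(2))

# With a non-trivial metric, the gradient direction is rescaled:
# components along "stretched" directions of the metric shrink.
ng_curved = natural_gradient(grad, np.diag([2.0, 4.0]))
```

Here the metric matrices are arbitrary examples; in the VB setting of this paper, $ \mathbf{G}$ is the Fisher information defined below.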

For the space of probability distributions $ q(\boldsymbol{\theta}\vert \boldsymbol{\xi})$, the most common Riemannian metric tensor is given by the Fisher information [8]

$\displaystyle I_{ij}(\boldsymbol{\xi}) = g_{ij}(\boldsymbol{\xi}) = E \left\{ \frac{\partial \ln q(\boldsymbol{\theta}\vert \boldsymbol{\xi})}{\partial \xi_i} \frac{\partial \ln q(\boldsymbol{\theta}\vert \boldsymbol{\xi})}{\partial \xi_j} \right\} = -E \left\{ \frac{\partial^2 \ln q(\boldsymbol{\theta}\vert \boldsymbol{\xi})} {\partial \xi_i \partial \xi_j} \right\},$ (4)

where the last equality is valid given certain regularity conditions [11].
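To make Eq. (4) concrete (a worked example not in the original), take $ q(\boldsymbol{\theta}\vert\boldsymbol{\xi})$ to be a univariate Gaussian with $ \boldsymbol{\xi}=(\mu,\sigma)$. The expectation of the outer product of the scores can be estimated by Monte Carlo and checked against the known closed form $ \mathbf{G} = \mathrm{diag}(1/\sigma^2,\, 2/\sigma^2)$:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.5, 2.0

# Samples from q(theta | mu, sigma) = N(mu, sigma^2).
theta = rng.normal(mu, sigma, size=200_000)

# Score functions (partial derivatives of ln q w.r.t. the parameters):
#   d ln q / d mu    = (theta - mu) / sigma^2
#   d ln q / d sigma = ((theta - mu)^2 - sigma^2) / sigma^3
d_mu = (theta - mu) / sigma**2
d_sigma = ((theta - mu)**2 - sigma**2) / sigma**3

# Monte Carlo estimate of E{ score score^T }, the Fisher information.
scores = np.stack([d_mu, d_sigma])
G_mc = scores @ scores.T / theta.size

# Closed-form Fisher information of N(mu, sigma^2) in (mu, sigma).
G_exact = np.diag([1 / sigma**2, 2 / sigma**2])
```

The agreement of `G_mc` with `G_exact` also illustrates why the Fisher information is a natural metric: it measures how sensitively the distribution responds to each parameter.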


Tapani Raiko 2007-09-11