Gaussian distribution

For the univariate Gaussian distribution $ N(x;\;\mu,v)$ parametrized by the mean $ \mu$ and the variance $ v$, we have

$\displaystyle \ln q(x \vert \mu,v) = - \frac{1}{2v}(x - \mu)^2 -\frac{1}{2}\ln(v)-\frac{1}{2}\ln(2\pi).$ (7)

Furthermore,

$\displaystyle E \left\{ -\frac{\partial^2 \ln q(x \vert \mu,v)}{\partial \mu^2} \right\} = \frac{1}{v},$ (8)

$\displaystyle E \left\{ -\frac{\partial^2 \ln q(x \vert \mu,v)}{\partial v \, \partial \mu} \right\} = 0, \quad\mathrm{and}$ (9)

$\displaystyle E \left\{ -\frac{\partial^2 \ln q(x \vert \mu,v)}{\partial v^2} \right\} = \frac{1}{2 v^2}.$ (10)
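
These expectations follow directly by differentiating (7). For instance, for the cross term (9),

$\displaystyle \frac{\partial \ln q(x \vert \mu,v)}{\partial \mu} = \frac{x-\mu}{v} \qquad \Rightarrow \qquad \frac{\partial^2 \ln q(x \vert \mu,v)}{\partial v \, \partial \mu} = -\frac{x-\mu}{v^2},$

and the expectation vanishes because $ E\{x\} = \mu$ under $ q$.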

The vanishing of the cross term between mean and variance further supports using the simpler fixed point rule (2) to update the variances.

In the case of the univariate Gaussian distribution, the natural gradient for the mean has a rather straightforward intuitive interpretation, illustrated in Figure 1 (left). Compared to the conventional gradient, the natural gradient compensates for the fact that changing the parameters of a Gaussian with small variance has a much more pronounced effect than when the variance is large.
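
Concretely, since the Fisher information matrix given by (8)-(10) is diagonal, the natural gradient rescales each component of the conventional gradient by the inverse metric. The following Python sketch illustrates a single update step under these assumptions; the function name, sign convention, and learning rate are illustrative rather than taken from the paper:

    def natural_gradient_step(mu, v, grad_mu, grad_v, lr=0.1):
        """One natural gradient ascent step for a univariate Gaussian N(mu, v).

        By Eqs. (8)-(10) the Fisher information matrix is
        G = diag(1/v, 1/(2*v**2)), so the natural gradient G^{-1} * grad
        rescales the conventional gradient componentwise.
        """
        nat_grad_mu = v * grad_mu           # inverse metric for the mean: v
        nat_grad_v = 2.0 * v**2 * grad_v    # inverse metric for the variance: 2 v^2
        return mu + lr * nat_grad_mu, v + lr * nat_grad_v

When $ v$ is small, the conventional gradient with respect to $ \mu$ tends to be steep, and multiplying by $ v$ damps the step accordingly, which is exactly the compensation described above.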

In the case of the multivariate Gaussian distribution, the elements of the Fisher information matrix corresponding to the mean are simply

$\displaystyle E \left\{ -\frac{\partial^2 \ln q(\mathbf{x} \vert \boldsymbol{\mu}, \mathbf{\Sigma})}{\partial \boldsymbol{\mu}^T \partial \boldsymbol{\mu}} \right\} = \mathbf{\Sigma}^{-1}.$ (11)

Typically the covariance matrix $ \mathbf{\Sigma}$ is assumed to have a simple structure (diagonal, diagonal plus rank-$ k$, or a simple Markov random field) that makes working with it very efficient.
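
For example, with a diagonal covariance the natural gradient for the mean, $ \mathbf{\Sigma} \nabla_{\boldsymbol{\mu}}$ by (11), reduces to an elementwise product. A minimal Python sketch under this assumption (the name sigma_diag, holding the diagonal of $ \mathbf{\Sigma}$, is illustrative):

    import numpy as np

    def natural_gradient_mean(grad_mu, sigma_diag):
        """Natural gradient for the mean of a Gaussian with diagonal covariance.

        The Fisher information block for the mean is Sigma^{-1} (Eq. 11), so
        the natural gradient is Sigma times the conventional gradient; for a
        diagonal Sigma this is an O(n) elementwise product rather than an
        O(n^3) solve with a full covariance matrix.
        """
        return sigma_diag * grad_mu

    grad_mu = np.array([0.5, -1.0, 0.2])
    sigma_diag = np.array([0.1, 2.0, 1.0])
    print(natural_gradient_mean(grad_mu, sigma_diag))  # [ 0.05 -2.    0.2 ]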

