Gaussian node

Recall the Gaussian node in Section 3.1. The variance is parameterised using the exponential function as $ \exp(-v)$. This is because the mean $ \left< v \right>$ and the expected exponential $ \left< \exp v \right>$ of the input $ v$ then suffice for evaluating the cost function, as will be shown shortly. Consequently, the cost function can be minimised using the gradients with respect to these expectations. The gradients are computed backwards from the child nodes, but otherwise our learning method differs substantially from standard back-propagation [Haykin98].
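To see why these two expectations suffice, consider the contribution of one Gaussian node to the cost. The following is a sketch, assuming that the posterior approximation factorises into Gaussians $ q(x)$, $ q(m)$ and $ q(v)$ with means $ \overline{x}, \overline{m}, \overline{v}$ and variances $ \widetilde{x}, \widetilde{m}, \widetilde{v}$; this notation is used here only for illustration:

$$ \mathrm{E}_{q}\left\{ -\ln p\left(x \mid m, \exp(-v)\right) \right\} = \frac{1}{2}\left\{ \left< \exp v \right> \left[ (\overline{x}-\overline{m})^2 + \widetilde{x} + \widetilde{m} \right] - \left< v \right> + \ln 2\pi \right\}, $$

where $ \left< \exp v \right> = \exp(\overline{v} + \widetilde{v}/2)$ for a Gaussian $ q(v)$. The input $ v$ thus enters the cost only through $ \left< v \right>$ and $ \left< \exp v \right>$, so the gradients with respect to these two expectations are all that needs to be propagated backwards from a child node.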

Another important reason for using the parameterisation $ \exp(-v)$ for the prior variance of a Gaussian random variable $ s$ is that the posterior distribution of $ s$ then becomes approximately Gaussian, provided that the prior mean $ m$ of $ s$ is Gaussian, too (see for example Section 7.1 or [Lappal-Miskin00]). The conjugate prior distribution for the inverse of the prior variance of a Gaussian random variable is the gamma distribution [Gelman95]. Using such a gamma prior pdf makes the posterior distribution gamma as well, which is mathematically convenient. However, the conjugate prior pdf of the second parameter of the gamma distribution is quite intractable. Hence the gamma distribution is not suitable for developing hierarchical variance models. The logarithm of a gamma-distributed variable is approximately Gaussian distributed [Gelman95], which justifies the adopted parameterisation $ \exp(-v)$. It should be noted, however, that both the gamma and the $ \exp(-v)$ distributions are used as prior pdfs mainly because they make the estimation of the posterior pdf mathematically tractable [Lappal-Miskin00]; one cannot claim that either of these choices is the correct one.
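For concreteness, the conjugacy mentioned above can be written out; this is a standard computation (see e.g. [Gelman95]), with $ \alpha$ and $ \beta$ denoting generic shape and rate parameters used here only for illustration. If the precision $ \tau$ (the inverse variance) of a Gaussian observation has a gamma prior, the posterior of $ \tau$ is again gamma:

$$ \tau \sim \textrm{Gamma}(\alpha, \beta), \quad x \mid \tau \sim \mathcal{N}\left(m, \tau^{-1}\right) \quad\Longrightarrow\quad \tau \mid x \sim \textrm{Gamma}\!\left(\alpha + \tfrac{1}{2},\; \beta + \tfrac{1}{2}(x-m)^2\right). $$

No correspondingly simple update exists for the posterior of the second parameter $ \beta$ itself, which is what prevents stacking gamma distributions into a hierarchical variance model.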


