Recall the Gaussian node in Section 3.1.
The variance is parameterised using the exponential function as
$\exp(-v)$. This is because then the mean $\left\langle v \right\rangle$
and expected exponential $\left\langle \exp v \right\rangle$
of the input $v$
suffice for evaluating the cost function,
as will be shown shortly. Consequently, the cost function can be
minimised using the gradients with respect to these expectations.
The gradients are computed backwards from the child
nodes, but otherwise
our learning method differs markedly from standard back-propagation
\citep{Haykin98}.
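To anticipate that derivation, here is a brief sketch; the overline and
tilde notation for posterior means and variances is an assumption made
only for this illustration. If the posterior approximation factorises
into Gaussians $q(s)$, $q(m)$ and $q(v)$, the relevant term of the cost
function under $s \sim \mathcal{N}(m, \exp(-v))$ is
\[
\bigl\langle -\ln p(s \mid m, v) \bigr\rangle
= \tfrac{1}{2} \Bigl[ \left\langle \exp v \right\rangle
\bigl( (\overline{s} - \overline{m})^2 + \widetilde{s} + \widetilde{m} \bigr)
- \overline{v} + \ln 2\pi \Bigr],
\]
where $\left\langle \exp v \right\rangle = \exp(\overline{v} + \widetilde{v}/2)$
for a Gaussian $q(v)$. The input $v$ thus enters the cost only through
its mean and expected exponential.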
Another important reason for using the parameterisation $\exp(-v)$ for
the prior variance of a Gaussian random variable
is that the posterior
distribution of $v$
then becomes approximately Gaussian, provided
that the prior mean
of $v$
is Gaussian, too (see for example
Section 7.1 or \citealp{Lappal-Miskin00}).
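A brief sketch of why this holds, under the assumed model
$s \sim \mathcal{N}(m, \exp(-v))$ with a Gaussian prior
$v \sim \mathcal{N}(\mu_v, \sigma_v^2)$; the symbols $\mu_v$ and
$\sigma_v^2$ are introduced here only for illustration. The negative
log posterior of $v$,
\[
-\ln p(v \mid s, m) = \tfrac{1}{2} e^{v} (s - m)^2 - \tfrac{1}{2} v
+ \frac{(v - \mu_v)^2}{2\sigma_v^2} + \text{const},
\]
has second derivative $\tfrac{1}{2} e^{v} (s - m)^2 + \sigma_v^{-2} > 0$
everywhere, so the posterior is log-concave and unimodal, and a Gaussian
approximation fitted at its mode is typically accurate.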
The conjugate prior distribution for the inverse of the prior variance
of a Gaussian random variable is the gamma distribution
\citep{Gelman95}. Using such a gamma prior pdf causes the posterior
distribution to be a gamma distribution, too, which is mathematically convenient.
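As a concrete illustration of this conjugacy (a standard result, stated
here for $n$ independent observations $x_1, \ldots, x_n$ with known mean
$\mu$, an assumed setting): if the precision $\tau = \sigma^{-2}$ has the
prior $\tau \sim \operatorname{Gamma}(\alpha, \beta)$ and
$x_i \sim \mathcal{N}(\mu, \tau^{-1})$, then
\[
p(\tau \mid x_1, \ldots, x_n)
= \operatorname{Gamma}\!\Bigl(\alpha + \tfrac{n}{2},\;
\beta + \tfrac{1}{2} \sum_{i=1}^{n} (x_i - \mu)^2 \Bigr),
\]
so the posterior stays in the gamma family with simply updated parameters.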
However, the conjugate prior pdf of the second parameter of the gamma
distribution is quite intractable. Hence the gamma distribution
is not suitable for developing hierarchical variance models.
The logarithm of a gamma distributed variable is approximately Gaussian distributed
\citep{Gelman95}, justifying the adopted parameterisation
$\exp(-v)$. However, it should be noted that both the gamma and the
$\exp(-v)$ distributions are used as prior pdfs mainly because they make
the estimation of the posterior pdf mathematically tractable
\citep{Lappal-Miskin00}; one cannot claim that either of these choices
would be correct.
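The quality of the Gaussian approximation to the log-gamma distribution
can be quantified (a standard property, added here for illustration):
if $\tau \sim \operatorname{Gamma}(\alpha, \beta)$, then
\[
\mathrm{E}[\ln \tau] = \psi(\alpha) - \ln \beta, \qquad
\operatorname{Var}[\ln \tau] = \psi'(\alpha),
\]
where $\psi$ is the digamma function. The skewness of $\ln \tau$ is
$\psi''(\alpha) / \psi'(\alpha)^{3/2}$, which tends to zero as $\alpha$
grows, so $\ln \tau$ is close to Gaussian already for moderate values of
the shape parameter.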