
Gaussian variables

A Gaussian variable $ s$ has two inputs $ m$ and $ v$ and prior probability $ p(s \vert m, v) = N(s; m, \exp(-v))$. The variance is parametrised this way because the mean and expected exponential of $ v$ then suffice for computing the cost function. It can be shown that when $ s$, $ m$ and $ v$ are mutually independent, i.e. $ q(s, m, v) = q(s)q(m)q(v)$, the term $ C_{s,p} = -\left< \ln p(s \vert m, v) \right>$ yields

$\displaystyle C_{s,p} = \frac{1}{2}\Big\{ \left< \exp v \right> \Big[ \big( \left< s \right> - \left< m \right> \big)^2 + \mathrm{Var}\left\{m\right\} + \mathrm{Var}\left\{s\right\} \Big] - \left< v \right> + \ln 2\pi \Big\} \, .$ (1)
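Under the independence assumption the cost term is a closed-form function of a few posterior statistics. A minimal sketch in Python (the function name and argument order are mine, not from the text):

```python
import math

def cost_sp(s_mean, s_var, m_mean, m_var, v_mean, v_exp_exp):
    """C_{s,p} = -<ln p(s | m, v)> for the prior N(s; m, exp(-v)).

    Arguments are the posterior statistics of the independent factors
    q(s), q(m) and q(v): the means <s>, <m>, <v>, the variances
    Var{s}, Var{m}, and the expected exponential <exp v>.
    """
    return 0.5 * (v_exp_exp * ((s_mean - m_mean) ** 2 + m_var + s_var)
                  - v_mean + math.log(2 * math.pi))
```

With $ s$ observed at the prior mean and a deterministic unit-variance prior ($ \left< \exp v \right> = 1$, $ \left< v \right> = 0$), this reduces to $ \frac{1}{2} \ln 2\pi$, the negative log-density of a standard normal at its mean.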

For observed variables this is the only term in the cost function, but for latent variables there is also $ C_{s,q}$: the part resulting from $ \left< \ln q(s) \right>$. The posterior approximation $ q(s)$ is defined to be Gaussian with mean $ \overline{s}$ and variance $ \widetilde{s}$: $ q(s) = N(s; \overline{s}, \widetilde{s})$. This yields

$\displaystyle C_{s,q} = -\frac{1}{2} \ln 2\pi e \widetilde{s}$ (2)

which is the negative entropy of a Gaussian variable with variance $ \widetilde{s}$. The parameters $ \overline{s}$ and $ \widetilde{s}$ are to be optimised during learning.
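Equation (2) is likewise a one-line computation; a sketch (the function name is mine):

```python
import math

def cost_sq(s_var):
    """C_{s,q} = <ln q(s)> = -(1/2) ln(2 pi e var),
    the negative entropy of a Gaussian with variance s_var."""
    return -0.5 * math.log(2 * math.pi * math.e * s_var)
```

Note that $ C_{s,q}$ decreases as $ \widetilde{s}$ grows, which counteracts the penalty that a broad posterior incurs in $ C_{s,p}$.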

The output of a latent Gaussian node trivially provides the expectation and variance: $ \left< s \right> = \overline{s}$ and $ \mathrm{Var}\left\{s\right\} = \widetilde{s}$. The expected exponential can be shown to be $ \left< \exp s \right> = \exp(\overline{s}+\widetilde{s}/2)$. The outputs of observed nodes are scalar values instead of distributions, and thus $ \left< s \right> = s$, $ \mathrm{Var}\left\{s\right\} = 0$ and $ \left< \exp s \right> = \exp s$.
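These three output statistics can be sketched as follows; an observed node is simply the degenerate case with zero variance (function names are mine):

```python
import math

def latent_outputs(s_mean, s_var):
    """<s>, Var{s} and <exp s> for a latent node with q(s) = N(s_mean, s_var).
    <exp s> = exp(mean + var/2) is the mean of a log-normal variable."""
    return s_mean, s_var, math.exp(s_mean + s_var / 2.0)

def observed_outputs(s):
    """An observed node outputs the scalar value itself."""
    return s, 0.0, math.exp(s)
```

An observed value gives the same outputs as a latent node whose posterior has collapsed to zero variance at that value.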

The posterior distribution $ q(s)$ of a latent Gaussian node can be updated as follows. 1) The gradients of $ C_p$ w.r.t. $ \left< s \right>$, $ \mathrm{Var}\left\{s\right\}$ and $ \left< \exp s \right>$ are computed. 2) The terms in $ C_p$ which depend on $ \overline{s}$ and $ \widetilde{s}$ are assumed to take the form $ b [(\overline{s}- a)^2 + \widetilde{s}] + c \left< \exp s \right>$, where $ \partial C_p / \partial \overline{s} = 2b (\overline{s} - a)$, $ \partial C_p / \partial \widetilde{s} = b$ and $ \partial C_p / \partial \left< \exp s \right> = c$. This assumption holds exactly if the output of the node is propagated only to Gaussian nodes and not to discrete nodes. If the output is used by a discrete node with a soft-max prior, this term gives an upper bound of $ C_p$, as will be explained later. 3) The minimum of $ C = C_p + C_q$ is solved. This can be done analytically if $ c = 0$; otherwise the minimum is obtained iteratively.
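Assuming the quadratic-plus-exponential form above, step 3 can be sketched as follows. For $ c = 0$ the minimum is analytic: $ \overline{s} = a$ and $ \widetilde{s} = 1/(2b)$. For $ c \neq 0$ one simple choice (a hypothetical scheme, not necessarily the text's exact iteration) is to iterate the zero-gradient conditions as a fixed point:

```python
import math

def update_gaussian_posterior(a, b, c, n_iter=50):
    """Minimise C = b[(sbar - a)^2 + svar] + c*<exp s> - (1/2) ln(2 pi e svar)
    over sbar and svar, where <exp s> = exp(sbar + svar/2).

    Setting the gradients to zero gives the conditions
      dC/dsvar = b + (c/2) <exp s> - 1/(2 svar) = 0
      dC/dsbar = 2 b (sbar - a) + c <exp s>     = 0
    """
    sbar, svar = a, 1.0 / (2.0 * b)        # exact minimum when c == 0
    if c == 0:
        return sbar, svar
    for _ in range(n_iter):
        ee = math.exp(sbar + svar / 2.0)   # current <exp s>
        svar = 1.0 / (2.0 * b + c * ee)    # from dC/dsvar = 0
        sbar = a - c * ee / (2.0 * b)      # from dC/dsbar = 0
    return sbar, svar
```

The $ c \left< \exp s \right>$ term contributes $ c \left< \exp s \right>$ to the gradient w.r.t. $ \overline{s}$ and $ \frac{c}{2} \left< \exp s \right>$ w.r.t. $ \widetilde{s}$, since $ \left< \exp s \right> = \exp(\overline{s} + \widetilde{s}/2)$; the $ -1/(2\widetilde{s})$ term comes from $ C_q$ in (2).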


Harri Valpola 2001-10-01