
Gaussian variables

A Gaussian variable $ s$ has two inputs $ m$ and $ v$ and prior probability $ p(s \vert m, v) = N(s; m, \exp(-v))$. The variance is parametrised this way because the mean and expected exponential of $ v$ then suffice for computing the cost function. It can be shown that when $ s$, $ m$ and $ v$ are mutually independent, i.e. $ q(s, m, v) = q(s)q(m)q(v)$, the term $ C_{s,p} = -\left< \ln p(s \vert m, v) \right>$ yields

$\displaystyle C_{s,p} = \frac{1}{2}\Big\{ \left< \exp v \right> \Big[ \big( \left< s \right> - \left< m \right> \big)^2 + \mathrm{Var}\left\{m\right\} + \mathrm{Var}\left\{s\right\} \Big] - \left< v \right> + \ln 2\pi \Big\} \, .$ (1)
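Under the independence assumption the cost term is a closed-form function of a few posterior statistics. A minimal sketch in Python (the function name and argument order are mine, not from the text):

```python
import math

def cost_sp(s_mean, s_var, m_mean, m_var, v_mean, v_exp_exp):
    """C_{s,p} = -<ln p(s | m, v)> for the prior N(s; m, exp(-v)).

    Arguments are the posterior statistics of the independent factors
    q(s), q(m) and q(v): the means <s>, <m>, <v>, the variances
    Var{s}, Var{m}, and the expected exponential <exp v>.
    """
    return 0.5 * (v_exp_exp * ((s_mean - m_mean) ** 2 + m_var + s_var)
                  - v_mean + math.log(2 * math.pi))
```

With $ s$ observed at the prior mean and a deterministic unit-variance prior ($ \left< \exp v \right> = 1$, $ \left< v \right> = 0$), this reduces to $ \frac{1}{2} \ln 2\pi$, the negative log-density of a standard normal at its mean.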

For observed variables this is the only term in the cost function, but for latent variables there is also $ C_{s,q}$: the part resulting from $ \left< \ln q(s) \right>$. The posterior approximation $ q(s)$ is defined to be Gaussian with mean $ \overline{s}$ and variance $ \widetilde{s}$: $ q(s) = N(s; \overline{s}, \widetilde{s})$. This yields

$\displaystyle C_{s,q} = -\frac{1}{2} \ln 2\pi e \widetilde{s}$ (2)

which is the negative entropy of a Gaussian variable with variance $ \widetilde{s}$. The parameters $ \overline{s}$ and $ \widetilde{s}$ are to be optimised during learning.
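Equation (2) is likewise a one-line computation; a sketch (the function name is mine):

```python
import math

def cost_sq(s_var):
    """C_{s,q} = <ln q(s)> = -(1/2) ln(2 pi e var),
    the negative entropy of a Gaussian with variance s_var."""
    return -0.5 * math.log(2 * math.pi * math.e * s_var)
```

Note that $ C_{s,q}$ decreases as $ \widetilde{s}$ grows, which counteracts the penalty that a broad posterior incurs in $ C_{s,p}$.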

The output of a latent Gaussian node trivially provides the expectation and variance: $ \left< s \right> = \overline{s}$ and $ \mathrm{Var}\left\{s\right\} = \widetilde{s}$. The expected exponential can be shown to be $ \left< \exp s \right> = \exp(\overline{s}+\widetilde{s}/2)$. The outputs of observed nodes are scalar values instead of distributions, and thus $ \left< s \right> = s$, $ \mathrm{Var}\left\{s\right\} = 0$ and $ \left< \exp s \right> = \exp s$.
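These three output statistics can be sketched as follows; an observed node is simply the degenerate case with zero variance (function names are mine):

```python
import math

def latent_outputs(s_mean, s_var):
    """<s>, Var{s} and <exp s> for a latent node with q(s) = N(s_mean, s_var).
    <exp s> = exp(mean + var/2) is the mean of a log-normal variable."""
    return s_mean, s_var, math.exp(s_mean + s_var / 2.0)

def observed_outputs(s):
    """An observed node outputs the scalar value itself."""
    return s, 0.0, math.exp(s)
```

An observed value gives the same outputs as a latent node whose posterior has collapsed to zero variance at that value.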

The posterior distribution $ q(s)$ of a latent Gaussian node can be updated as follows. 1) The gradients of $ C_p$ w.r.t. $ \left< s \right>$, $ \mathrm{Var}\left\{s\right\}$ and $ \left< \exp s \right>$ are computed. 2) The terms in $ C_p$ which depend on $ \overline{s}$ and $ \widetilde{s}$ are assumed to take the form $ b [(\overline{s}- a)^2 + \widetilde{s}] + c \left< \exp s \right>$, where $ \partial C_p / \partial \overline{s} = 2b (\overline{s} - a)$, $ \partial C_p / \partial \widetilde{s} = b$ and $ \partial C_p / \partial \left< \exp s \right> = c$. This assumption holds exactly if the output of the node is propagated only to Gaussian nodes and not to discrete nodes. If the output is used by a discrete node with a soft-max prior, this term gives an upper bound of $ C_p$, as will be explained later. 3) The minimum of $ C = C_p + C_q$ is solved. This can be done analytically if $ c = 0$; otherwise the minimum is obtained iteratively.
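Assuming the quadratic-plus-exponential form above, step 3 can be sketched as follows. For $ c = 0$ the minimum is analytic: $ \overline{s} = a$ and $ \widetilde{s} = 1/(2b)$. For $ c \neq 0$ one simple choice (a hypothetical scheme, not necessarily the text's exact iteration) is to iterate the zero-gradient conditions as a fixed point:

```python
import math

def update_gaussian_posterior(a, b, c, n_iter=50):
    """Minimise C = b[(sbar - a)^2 + svar] + c*<exp s> - (1/2) ln(2 pi e svar)
    over sbar and svar, where <exp s> = exp(sbar + svar/2).

    Setting the gradients to zero gives the conditions
      dC/dsvar = b + (c/2) <exp s> - 1/(2 svar) = 0
      dC/dsbar = 2 b (sbar - a) + c <exp s>     = 0
    """
    sbar, svar = a, 1.0 / (2.0 * b)        # exact minimum when c == 0
    if c == 0:
        return sbar, svar
    for _ in range(n_iter):
        ee = math.exp(sbar + svar / 2.0)   # current <exp s>
        svar = 1.0 / (2.0 * b + c * ee)    # from dC/dsvar = 0
        sbar = a - c * ee / (2.0 * b)      # from dC/dsbar = 0
    return sbar, svar
```

The $ c \left< \exp s \right>$ term contributes $ c \left< \exp s \right>$ to the gradient w.r.t. $ \overline{s}$ and $ \frac{c}{2} \left< \exp s \right>$ w.r.t. $ \widetilde{s}$, since $ \left< \exp s \right> = \exp(\overline{s} + \widetilde{s}/2)$; the $ -1/(2\widetilde{s})$ term comes from $ C_q$ in (2).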


Harri Valpola 2001-10-01