

Cost function

Recall now that we are approximating the joint posterior pdf of the random variables $ s$, $ m$, and $ v$ in a maximally factorial manner. It then decouples into the product of the individual distributions: $ q(s, m, v) = q(s)q(m)q(v)$. Hence $ s$, $ m$, and $ v$ are assumed to be statistically independent a posteriori. The posterior approximation $ q(s)$ of the Gaussian variable $ s$ is defined to be Gaussian with mean $ \overline{s}$ and variance $ \widetilde{s}$: $ q(s) = \mathcal{N}(s; \overline{s}, \widetilde{s})$. Utilising these, the part $ {\cal C}_p$ of the Kullback-Leibler cost function arising from the data, defined in Eq. (7), can be computed in closed form. For the Gaussian node of Figure 2, the cost becomes

$\displaystyle {\cal C}_{s,p} = -\left< \ln p(s \vert m, v) \right> = \frac{1}{2}\Big\{ \left< \exp v \right> \Big[\left(\overline{s}-\left< m \right>\right)^2 + \mathrm{Var}\left\{m\right\} + \widetilde{s} \Big] - \left< v \right> + \ln 2\pi\Big\}$ (10)

The derivation is presented in Appendix B of Valpola02NC, using slightly different notation. For observed variables, this is the only term they contribute to the cost function $ {\cal C}_{\mathrm{KL}}$.
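
To make the closed-form evaluation concrete, the following short Python sketch (assuming NumPy; the function and argument names are ours, purely illustrative and not part of the original) evaluates the term of Eq. (10) from the six posterior statistics it depends on: the mean $ \overline{s}$ and variance $ \widetilde{s}$ of $ q(s)$, and the expectations $ \left< m \right>$, $ \mathrm{Var}\left\{m\right\}$, $ \left< v \right>$, and $ \left< \exp v \right>$ provided by the parents.

    import numpy as np

    def cost_sp(s_mean, s_var, m_mean, m_var, v_mean, v_exp_exp):
        # C_{s,p} = -<ln p(s | m, v)> of Eq. (10); all arguments are
        # posterior statistics, not samples of the variables themselves.
        return 0.5 * (v_exp_exp * ((s_mean - m_mean) ** 2 + m_var + s_var)
                      - v_mean + np.log(2.0 * np.pi))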

Latent variables, however, also contribute to the cost function $ {\cal C}_{\mathrm{KL}}$ through the part $ {\cal C}_q$ defined in Eq. (6), which results from the expectation $ \left< \ln q(s) \right>$. This term is

$\displaystyle {\cal C}_{s,q} = \int_s q(s) \ln q(s)ds = -\frac{1}{2} [ \ln (2\pi\widetilde{s}) + 1]$ (11)

which is the negative entropy of a Gaussian variable with variance $ \widetilde{s}$. The parameters defining the approximation $ q(s)$ of the posterior distribution of $ s$, namely its mean $ \overline{s}$ and variance $ \widetilde{s}$, are to be optimised during learning.
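
Continuing the illustrative sketch above (NumPy already imported, names again ours), the term of Eq. (11) depends only on the posterior variance $ \widetilde{s}$:

    def cost_sq(s_var):
        # C_{s,q} = <ln q(s)> of Eq. (11): the negative entropy of a
        # Gaussian with variance s_var (the mean does not appear).
        return -0.5 * (np.log(2.0 * np.pi * s_var) + 1.0)

For a latent variable the node's total contribution to $ {\cal C}_{\mathrm{KL}}$ is then cost_sp(...) + cost_sq(...), whereas an observed variable contributes only the former term.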

The output of a latent Gaussian node trivially provides the mean and the variance: $ \left< s \right> = \overline{s}$ and $ \mathrm{Var}\left\{s\right\} = \widetilde{s}$. The expected exponential can easily be shown (Lappal-Miskin00, Valpola02NC) to be

$\displaystyle \left< \exp s \right> = \exp( \overline{s} + \widetilde{s}/2)$ (12)

The outputs of the nodes corresponding to the observations are known scalar values instead of distributions. Therefore for these nodes $ \left< s \right> = s$, $ \mathrm{Var}\left\{s\right\} = 0$, and $ \left< \exp s \right> = \exp s$. An important conclusion of the considerations presented so far is that the cost function of a Gaussian node can be computed analytically in closed form. This requires that the posterior approximation is Gaussian and that the mean $ \left< m \right>$ and the variance $ \mathrm{Var}\left\{m\right\}$ of the mean input $ m$, as well as the mean $ \left< v \right>$ and the expected exponential $ \left< \exp v \right>$ of the variance input $ v$, can be computed. To summarise, we have shown that Gaussian nodes can be connected together and their costs can be evaluated analytically.
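
The identity in Eq. (12) is simply the mean of a log-normal distribution. A quick Monte Carlo check in the same illustrative sketch style (sample count and seed chosen arbitrarily) makes this concrete:

    rng = np.random.default_rng(0)
    s_mean, s_var = 0.3, 0.5
    s_samples = rng.normal(s_mean, np.sqrt(s_var), size=1_000_000)
    print(np.mean(np.exp(s_samples)))       # Monte Carlo estimate of <exp s>
    print(np.exp(s_mean + s_var / 2.0))     # closed form of Eq. (12)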

Later on we will use the derivatives of the cost function with respect to certain expectations of its mean and variance parents $ m$ and $ v$ as messages from children to parents. They are derived directly from Eq. (10) and take the form

$\displaystyle \frac{\partial{\cal C}_{s,p}}{\partial \left< m \right>} = \left< \exp v \right>(\left< m \right>-\overline{s})$ (13)
$\displaystyle \frac{\partial{\cal C}_{s,p}}{\partial \mathrm{Var}\left\{m\right\}} = \frac{\left< \exp v \right>}{2}$ (14)
$\displaystyle \frac{\partial{\cal C}_{s,p}}{\partial \left< v \right>} = -\frac{1}{2}$ (15)
$\displaystyle \frac{\partial{\cal C}_{s,p}}{\partial \left< \exp v \right>} = \frac{(\overline{s}-\left< m \right>)^2 +\mathrm{Var}\left\{m\right\}+\widetilde{s}}{2}.$ (16)
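
In the same illustrative Python sketch (names ours), the four messages of Eqs. (13)-(16) can be computed alongside the cost itself; they could also be verified against finite differences of the cost_sp function above. Note that $ \left< v \right>$ does not appear in any of the derivatives, since it enters Eq. (10) only linearly.

    def cost_sp_gradients(s_mean, s_var, m_mean, m_var, v_exp_exp):
        # Partial derivatives of C_{s,p} used as child-to-parent messages.
        d_m_mean  = v_exp_exp * (m_mean - s_mean)                    # Eq. (13)
        d_m_var   = 0.5 * v_exp_exp                                  # Eq. (14)
        d_v_mean  = -0.5                                             # Eq. (15)
        d_v_exp   = 0.5 * ((s_mean - m_mean) ** 2 + m_var + s_var)   # Eq. (16)
        return d_m_mean, d_m_var, d_v_mean, d_v_exp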

