Cost function
Recall now that we are approximating the joint posterior pdf of the random variables $s$, $m$, and $v$ in a maximally factorial manner. It then decouples into the product of the individual distributions: $q(s, m, v) = q(s)\,q(m)\,q(v)$. Hence $s$, $m$, and $v$ are assumed to be statistically independent a posteriori. The posterior approximation $q(s)$ of the Gaussian variable $s$ is defined to be Gaussian with mean $\overline{s}$ and variance $\widetilde{s}$: $q(s) = N(s;\, \overline{s}, \widetilde{s})$.
Utilising these, the part $\mathcal{C}_p$ of the Kullback-Leibler cost function arising from the data, defined in Eq. (7), can be computed in closed form. For the Gaussian node of Figure 2, the cost becomes

$$\mathcal{C}_p = \mathrm{E}\left\{-\ln p(s \mid m, v)\right\} = \frac{1}{2}\left\{\left\langle e^{v}\right\rangle\left[\left(\overline{s} - \overline{m}\right)^2 + \widetilde{s} + \widetilde{m}\right] - \overline{v} + \ln 2\pi\right\}. \qquad (10)$$

The derivation is presented in Appendix B of Valpola02NC using slightly different notation. For the observed variables, this is the only term they contribute to the cost function.
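As an illustrative sketch (not part of the original text), Eq. (10) can be evaluated directly from the posterior statistics of the node and its parents. The variable names below (`s_mean` for $\overline{s}$, `v_expexp` for $\langle e^{v}\rangle$, and so on) are ad hoc conventions for this example.

```python
import math

def cost_cp(s_mean, s_var, m_mean, m_var, v_mean, v_expexp):
    # Eq. (10): C_p = E{-ln p(s | m, v)} for a Gaussian node
    # s ~ N(m, exp(-v)), given the posterior statistics of s, m and v.
    return 0.5 * (v_expexp * ((s_mean - m_mean) ** 2 + s_var + m_var)
                  - v_mean + math.log(2 * math.pi))

# For fully observed s, m and v (all variances zero, <exp v> = exp(v)),
# the expression reduces to the plain negative log-density -ln N(s; m, exp(-v)).
```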
However, latent variables also contribute to the cost function with the part $\mathcal{C}_q$ defined in Eq. (6), resulting from the expectation $\mathrm{E}\left\{\ln q(s)\right\}$. This term is

$$\mathcal{C}_q = \mathrm{E}\left\{\ln q(s)\right\} = -\frac{1}{2}\ln\left(2\pi e \widetilde{s}\right), \qquad (11)$$

which is the negative entropy of a Gaussian variable with variance $\widetilde{s}$. The parameters defining the approximation $q(s)$ of the posterior distribution of $s$, namely its mean $\overline{s}$ and variance $\widetilde{s}$, are to be optimised during learning.
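As a quick numerical sanity check (a sketch, not from the original text), the negative-entropy term of Eq. (11) should agree with a Monte Carlo average of $\ln q(s)$ over samples drawn from $q$ itself. The function names here are hypothetical.

```python
import math, random

def cost_cq(s_var):
    # Eq. (11): C_q = E{ln q(s)}, the negative entropy of q(s).
    return -0.5 * math.log(2 * math.pi * math.e * s_var)

def mc_cost_cq(s_mean, s_var, n=200_000, seed=0):
    # Monte Carlo estimate of E{ln q(s)} with s ~ N(s_mean, s_var).
    rng = random.Random(seed)
    log_norm = -0.5 * math.log(2 * math.pi * s_var)
    total = 0.0
    for _ in range(n):
        s = rng.gauss(s_mean, math.sqrt(s_var))
        total += log_norm - (s - s_mean) ** 2 / (2 * s_var)
    return total / n
```

Note that $\mathcal{C}_q$ decreases as $\widetilde{s}$ grows, since a broader posterior approximation has higher entropy.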
The output of a latent Gaussian node trivially provides the mean and the variance: $\langle s \rangle = \overline{s}$ and $\mathrm{Var}\left\{s\right\} = \widetilde{s}$. The expected exponential can be easily shown to be (Lappal-Miskin00, Valpola02NC)

$$\left\langle e^{s}\right\rangle = \exp\left(\overline{s} + \widetilde{s}/2\right). \qquad (12)$$
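Eq. (12) is the mean of a log-normal distribution, and it can be verified by sampling; the sketch below (with ad hoc names) compares the closed form against a Monte Carlo average of $e^{s}$.

```python
import math, random

def expected_exp(s_mean, s_var):
    # Eq. (12): <exp s> = exp(s_mean + s_var / 2) for Gaussian s.
    return math.exp(s_mean + s_var / 2.0)

def mc_expected_exp(s_mean, s_var, n=200_000, seed=1):
    # Monte Carlo average of exp(s) over samples s ~ N(s_mean, s_var).
    rng = random.Random(seed)
    sigma = math.sqrt(s_var)
    return sum(math.exp(rng.gauss(s_mean, sigma)) for _ in range(n)) / n
```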
The outputs of the nodes corresponding to the observations are known scalar values instead of distributions. Therefore for these nodes $\langle s \rangle = s$, $\mathrm{Var}\left\{s\right\} = 0$, and $\left\langle e^{s}\right\rangle = e^{s}$. An important conclusion of the considerations presented thus far is that the cost function of a Gaussian node can be computed analytically in closed form. This requires that the posterior approximation is Gaussian and that the mean $\overline{m}$ and the variance $\widetilde{m}$ of the mean input as well as the mean $\overline{v}$ and the expected exponential $\left\langle e^{v}\right\rangle$ of the variance input can be computed. To summarise, we have shown that Gaussian nodes can be connected together and their costs can be evaluated analytically.
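As a closing sketch (assuming the notation above, not part of the original text), the total cost of a latent Gaussian node is the sum of Eqs. (10) and (11). When the mean and variance inputs are observed scalars, this sum equals the Kullback-Leibler divergence between $q(s)$ and the conditional prior $p(s \mid m, v)$, so it vanishes exactly when $\overline{s} = m$ and $\widetilde{s} = e^{-v}$.

```python
import math

def total_cost(s_mean, s_var, m_mean, m_var, v_mean, v_expexp):
    # C = C_p + C_q, combining Eqs. (10) and (11).
    cp = 0.5 * (v_expexp * ((s_mean - m_mean) ** 2 + s_var + m_var)
                - v_mean + math.log(2 * math.pi))
    cq = -0.5 * math.log(2 * math.pi * math.e * s_var)
    return cp + cq

# With observed parents m and v, the cost is minimised (and equal to zero)
# at s_mean = m and s_var = exp(-v); any other choice gives a positive cost.
```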
We will later on use the derivatives of the cost function with respect to some expectations of its mean and variance parents as messages from children to parents. They are derived directly from Eq. (10), taking the form

$$\frac{\partial \mathcal{C}_p}{\partial \overline{m}} = \left\langle e^{v}\right\rangle\left(\overline{m} - \overline{s}\right), \qquad \frac{\partial \mathcal{C}_p}{\partial \widetilde{m}} = \frac{1}{2}\left\langle e^{v}\right\rangle,$$
$$\frac{\partial \mathcal{C}_p}{\partial \overline{v}} = -\frac{1}{2}, \qquad \frac{\partial \mathcal{C}_p}{\partial \left\langle e^{v}\right\rangle} = \frac{\left(\overline{s} - \overline{m}\right)^2 + \widetilde{s} + \widetilde{m}}{2}.$$
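These message formulas can be checked numerically against Eq. (10) by finite differences; the sketch below (with hypothetical variable names mirroring the notation above) does exactly that.

```python
import math

def cost_cp(s_mean, s_var, m_mean, m_var, v_mean, v_expexp):
    # Eq. (10): C_p = E{-ln p(s | m, v)} for the Gaussian node.
    return 0.5 * (v_expexp * ((s_mean - m_mean) ** 2 + s_var + m_var)
                  - v_mean + math.log(2 * math.pi))

def messages(s_mean, s_var, m_mean, m_var, v_expexp):
    # Derivatives of C_p w.r.t. the expectations propagated to the parents.
    return {
        "d/d<m>":     v_expexp * (m_mean - s_mean),
        "d/dVar(m)":  0.5 * v_expexp,
        "d/d<v>":     -0.5,
        "d/d<exp v>": 0.5 * ((s_mean - m_mean) ** 2 + s_var + m_var),
    }
```

Each analytic derivative should agree with a central finite difference of `cost_cp` in the corresponding argument.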
Tapani Raiko
2006-08-28