Addition and multiplication nodes

Consider first the addition node. The mean, variance and expected exponential of the output of the addition node can be evaluated in a straightforward way. Assuming that the inputs $s_i$ are statistically independent, these expectations are respectively given by

$\displaystyle \left< \sum_{i=1}^n s_i \right>$	$\displaystyle = \sum_{i=1}^n \left< s_i \right>$	(20)
$\displaystyle \mathrm{Var}\left\{\sum_{i=1}^n s_i\right\}$	$\displaystyle = \sum_{i=1}^n \mathrm{Var}\left\{s_i\right\}$	(21)
$\displaystyle \left< \exp \left( \sum_{i=1}^n s_i \right) \right>$	$\displaystyle = \prod_{i=1}^n \left< \exp s_i \right>$	(22)
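As a minimal sketch of Eqs. (20)-(22), the addition node can be written as a function of the three expectations of each input. The function name `add_forward` and the example values are hypothetical. The Gaussian inputs in the illustration are an assumption made only to obtain concrete values of $\left< \exp s_i \right>$.

```python
import math

def add_forward(means, variances, exp_means):
    """Propagate (mean, variance, <exp>) through an addition node.

    Assumes the inputs are statistically independent.
    """
    mean = sum(means)                 # Eq. (20)
    var = sum(variances)              # Eq. (21)
    exp_mean = math.prod(exp_means)   # Eq. (22)
    return mean, var, exp_mean

# Illustration only: for a Gaussian input, <exp s> = exp(mean + var / 2).
means = [0.5, -1.0]
variances = [0.2, 0.3]
exp_means = [math.exp(m + v / 2) for m, v in zip(means, variances)]

mean, var, exp_mean = add_forward(means, variances, exp_means)
```

In this Gaussian illustration the sum is again Gaussian, so `exp_mean` should agree with `math.exp(mean + var / 2)` up to rounding.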

Consider then the multiplication node. Assuming independence between the inputs $s_i$, the mean and the variance of the output take the form (see Appendix B.1)

$\displaystyle \left< \prod_{i=1}^n s_i \right>$	$\displaystyle = \prod_{i=1}^n \left< s_i \right>$	(23)
$\displaystyle \mathrm{Var}\left\{\prod_{i=1}^n s_i\right\}$	$\displaystyle = \prod_{i=1}^n \left[ \left< s_i \right>^2 + \mathrm{Var}\left\{s_i\right\} \right] - \prod_{i=1}^n \left< s_i \right>^2$	(24)
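Equation (24) is the second moment of the product minus the squared mean, where independence lets the second moment factorize into the terms $\left< s_i \right>^2 + \mathrm{Var}\left\{s_i\right\}$. A minimal sketch of Eqs. (23)-(24) follows; the function name `mul_forward` and the numbers are hypothetical.

```python
import math

def mul_forward(means, variances):
    """Mean and variance of a product of independent inputs."""
    mean = math.prod(means)                          # Eq. (23)
    # <prod s_i^2> = prod <s_i^2> = prod (<s_i>^2 + Var{s_i})
    second_moment = math.prod(m * m + v for m, v in zip(means, variances))
    var = second_moment - mean * mean                # Eq. (24)
    return mean, var

mean, var = mul_forward([2.0, 3.0], [0.5, 0.25])
# mean = 2 * 3 = 6
# var  = (4 + 0.5) * (9 + 0.25) - 36 = 5.625
```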

The formulas (20)-(24) are given for $n$ inputs for the sake of generality, but in practice we have carried out the needed calculations pairwise. If the general formula (24) is used directly, the variance may occasionally take a small negative value because of minor numerical imprecision in the computations. This problem does not arise in pairwise computations. This completes the propagation in the forward direction.
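The pairwise computation can be sketched as follows. The text does not state which algebraic form was used for a pair, so the expanded form below is an assumption: for two inputs, Eq. (24) can be rewritten as $\left< s_1 \right>^2 \mathrm{Var}\left\{s_2\right\} + \mathrm{Var}\left\{s_1\right\} \left< s_2 \right>^2 + \mathrm{Var}\left\{s_1\right\} \mathrm{Var}\left\{s_2\right\}$, a sum of non-negative terms that cannot become negative. The names `mul_pair` and `mul_forward_pairwise` are hypothetical.

```python
from functools import reduce

def mul_pair(a, b):
    """Product of two independent inputs, each given as (mean, variance)."""
    (m1, v1), (m2, v2) = a, b
    mean = m1 * m2
    # Expanded two-input form of Eq. (24): every term is non-negative,
    # so no subtraction of nearly equal numbers takes place.
    var = m1 * m1 * v2 + v1 * m2 * m2 + v1 * v2
    return mean, var

def mul_forward_pairwise(inputs):
    """Fold mutually independent inputs together one pair at a time."""
    return reduce(mul_pair, inputs)

inputs = [(2.0, 0.5), (3.0, 0.25), (-1.0, 0.1)]
mean, var = mul_forward_pairwise(inputs)

# With zero input variances the result is exactly zero, never negative.
_, var_zero = mul_forward_pairwise([(0.1, 0.0), (0.3, 0.0), (0.7, 0.0)])
```

For the three inputs above, the general formula (24) gives $4.5 \cdot 9.25 \cdot 1.1 - 36 = 9.7875$, which the pairwise fold reproduces up to rounding.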

The cost function propagating from children to parents is assumed to be of the form (17). This holds even when there are addition and multiplication nodes in between (see Appendix B.2 for a proof). Therefore, only the gradients of the cost function with respect to the different expectations need to be propagated backwards to identify the whole cost function with respect to the parent. The required formulas are obtained in a straightforward manner from Eqs. (20)-(24). For two inputs, the gradients with respect to the expectations of $s_1$ in the addition node are (those for $s_2$ follow by symmetry):

$\displaystyle \frac{\partial C}{\partial \left< s_1 \right>} = \frac{\partial C}{\partial \left< s_1 + s_2 \right>}$

(25)

$\displaystyle \frac{\partial C}{\partial \mathrm{Var}\left\{s_1\right\}} = \frac{\partial C}{\partial \mathrm{Var}\left\{s_1 + s_2\right\}}$

(26)

$\displaystyle \frac{\partial C}{\partial \left< \exp s_1 \right>} = \left< \exp s_2 \right>\frac{\partial C}{\partial \left< \exp (s_1+s_2) \right>}.$

(27)
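A minimal sketch of the backward step of Eqs. (25)-(27) for a two-input addition node is given below. The function name `add_backward` and the gradient values are hypothetical. The incoming gradients are taken with respect to the expectations of the output $s_1 + s_2$.

```python
def add_backward(grad_mean, grad_var, grad_exp, exp_mean_s2):
    """Gradients of C w.r.t. the expectations of s1, given those of s1 + s2.

    grad_mean, grad_var, grad_exp: dC/d<s1+s2>, dC/dVar{s1+s2}, dC/d<exp(s1+s2)>
    exp_mean_s2: <exp s2> of the other input
    """
    d_mean = grad_mean                 # Eq. (25): the mean passes through
    d_var = grad_var                   # Eq. (26): the variance passes through
    d_exp = exp_mean_s2 * grad_exp     # Eq. (27): scaled by <exp s2>
    return d_mean, d_var, d_exp

d_mean, d_var, d_exp = add_backward(0.3, -0.1, 2.0, exp_mean_s2=1.5)
```

The gradients for $s_2$ are obtained by calling the same function with $\left< \exp s_1 \right>$ in place of $\left< \exp s_2 \right>$.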

For the multiplication node, the gradients are:

$\displaystyle \frac{\partial C}{\partial \left< s_1 \right>}$	$\displaystyle = \left< s_2 \right> \frac{\partial C}{\partial \left< s_1 s_2 \right>} + 2 \mathrm{Var}\left\{s_2\right\} \frac{\partial C}{\partial \mathrm{Var}\left\{s_1 s_2\right\}} \left< s_1 \right>$	(28)
$\displaystyle \frac{\partial C}{\partial \mathrm{Var}\left\{s_1\right\}}$	$\displaystyle = \left(\left< s_2 \right>^2 + \mathrm{Var}\left\{s_2\right\}\right)\frac{\partial C}{\partial \mathrm{Var}\left\{s_1 s_2\right\}}.$	(29)
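The multiplication gradients follow from differentiating Eqs. (23)-(24) for two inputs by the chain rule: the mean $\left< s_1 \right>$ affects both the output mean and the output variance, whereas $\mathrm{Var}\left\{s_1\right\}$ affects only the output variance. The sketch below checks this against central finite differences. The cost function `cost` is a hypothetical stand-in that is linear in the output mean and variance, and all names and numbers are illustrative assumptions.

```python
def mul_forward(m1, v1, m2, v2):
    """Two-input form of Eqs. (23)-(24)."""
    mean = m1 * m2
    var = (m1 * m1 + v1) * (m2 * m2 + v2) - (m1 * m2) ** 2
    return mean, var

def mul_backward(grad_mean, grad_var, m1, m2, v2):
    """Gradients of C w.r.t. <s1> and Var{s1}, given those of the output."""
    d_m1 = m2 * grad_mean + 2.0 * v2 * grad_var * m1   # Eq. (28)
    d_v1 = (m2 * m2 + v2) * grad_var                   # Eq. (29)
    return d_m1, d_v1

# Hypothetical cost: dC/d<s1 s2> = 0.7 and dC/dVar{s1 s2} = 1.3.
GRAD_MEAN, GRAD_VAR = 0.7, 1.3

def cost(m1, v1, m2, v2):
    mean, var = mul_forward(m1, v1, m2, v2)
    return GRAD_MEAN * mean + GRAD_VAR * var

m1, v1, m2, v2 = 2.0, 0.5, 3.0, 0.25
d_m1, d_v1 = mul_backward(GRAD_MEAN, GRAD_VAR, m1, m2, v2)

# Central finite differences for comparison.
eps = 1e-6
fd_m1 = (cost(m1 + eps, v1, m2, v2) - cost(m1 - eps, v1, m2, v2)) / (2 * eps)
fd_v1 = (cost(m1, v1 + eps, m2, v2) - cost(m1, v1 - eps, m2, v2)) / (2 * eps)
```

With these values the analytic gradients are `d_m1 = 3.4` and `d_v1 = 12.025`, and the finite-difference estimates should match them to within rounding error.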