
Update Rule

The update of a Gaussian node followed by a nonlinearity is similar to that of the plain Gaussian node. The source is updated to minimise those terms of the cost function defined in ([*]) that it affects. Other parts of the network are considered constant during the update.

In addition to the terms arising from the variable itself, defined in ([*]) and ([*]), the terms corresponding to the variables to which the output is propagated are affected. The gradients of $C_p$ w.r.t. $\left< f(s) \right>$ and $\mathrm{Var}\left\{f(s)\right\}$ are assumed to arise from a quadratic term $a \left< f(s) \right> + b [(\left< f(s) \right> - \left< f(s) \right>_{\text{current}})^2 + \mathrm{Var}\left\{f(s)\right\}] + d$. This assumption is shown to hold in Section [*].
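
As a short check of how the coefficients relate to the gradients, the quadratic term can be differentiated directly. The abbreviation $Q$ for the quadratic term is introduced here only for this illustration:

$$\frac{\partial Q}{\partial \left< f(s) \right>} = a + 2b \left( \left< f(s) \right> - \left< f(s) \right>_{\text{current}} \right), \qquad \frac{\partial Q}{\partial \mathrm{Var}\left\{f(s)\right\}} = b$$

Evaluated at the current point $\left< f(s) \right> = \left< f(s) \right>_{\text{current}}$, the first derivative reduces to $a$. The coefficients $a$ and $b$ can therefore be read as the gradients w.r.t. $\left< f(s) \right>$ and $\mathrm{Var}\left\{f(s)\right\}$ at the current values, while $d$ does not affect the update.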

The update is done by repeating the following steps until the steps taken become shorter than some very small constant value:

1. First, the cost function and its gradients w.r.t. $\overline{s}$ and $\widetilde{s}$ are computed.
2. Second, update candidates for $\widetilde{s}$ and $\overline{s}$ are found using a fixed point iteration and an approximate Newton's method, respectively.
3. Third, the candidates are tested by computing the cost function again, and the step size is halved as long as the candidate would increase the cost.

The formulas, and the proof that each update either decreases the cost or has converged, can be found in Appendix B.
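
The three steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the formulas of Appendix B: the nonlinearity is assumed to be $f(s) = \exp(-s^2)$, the prior is a fixed toy Gaussian, gradients are taken by finite differences, the fixed point is assumed to be $\widetilde{s} = 1/(2\,\partial C_p/\partial \widetilde{s})$, and the Newton step is assumed to approximate the inverse second derivative by the candidate variance. All function and parameter names (`f_mean_var`, `make_cost`, `num_grad`, `update_node`, `prior_mean`, `prior_prec`) are hypothetical.

```python
import math

def f_mean_var(m, v):
    """Mean and variance of f(s) = exp(-s^2) for s ~ N(m, v) (assumed nonlinearity)."""
    mean = math.exp(-m * m / (2 * v + 1)) / math.sqrt(2 * v + 1)
    second = math.exp(-2 * m * m / (4 * v + 1)) / math.sqrt(4 * v + 1)
    return mean, second - mean * mean

def make_cost(a, b, f_cur, prior_mean=0.0, prior_prec=1.0):
    """Return (cost_p, cost): the toy cost without and with the -0.5*ln(v) term."""
    def cost_p(m, v):
        fm, fv = f_mean_var(m, v)
        # Toy Gaussian prior term (an assumption of this sketch).
        prior = 0.5 * prior_prec * ((m - prior_mean) ** 2 + v)
        # Quadratic term from the text; the constant d is dropped.
        return prior + a * fm + b * ((fm - f_cur) ** 2 + fv)
    def cost(m, v):
        return cost_p(m, v) - 0.5 * math.log(v)
    return cost_p, cost

def num_grad(fn, m, v, h=1e-6):
    """Central finite-difference gradients w.r.t. mean m and variance v."""
    gm = (fn(m + h, v) - fn(m - h, v)) / (2 * h)
    gv = (fn(m, v + h) - fn(m, v - h)) / (2 * h)
    return gm, gv

def update_node(cost_p, cost, m, v, tol=1e-9, max_iter=200):
    for _ in range(max_iter):
        # Step 1: cost and gradients at the current point.
        c_old = cost(m, v)
        gm, gv = num_grad(cost_p, m, v)
        # Step 2: candidates (assumed forms, see the lead-in).
        v_cand = 1.0 / (2.0 * gv) if gv > 0 else v  # fixed point for the variance
        m_cand = m - gm * v_cand                    # approximate Newton step for the mean
        # Step 3: halve the step while the candidate would increase the cost.
        step = 1.0
        while step > 1e-12:
            m_new = m + step * (m_cand - m)
            v_new = v + step * (v_cand - v)
            if cost(m_new, v_new) <= c_old:
                break
            step *= 0.5
        else:
            return m, v  # no acceptable step found; keep the current values
        moved = abs(m_new - m) + abs(v_new - v)
        m, v = m_new, v_new
        if moved < tol:  # steps have become shorter than the small constant
            break
    return m, v

# Usage with arbitrary illustrative coefficients a and b.
cost_p, cost = make_cost(a=0.5, b=1.0, f_cur=f_mean_var(1.0, 1.0)[0])
m_final, v_final = update_node(cost_p, cost, 1.0, 1.0)
```

Because a candidate is accepted only when it does not increase the cost, the sketch never ends at a higher cost than it started with, which mirrors the property stated for the actual update.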


Tapani Raiko
2001-12-10