Updating a Gaussian node followed by the nonlinearity is similar to updating a plain Gaussian node. The source is updated to minimise those terms of the cost function defined in () that it affects. All other parts of the network are held constant during the update.
In addition to the terms arising from the variable itself, defined in () and (), the update also affects the terms of the variables to which the output is propagated. The gradients of C_p w.r.t. and are assumed to arise from a quadratic term . This assumption is shown to hold in Section .
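To make the objective concrete, here is a minimal, hypothetical sketch of the combined cost as a function of the posterior mean and variance of the source s, together with an iterative minimiser. Everything specific in it is an assumption standing in for the missing definitions, not the document's own equations: the nonlinearity is taken to be f(s) = exp(-s^2), the source has a fixed Gaussian prior with hypothetical constants `PRIOR_MEAN` and `PRIOR_VAR`, and the children's contribution is assumed to be `CHILD_WEIGHT * ((<f(s)> - CHILD_TARGET)^2 + Var{f(s)})`, which is the form a Gaussian child with mean f(s) would produce. The minimiser is plain gradient descent with finite-difference gradients, a swapped-in method rather than whatever specific update rule the document prescribes. Only its stopping rule, which ends the iteration once a step is shorter than a small constant, mirrors the text.

```python
import math

# Hypothetical toy constants; none of these names come from the original text.
PRIOR_MEAN = 0.5    # assumed fixed mean of the prior for s
PRIOR_VAR = 1.0     # assumed fixed variance of the prior for s
CHILD_WEIGHT = 2.0  # assumed weight of the children's quadratic term
CHILD_TARGET = 0.3  # assumed target value inside the children's quadratic term


def f_moments(mean, var):
    """Mean and variance of f(s) = exp(-s^2) for s ~ N(mean, var), in closed form."""
    m1 = math.exp(-mean**2 / (1 + 2 * var)) / math.sqrt(1 + 2 * var)
    m2 = math.exp(-2 * mean**2 / (1 + 4 * var)) / math.sqrt(1 + 4 * var)
    return m1, m2 - m1**2


def cost(mean, log_var):
    """Cost terms affected by q(s) = N(mean, exp(log_var)) in this toy setting."""
    var = math.exp(log_var)
    # Own terms: E_q[-log N(s; PRIOR_MEAN, PRIOR_VAR)] minus the entropy of q(s).
    own = (0.5 * ((mean - PRIOR_MEAN) ** 2 + var) / PRIOR_VAR
           + 0.5 * math.log(2 * math.pi * PRIOR_VAR)
           - 0.5 * math.log(2 * math.pi * math.e * var))
    # Children's terms: assumed quadratic in <f(s)> plus linear in Var{f(s)}.
    f_mean, f_var = f_moments(mean, var)
    child = CHILD_WEIGHT * ((f_mean - CHILD_TARGET) ** 2 + f_var)
    return own + child


def update(mean, log_var, lr=0.1, tol=1e-8, max_iter=100_000, h=1e-6):
    """Gradient descent on (mean, log_var); stops once a step is shorter than tol."""
    for _ in range(max_iter):
        # Central finite differences approximate the gradient of the cost.
        g_mean = (cost(mean + h, log_var) - cost(mean - h, log_var)) / (2 * h)
        g_lv = (cost(mean, log_var + h) - cost(mean, log_var - h)) / (2 * h)
        step_mean, step_lv = -lr * g_mean, -lr * g_lv
        mean, log_var = mean + step_mean, log_var + step_lv
        if math.hypot(step_mean, step_lv) < tol:
            break
    return mean, math.exp(log_var)
```

For example, `update(0.0, 0.0)` starts from q(s) = N(0, 1) and returns the optimised mean and variance. The variance is optimised through its logarithm so that it stays positive without an explicit constraint.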
The update is performed by repeating the following steps until the step lengths fall below some very small constant.