The update rule for the nonlinear node is considered in
Section . It is a minimisation problem for the cost
function
$C_f = C_{f,p} + C_{f,q}$. Here we prove that each iteration
decreases the cost function, or leaves it unchanged if the partial
derivatives vanish.
A different notation is used here.
The quadratic term
is taken into account by first finding an optimal expected value of
$f$, together with a variance that
reflects how costly a deviation from this optimum is.
[left-hand side not recoverable] $= (2b)^{-1}$ (11.1)
[equation not recoverable] (11.2)
[equation not recoverable] (11.3)
[equation not recoverable] (11.4)
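A minimal sketch of the completing-the-square step, under an assumption of ours since the surrounding equations are lost: the affected cost terms take the form $a\langle f\rangle + b\langle f^2\rangle$ with $b > 0$, where $b$ is read as the coefficient of $\langle f^2\rangle$. Under that assumption the optimal expected value is $-a/(2b)$ and the associated variance is $(2b)^{-1}$, which matches the right-hand side of (11.1). The coefficient `a` and the helper names `complete_square` and `both_forms` are hypothetical.

```python
def complete_square(a, b):
    """Rewrite a*<f> + b*<f^2> (b > 0) as <(f - f_opt)^2> / (2*f_var) + const.

    Hypothetical helper: assumes the affected cost terms have this form.
    """
    if b <= 0:
        raise ValueError("b must be positive for the square to be completed")
    f_opt = -a / (2.0 * b)   # optimal expected value of f
    f_var = 1.0 / (2.0 * b)  # (2b)^{-1}: how costly a deviation from f_opt is
    const = -a * a / (4.0 * b)
    return f_opt, f_var, const


def both_forms(a, b, ef, ef2):
    """Evaluate the original and the completed-square form from <f> and <f^2>."""
    f_opt, f_var, const = complete_square(a, b)
    original = a * ef + b * ef2
    # <(f - f_opt)^2> = <f^2> - 2 f_opt <f> + f_opt^2
    rewritten = (ef2 - 2.0 * f_opt * ef + f_opt ** 2) / (2.0 * f_var) + const
    return original, rewritten
```

For example, `both_forms(-1.2, 3.0, 0.4, 0.25)` returns two values that agree up to rounding, so the rewrite changes the cost only by a constant that does not affect the minimiser.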
Now all the affected terms of C can be written as
[equation not recoverable] (11.7)
[equation not recoverable] (11.8)
Now we have to find the posterior mean and variance
such that $C_f = C_{f,p} + C_{f,q}$ is minimised. The partial derivatives of $C_f$ are:
[equations not recoverable]
It is worthwhile to note the connection derived from ( ) and ( ):
[equation not recoverable]
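One connection that holds for any cost term written as an expectation $\langle g(s)\rangle$ under a Gaussian $q(s) = \mathcal{N}(m, v)$ is $\partial^2\langle g\rangle/\partial m^2 = 2\,\partial\langle g\rangle/\partial v$. Since the equation above is lost, we only assume this is the kind of relation meant. The sketch below checks it numerically with finite differences for the hypothetical choice $g(s) = \exp(-c s^2)$, whose Gaussian expectation has the closed form $(1+2cv)^{-1/2}\exp\bigl(-c m^2/(1+2cv)\bigr)$; the function names are illustrative.

```python
import math


def gauss_exp(m, v, c=1.0):
    """Closed form of <g> = E[exp(-c s^2)] for s ~ N(m, v)."""
    d = 1.0 + 2.0 * c * v
    return math.exp(-c * m * m / d) / math.sqrt(d)


def check_identity(m, v, h=1e-4):
    """Return (d^2<g>/dm^2, 2 d<g>/dv) by central finite differences (needs v > h)."""
    d2m = (gauss_exp(m + h, v) - 2.0 * gauss_exp(m, v) + gauss_exp(m - h, v)) / h ** 2
    dv = (gauss_exp(m, v + h) - gauss_exp(m, v - h)) / (2.0 * h)
    return d2m, 2.0 * dv


if __name__ == "__main__":
    for m, v in [(0.0, 0.5), (1.0, 0.2), (-0.7, 1.5)]:
        lhs, rhs = check_identity(m, v)
        print(f"m={m:+.1f} v={v:.1f}: {lhs:.6f} vs {rhs:.6f}")
```

The identity is the heat-equation property of Gaussian smoothing, which is why it holds for any $g$, not only the one chosen here.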
For the posterior mean, gradient descent is used, with the step length
approximated from Newton's method. The resulting approximation, composed from ( ), is:
[equation not recoverable]
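As a sketch of how such an update can be derived, write $m$ and $v$ for the posterior mean and variance and make two assumptions not stated in the surviving text: $C_{f,q} = -\tfrac12\ln(2\pi e\,v)$ (the negative entropy of a Gaussian $q$), and $C_{f,p}$ is an expectation over $q$, so that $\partial^2 C_{f,p}/\partial m^2 = 2\,\partial C_{f,p}/\partial v$. This is not necessarily the document's own equation.

```latex
\begin{aligned}
\frac{\partial C_f}{\partial v} &= \frac{\partial C_{f,p}}{\partial v} - \frac{1}{2v} = 0
  \;\Longrightarrow\; v \leftarrow \Bigl(2\,\frac{\partial C_{f,p}}{\partial v}\Bigr)^{-1}, \\
m &\leftarrow m - \frac{\partial C_{f,p}/\partial m}{\partial^2 C_{f,p}/\partial m^2}
  = m - \frac{\partial C_{f,p}/\partial m}{2\,\partial C_{f,p}/\partial v}
  = m - v\,\frac{\partial C_{f,p}}{\partial m}.
\end{aligned}
```

The Newton denominator uses only $C_{f,p}$ because $C_{f,q}$ does not depend on $m$; the last equality substitutes the updated $v$ and requires $\partial C_{f,p}/\partial v > 0$.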
As shown above, these steps are taken in a direction in which the cost function decreases locally; if the derivatives are zero, the iteration has converged. To also guarantee that the cost function does not increase because a step is too long, each update candidate is checked: the step is halved as long as the cost would rise.
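The whole iteration, including the step-halving safeguard, can be sketched as follows. Everything concrete here is a hypothetical stand-in rather than the document's model: the nonlinearity $f(s) = \exp(-s^2)$, the Gaussian prior on $s$, the parameter values, the finite-difference gradients, and the fallback used when $\partial C_{f,p}/\partial v \le 0$. The proposal combines a Newton-style mean step with the variance that zeroes $\partial C_f/\partial v$; both components point downhill, so halving eventually yields a candidate whose cost does not rise.

```python
import math

# Hypothetical setting: q(s) = N(m, v), f(s) = exp(-s^2), children summarised
# by (f_opt, f_var) after completing the square, prior N(prior_mean, prior_var).


def gauss_exp(m, v, c):
    """Closed form of E[exp(-c s^2)] for s ~ N(m, v)."""
    d = 1.0 + 2.0 * c * v
    return math.exp(-c * m * m / d) / math.sqrt(d)


def cost_p(m, v, f_opt, f_var, prior_mean, prior_var):
    """C_{f,p}: expected squared deviation of f from f_opt plus the prior term."""
    ef = gauss_exp(m, v, 1.0)    # <f>
    ef2 = gauss_exp(m, v, 2.0)   # <f^2>
    child = (ef2 - 2.0 * f_opt * ef + f_opt ** 2) / (2.0 * f_var)
    prior = ((m - prior_mean) ** 2 + v) / (2.0 * prior_var)
    return child + prior


def cost(m, v, params):
    """C_f = C_{f,p} + C_{f,q}, with C_{f,q} the negative Gaussian entropy."""
    return cost_p(m, v, *params) - 0.5 * math.log(2.0 * math.pi * math.e * v)


def grad_p(m, v, params):
    """Central finite differences of C_{f,p} with respect to m and v."""
    hm, hv = 1e-6, 1e-6 * v
    dm = (cost_p(m + hm, v, *params) - cost_p(m - hm, v, *params)) / (2 * hm)
    dv = (cost_p(m, v + hv, *params) - cost_p(m, v - hv, *params)) / (2 * hv)
    return dm, dv


def update(m, v, params, max_halvings=40):
    """One iteration: Newton-style proposal, halved until the cost does not rise."""
    c_old = cost(m, v, params)
    dm, dv = grad_p(m, v, params)
    if dv > 0:
        # Curvature in m approximated by 2 dC_p/dv; variance from dC_f/dv = 0.
        step_m, v_target = -dm / (2.0 * dv), 1.0 / (2.0 * dv)
    else:
        # Fallback (our assumption): dC_f/dv < 0 here, so grow v.
        step_m, v_target = -dm * v, 2.0 * v
    t = 1.0
    for _ in range(max_halvings):
        m_new = m + t * step_m
        v_new = v + t * (v_target - v)  # convex combination keeps v_new > 0
        if cost(m_new, v_new, params) <= c_old:
            return m_new, v_new
        t *= 0.5
    return m, v  # no acceptable step found: keep the old values


def optimise(m, v, params, n_iter=100):
    """Run the iteration and record the cost after every step."""
    history = [cost(m, v, params)]
    for _ in range(n_iter):
        m, v = update(m, v, params)
        history.append(cost(m, v, params))
    return m, v, history


if __name__ == "__main__":
    params = (0.5, 0.1, 0.0, 1.0)  # f_opt, f_var, prior_mean, prior_var (made up)
    m, v, history = optimise(1.0, 0.5, params)
    print(m, v, history[0], history[-1])
```

Because `update` only accepts candidates whose cost does not exceed the current one, `history` is non-increasing by construction, which mirrors the guarantee stated in the text.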