The preliminary winning strengths were defined by equation 3.7. We shall
rewrite this in the fixed-point form of equation 4.1, x = f(x), where the
vector x contains the preliminary winning strengths of the winners. The
equation has to be solved numerically, since the winning strengths appear
on both sides of the equation, and it is impossible to find the solution in
closed form. The computation only needs to be done for the neurons in the
set of winners, and therefore n equals the number of winners.
Newton's method is often a good choice when one has to solve equations
numerically. If we denote g(x) = f(x) - x, then solving equation 4.1 is
equivalent to finding the root of g(x) = 0. Finding the root using
Newton's method requires the computation of the inverse of the Jacobian
matrix of g, however. It is an n by n matrix, because both vectors x and
f(x) have n elements. In some cases it is possible to find an efficient
way to compute the inverse of the Jacobian, for example if the Jacobian is
close to a diagonal matrix. In our case the Jacobian is not at all close to
a diagonal matrix, and it does not seem to have any other properties
which would allow us to compute the inverse efficiently.
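To make the cost concrete, a single Newton step for g(x) = f(x) - x is
sketched below. The map f and its Jacobian jac_f are hypothetical stand-ins
for equation 4.1; the point is only that every step requires solving a full
n by n linear system.

    import numpy as np

    def newton_step(f, jac_f, x):
        # One Newton step for g(x) = f(x) - x = 0.
        # jac_f(x) is assumed to return the n-by-n Jacobian of f;
        # solving the dense linear system costs O(n^3) per step.
        g = f(x) - x
        J = jac_f(x) - np.eye(len(x))   # Jacobian of g
        return x - np.linalg.solve(J, g)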
We are going to solve equation 4.1 by using another popular method: fixed
point iteration. If we have an equation of the form x = f(x), we can try to
find the solution simply by iterating x(t+1) = f(x(t)). This is the standard
fixed point iteration. Suppose x* is a solution satisfying x* = f(x*). The
iteration converges to x* only if |f'(x*)| < 1. Notice that if the
derivative f' is negative, the value x(t) oscillates around the solution
x*, and if the derivative is positive, the value x(t) changes
monotonically. The closer the derivative is to zero, the faster the
convergence.
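A minimal sketch of the standard iteration is given below; the example map
is an arbitrary scalar illustration, not equation 4.1.

    def fixed_point(f, x0, tol=1e-10, max_iter=1000):
        # Standard fixed point iteration x(t+1) = f(x(t)).
        x = x0
        for _ in range(max_iter):
            x_new = f(x)
            if abs(x_new - x) < tol:
                return x_new
            x = x_new
        return x

    # f(x) = 1/(1 + x) has f'(x*) of about -0.38 at the solution, so the
    # iterates oscillate around x* = 0.618... but still converge.
    print(fixed_point(lambda x: 1.0 / (1.0 + x), 1.0))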
It is also possible to speed up the convergence by modifying the
iteration. The Aitken extrapolation formula [Atkinson, 1989, page 86]
makes the assumption that the values x(t) form a geometric series if
we use the standard fixed point iteration. This assumption is usually
valid at least when the x(t) are close to the solution and the
derivative f'(x(t)) is approximately constant. We shall start from
x(t) and compute the values x(t+1) and x(t+2) by using the
standard fixed point iteration. Then we shall assume that x(t),
x(t+1) and x(t+2) are consecutive values in a geometric series
which converges to x*, and solve an estimate
x̂ = x(t) - (x(t+1) - x(t))^2 / (x(t+2) - 2 x(t+1) + x(t)). We can
then take x̂ as the new x(t) and iterate the process.
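A sketch of this acceleration for a scalar map f, restarting from each
extrapolated estimate, could look as follows.

    def aitken_iteration(f, x0, tol=1e-10, max_iter=100):
        # Two standard fixed point steps followed by one Aitken
        # extrapolation; the estimate is used as the new starting point.
        x = x0
        for _ in range(max_iter):
            x1 = f(x)
            x2 = f(x1)
            denom = x2 - 2.0 * x1 + x
            if abs(denom) < 1e-15:      # iterates have (nearly) converged
                return x2
            x_hat = x - (x1 - x) ** 2 / denom
            if abs(x_hat - x) < tol:
                return x_hat
            x = x_hat
        return x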
If we know that f'(x) < 0 for all x, it is also possible to derive
an alternative modification which can speed up the convergence. Let
us write g(x) = (1 - λ)x + λf(x), and consider the iteration process
x(t+1) = g(x(t)). For a proper choice of λ, namely λ = 1/(1 - f'(x*)),
the derivative g'(x*) = 0, and the convergence is at least quadratic.
Even if f'(x*) < -1, by choosing λ sufficiently small the iteration can
be made to converge (see figure 4.1). We shall find a way to adapt the λ
so that the derivative of g with respect to x becomes as close to zero
as possible.
If λ is too large, the derivative g'(x) is less than zero and x(t)
oscillates. This will manifest itself as an alternating sign of the
differences of consecutive values of x(t). If λ is too small, the
derivative is greater than zero and x(t) changes monotonically. This too
can be detected from the sign of the differences, which stays unchanged.
We can now formulate a rule for adapting λ during the iteration. After
each iteration of x(t) we shall calculate the sign of x(t) - x(t-1). If it
has changed after the last iteration, we shall decrease λ, otherwise we
shall increase it. The use of adaptive λ speeds up convergence and allows
convergence even if f'(x*) < -1.
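A sketch of this adaptive iteration for a scalar map f is given below. The
text does not specify how much λ is decreased or increased after each
step; the multiplicative factors used here are illustrative assumptions.

    def adaptive_relaxed_iteration(f, x0, lam=0.5, tol=1e-10, max_iter=1000):
        # Iterate x(t+1) = (1 - lam) * x(t) + lam * f(x(t)), adapting lam
        # from the sign of consecutive differences of x(t).
        x = x0
        prev_diff = 0.0
        for _ in range(max_iter):
            x_new = (1.0 - lam) * x + lam * f(x)
            diff = x_new - x
            if abs(diff) < tol:
                return x_new
            if diff * prev_diff < 0.0:
                lam *= 0.5              # sign changed: oscillation, decrease lam
            else:
                lam *= 1.1              # sign unchanged: monotone, increase lam
            prev_diff = diff
            x = x_new
        return x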
In our case the equation 4.1 is a system of equations. We can still use
the Aitken extrapolation formula or the adaptation rule for λ proposed
above, but this time we have to apply the modifications to each of the
component functions f_i separately. Both of the modified methods and the
standard fixed point iteration were tested in simulations. The iterated
function was like that in equation 4.1. There were ten neurons, and the
values were sampled from a uniform distribution between -4 and 0. The
closer the values were to zero, the better the convergence was. The
distribution used in this test can be thought of as a sort of worst case,
because normally the values are not this small.
Figure 4.2 shows a typical test run. The standard method did not converge
at all. The iteration using the Aitken extrapolation formula usually
converged, but much more slowly than the proposed method, which converged
in all tests. The good performance of the proposed method is probably due
to the fact that all the partial derivatives of the component functions
f_i are negative. Therefore the value of each of the variables typically
oscillates in the standard fixed point iteration, and the use of adaptive
λ can efficiently dampen the oscillation.
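For the system case, the same adaptation rule can be applied componentwise;
the sketch below assumes a vector-valued map f (a stand-in for equation
4.1) and keeps a separate λ_i for every variable, again with illustrative
adaptation factors.

    import numpy as np

    def adaptive_relaxed_system(f, x0, lam0=0.5, tol=1e-10, max_iter=1000):
        # Componentwise adaptive relaxed fixed point iteration for x = f(x):
        # each component x_i has its own lam_i, adapted from the sign of
        # its consecutive differences as in the scalar rule.
        x = np.asarray(x0, dtype=float)
        lam = np.full_like(x, lam0)
        prev_diff = np.zeros_like(x)
        for _ in range(max_iter):
            x_new = (1.0 - lam) * x + lam * f(x)
            diff = x_new - x
            if np.max(np.abs(diff)) < tol:
                return x_new
            oscillating = diff * prev_diff < 0.0
            lam = np.where(oscillating, 0.5 * lam, 1.1 * lam)
            prev_diff = diff
            x = x_new
        return x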
Figure 4.1:
On the left, fixed point iteration does not converge to the solution x*,
because f'(x*) < -1. The derivative is negative, and therefore the
derivative of the new function g(x) = (1 - λ)x + λf(x) can be made zero
by a proper choice of λ. On the right, λ has been chosen so that
g'(x*) is close to zero, and the iteration converges.
Figure 4.2:
The proposed modified fixed point iteration method using adaptive λ
(solid line) has been compared with the standard method (dotted line)
and the Aitken extrapolation formula (dashed line). The proposed method
has converged much more rapidly than the iteration using the Aitken
extrapolation formula, while the standard method has not converged at all.