A simple example illustrates the lengthy definition of HNFA+VM. Even though the learning algorithm has not yet been presented, the resulting model, taught with two-dimensional toy data, is discussed here. As is typical for nonlinear problems, the algorithm can only find a local optimum that depends on the initialisation. A local optimum of a good model is usually better than a global optimum of a bad one. In this case, the network that was found is not optimal but perhaps instructive. It is explained from the bottom up.
HNFA+VM with three layers is taught with the two-dimensional
data shown by dots in Figure . The data points
correspond to different time indices $t$. The second layer, like the
first one, has two dimensions: the first corresponds to the direction
to the left and the second to the upper right of the bias,
which is marked with a circle. The single neuron in the third layer affects
the neurons on the second layer and, through them, the data
reconstructions. The possible values of the source $s_{3,1}$ form a curve
in the data space.
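As a concrete illustration of this structure, the following sketch traces how a single top-layer source could generate such a curve in the data space through a two-dimensional middle layer. The tanh nonlinearity, the NumPy formulation and all weight values are assumptions made only for illustration; they are not the mapping or the parameters learned in the experiment.

```python
import numpy as np

# A three-layer HNFA-style generative mapping (illustrative sketch only).
def layer2_mean(s3, A2, b2):
    """Mean of the two second-layer sources given the top-layer source s3."""
    return np.tanh(A2 @ np.atleast_1d(s3) + b2)

def data_mean(s2, A1, b1):
    """Mean of the two-dimensional data reconstruction given the second layer."""
    return A1 @ s2 + b1

# Hypothetical weights: one third-layer source, two second-layer sources,
# two data dimensions.
A2 = np.array([[1.5], [-1.5]])        # third layer -> second layer
b2 = np.array([0.0, 0.0])
A1 = np.array([[-1.0, 0.8],           # second layer -> data
               [0.2, 1.0]])
b1 = np.array([0.0, 0.0])             # bias of the data layer (the circle)

# Sweeping the single third-layer source traces a curve in the data space.
curve = np.array([data_mean(layer2_mean(s3, A2, b2), A1, b1)
                  for s3 in np.linspace(-3.0, 3.0, 61)])
print(curve.shape)                    # (61, 2)
```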
Figure illustrates the effects of the neuron in
the third layer in more detail. At one end of its range, it
activates the first neuron in the second layer and, at the other end,
the second neuron. In the middle, both neurons are affected and the
curvature of the data is modelled. Most of the weights
$\mathbf{B}$ to the variance neurons are close to zero, but the
connection $B_{1,2,1}$ from the first dimension of the second layer to
the vertical dimension of the data plane clearly differs from zero. Its effect
can be seen in the lower part of the same figure.
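A rough sketch of the role of such a connection is given below. The exponential variance parameterisation exp(-u) and all weight values are assumptions for illustration only; the precise form of the variance neurons in HNFA+VM is defined elsewhere in the text.

```python
import numpy as np

# Illustrative sketch: a weight from the second layer to a variance neuron
# lets a second-layer source increase the noise variance of one data
# dimension. The exp(-u) parameterisation and all values are assumptions.

def output_std(s2, B_row, b_u):
    """Standard deviation of one data dimension given the second-layer sources."""
    u = B_row @ s2 + b_u          # activation of the variance neuron
    return np.exp(-0.5 * u)       # variance exp(-u), so std exp(-u / 2)

B_row = np.array([-2.0, 0.0])     # only the first second-layer source matters
b_u = 1.0

for s in ([0.0, 0.0], [1.0, 0.0], [2.0, 0.0]):
    print(s, output_std(np.array(s), B_row, b_u))
# The more the first source is activated, the larger the allowed noise in
# the corresponding data dimension.
```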
Each data point is connected to the mean of its reconstruction
$\mathbf{m}_{\mathbf{s}_1}$ in Figure . The reconstructions
are near the actual data points, except at the far left.
In that region, the vertical variance neuron is activated, which
explains why the reconstructions are allowed to be vertically
inaccurate there.
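The mechanism can be made concrete with a small numerical check: the same reconstruction error costs far less log-likelihood under a large noise variance than under a small one. The numbers below are made up purely for illustration.

```python
import numpy as np

# Why an active variance neuron "allows" inaccurate reconstructions: the same
# reconstruction error is penalised far less under a large noise variance
# than under a small one.

def gauss_logpdf(x, mean, var):
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

error = 0.5                                  # vertical reconstruction error
print(gauss_logpdf(error, 0.0, 0.01))        # small variance: heavy penalty
print(gauss_logpdf(error, 0.0, 0.5))         # large variance: mild penalty
```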
The third layer has one source $s_{3,1}$ and the corresponding variance neuron $u_{3,1}$. The prior distribution of the source $s_{3,1}$ is depicted in
Figure .
The distribution is very close to a Gaussian, since the corresponding
variance neuron happens to be inactive. Otherwise it would have been
symmetric but super-Gaussian, as explained in
Section
. The nonlinearity distorts the distribution
such that one tail is longer than the other.
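The super-Gaussianity mentioned above can be illustrated by sampling: a Gaussian whose variance is itself modulated by another Gaussian variable is a scale mixture with heavier tails than a single Gaussian. The exp(-u) parameterisation and the parameter values below are illustrative assumptions.

```python
import numpy as np

# A Gaussian whose log-variance itself fluctuates is a scale mixture of
# Gaussians: symmetric but with heavier tails (positive excess kurtosis)
# than a single Gaussian. Parameterisation and values are assumptions.

rng = np.random.default_rng(0)
n = 200_000

u = rng.normal(0.0, 1.0, n)                     # fluctuating variance neuron
s_active = rng.normal(0.0, np.exp(-0.5 * u))    # std exp(-u / 2)
s_inactive = rng.normal(0.0, 1.0, n)            # constant variance

def excess_kurtosis(x):
    x = x - x.mean()
    return (x ** 4).mean() / (x ** 2).mean() ** 2 - 3.0

print(excess_kurtosis(s_inactive))   # close to 0: Gaussian
print(excess_kurtosis(s_active))     # clearly positive: super-Gaussian
```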
The behaviour of the model can be compared to that of a multilayer
perceptron (MLP) network. In an MLP network, the sources of the
second layer would have been computational hidden units and therefore
their signals would have been functions of the third-layer signal. In
the HNFA+VM case, the difference between the input and output signals of the
middle layer can be seen in Figure .
The values lie fairly close to the diagonal line, which means that
the second layer behaves somewhat like a computational layer. The
deviation from the diagonal corresponds to the fact that the
reconstructions in Figure
are not exactly on
the curve. The variance neurons have no counterparts in the MLP
framework.
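The comparison can be summarised with a small simulated sketch: in an MLP the middle-layer output would equal the deterministic function of the top-layer signal, putting every point exactly on the diagonal, whereas in HNFA+VM the posterior means of the second-layer sources can deviate from it. The deviations below are simulated noise, used only to mimic the kind of scatter seen in the figure.

```python
import numpy as np

# MLP: middle-layer output is a deterministic function of the top-layer
# signal, so input-vs-output points lie exactly on the diagonal.
# HNFA+VM: each second-layer source has its own posterior mean that may
# deviate from that function. The deviation here is simulated noise.

rng = np.random.default_rng(1)
s3 = rng.normal(size=100)                        # top-layer signal
layer2_input = np.tanh(1.5 * s3)                 # deterministic MLP-like output
layer2_output = layer2_input + 0.1 * rng.normal(size=100)  # simulated posterior means

# The points (layer2_input, layer2_output) scatter around the diagonal,
# as in the corresponding figure for the learned model.
print(np.corrcoef(layer2_input, layer2_output)[0, 1])
```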
With another initialisation, the presumably globally optimal solution
shown in Figure was found. A third source in the
second layer took care of the left part and all the reconstructions
became relatively accurate. Variance neurons were of no use in this
solution. The existence of local optima in a problem as simple as this
suggests that designing a good learning procedure is far from trivial,
and therefore the entire Chapter
is dedicated
to it.