A simple example illustrates the long definition of HNFA+VM. Even though the learning algorithm has not yet been presented, the resulting model, trained with two-dimensional toy data, is discussed here. As is typical for nonlinear problems, the algorithm can only find a local optimum that depends on the initialisation. A local optimum of a good model is usually better than a global optimum of a bad one. In this case, the network that was found is not optimal but perhaps instructive. It is explained from the bottom up.
An HNFA+VM model with three layers is trained with the two-dimensional data shown as dots in Figure . The data points correspond to different time indices t. The second layer, like the first one, has two dimensions: the first corresponds to the direction towards the left and the second to the direction up and to the right, measured from the bias marked with a circle. The single neuron in the third layer affects the neurons of the second layer and, through them, the data reconstructions. The possible values of the source s3,1 form a curve in the data space.
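Schematically, ignoring the variance neurons and writing φ for the nonlinearity of the hidden nodes (the notation here is only a sketch; the exact parametrisation follows the model definition above), the curve consists of the data-space means obtained by sweeping the value of s3,1 over its range,
\[
\hat{x}(s_{3,1}) \;=\; A_1\, \varphi\!\big(A_2\, s_{3,1} + a_2\big) + a_1 ,
\]
where A_2 and a_2 denote the weights and bias from the third layer to the second, and A_1 and a_1 those from the second layer to the data. A single scalar source can thus only parametrise a one-dimensional curve in the two-dimensional data space.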
Figure illustrates the effects of the neuron in the third layer in more detail. At one end of its range it activates the first neuron of the second layer, and at the other end the second neuron. In the middle, both neurons are affected and the curvature of the data is modelled. Most of the weights B to the variance neurons are close to zero, but the connection B1,2,1 from the first dimension of the second layer to the vertical dimension of the data plane differs clearly from zero. The effect can be seen in the lower part of the same figure.
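Schematically (again, the exact parametrisation was given in the model definition), the variance neurons of the data layer receive their means from the second layer through the weights B,
\[
u_{1,k}(t) \;\sim\; \mathcal{N}\!\Big(\sum_{j} B_{1,k,j}\,\varphi\big(s_{2,j}(t)\big) + b_{1,k},\; \cdot\;\Big),
\]
and u_{1,k}(t) in turn sets the variance of the k-th data dimension. A clearly nonzero B1,2,1 therefore allows the first second-layer source to modulate the noise level of the vertical data dimension.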
In Figure , each data point is connected to the mean of its reconstruction ms1. The reconstructions are close to the actual data points, except at the far left. In that region the variance neuron of the vertical data dimension is activated, which explains why the reconstructions are allowed to be vertically inaccurate there.
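The mechanism can be read directly from the cost function. Roughly, ignoring the expectations taken in the variational cost, the reconstruction term of a Gaussian data dimension with mean m and variance σ² is of the form
\[
\frac{(x - m)^2}{2\sigma^2} \;+\; \frac{1}{2}\ln\!\big(2\pi\sigma^2\big),
\]
so when the variance neuron lets σ² grow in some region, the quadratic penalty on the reconstruction error there becomes small, at the price of the logarithmic term.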
The third layer has one source s3,1 and the corresponding variance neuron u3,1. The prior distribution of the source s3,1 is depicted in Figure . The distribution is very close to a Gaussian, since the corresponding variance neuron happens to be inactive; otherwise it would have been symmetric but super-Gaussian, as explained in Section . The nonlinearity distorts the distribution slightly, so that one end has a longer tail than the other.
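The super-Gaussianity would follow from the source being a scale mixture of Gaussians. Writing the variance of the source as exp(-u3,1), a parametrisation assumed here only for illustration, the prior is roughly
\[
p(s_{3,1}) \;=\; \int \mathcal{N}\!\big(s_{3,1};\, m,\, \exp(-u_{3,1})\big)\, p(u_{3,1})\, \mathrm{d}u_{3,1},
\]
which has heavier tails than any single Gaussian whenever u3,1 actually varies. Since u3,1 is nearly constant here, the mixture collapses to essentially a single Gaussian.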
The behaviour of the model can be compared to that of a multilayer perceptron (MLP) network. In an MLP network, the sources of the second layer would be computational hidden units, and their signals would therefore be deterministic functions of the third-layer signal. In the HNFA+VM case, the difference between the input and output signals of the middle layer can be seen in Figure . The values lie fairly close to the diagonal line, which means that the second layer behaves roughly like a computational layer. The deviation from the diagonal corresponds to the fact that the reconstructions in Figure are not exactly on the curve. The variance neurons have no counterparts in the MLP framework.
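The comparison can be summarised schematically (the nonlinearity and the exact parametrisation again follow the model definition) as
\[
\text{MLP:}\quad s_2(t) = A_2\, s_3(t) + a_2, \qquad\qquad
\text{HNFA+VM:}\quad s_2(t) \sim \mathcal{N}\!\big(A_2\, s_3(t) + a_2,\; \cdot\;\big).
\]
In the MLP the middle layer is tied to its top-down input, whereas in HNFA+VM it is a latent variable whose posterior is also pulled by the data, which is what moves the points away from the diagonal.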
With another initialisation, the presumably globally optimal solution shown in Figure was found. A third source in the second layer took care of the left part of the data, and all the reconstructions became relatively accurate. Variance neurons were of no use in this solution. The existence of local optima in a problem as simple as this suggests that designing a good learning procedure is far from trivial, and therefore the entire Chapter is dedicated to it.