An alternative approach to unsupervised learning of sparse codes is
the minimisation of mutual predictability of the outputs of the
neurons. In such a scheme, for each neuron there is an associated
predictor which tries to predict the output of the neuron on the basis
of the outputs of the other neurons [Schmidhuber, 1992]. The
neurons try to learn a mapping $y_i = f_i(\mathbf{x})$ from the input
$\mathbf{x}$ to the output of neuron $i$, and the predictors try to learn the
mappings $\hat{y}_i = g_i(y_1, \ldots, y_{i-1}, y_{i+1}, \ldots, y_n)$ from the
outputs of the other neurons. The error function for the network
is based on the prediction errors $(y_i - \hat{y}_i)^2$.
The predictors try to minimise the prediction error
and the neurons try to escape the prediction, that is, to maximise the
same error criterion. This forces the neurons to represent
independent information about the inputs. If a neuron cannot escape
the prediction, it is not allowed to be active. When the number of
neurons in the network exceeds the number of independent features in
the input, this scheme yields a sparse code. Since the neurons try to
maximise the error function, care has to be taken to ensure that the
outputs are bounded; otherwise they could grow without limit. In all
the algorithms proposed so far, the outputs have been explicitly
restricted to a finite range.
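With the notation used above (the symbols $f_i$, $g_i$ and $E$ are notational choices of this summary rather than a fixed convention in the literature), the scheme can be written as a minimax problem over a single criterion:
\[
  E = \sum_i \left( y_i - \hat{y}_i \right)^2
    = \sum_i \left( f_i(\mathbf{x}) - g_i(y_1, \ldots, y_{i-1}, y_{i+1}, \ldots, y_n) \right)^2 ,
\]
where the predictors $g_i$ are adjusted to minimise $E$, the neurons $f_i$ are adjusted to maximise $E$, and the outputs $y_i$ are constrained to a finite range.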
One of the first algorithms based on mutual predictability minimisation was published by Földiák (1990). In his model, each neuron computes a weighted sum of its inputs, subtracts the prediction made on the basis of the activities of the other neurons, and subtracts an individual threshold. The activity of each neuron is then obtained by applying a squashing function. The activity of one neuron thus depends on the activities of the other neurons, which in turn depend on the activity of the first neuron, so the activities have to be solved iteratively. When a new input is presented to the network, all the activities are initially set to zero, and the system is then iterated until the activities have settled. The final outputs are obtained by rounding the activities to binary values. Sparsity is ensured by introducing a target average activity ratio for the neurons: the individual thresholds are adapted during learning so that the target activity ratios are reached.
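The settling procedure can be sketched as follows. This is only an illustrative sketch, not Földiák's exact formulation: the simple fixed-point iteration, the logistic squashing function and the names Q (feedforward weights), W (lateral prediction weights from the other neurons, with zero diagonal), t (thresholds) and p (target activity ratio) are assumptions made here for concreteness.

import numpy as np

def settle(x, Q, W, t, n_iter=50):
    # Iterate the activities until they settle: weighted input sum,
    # minus the prediction from the other neurons, minus the threshold,
    # passed through a squashing function.  Activities start from zero.
    y = np.zeros(Q.shape[0])
    for _ in range(n_iter):
        y = 1.0 / (1.0 + np.exp(-(Q @ x - W @ y - t)))
    return np.round(y)               # final outputs rounded to binary values

def adapt_thresholds(t, y, p=0.05, rate=0.01):
    # Raise the threshold of neurons that are active more often than the
    # target activity ratio p, and lower it for neurons that are active less.
    return t + rate * (y - p)

Repeated over many inputs, the threshold update implements the target average activity ratio mentioned above: each neuron's threshold drifts until its average activity matches p.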
Schmidhuber (1992) was probably the first to
propose a general framework for mutual predictability minimisation.
As an example he used multilayer perceptron (MLP) networks and the
back-propagation algorithm to train the neurons and their corresponding
predictors. The outputs of the neurons were bounded between zero
and one, but because the neurons maximise the error criterion, the
outputs tend towards binary values. The strength of the proposed method
is its ability to find complex nonlinear dependencies in the data.
Moreover, this algorithm does not use an iterative procedure to find
the activities of the neurons, but relies on the ability of the
nonlinear MLPs to find the correct outputs. However, learning is very
slow and the structure is biologically implausible. The binary
outputs can also be a disadvantage in some applications. It might be
possible to make modifications that allow neurons with graded outputs,
but this does not seem straightforward.
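A stripped-down sketch of this kind of adversarial training is given below. It is not Schmidhuber's algorithm as published: to keep the example short, the code units form a single sigmoid layer and the predictors are linear, whereas the original uses MLPs for both; the learning rates and network sizes are arbitrary, and the random input is only a stand-in for real training data.

import numpy as np

rng = np.random.default_rng(0)
n_in, n_units = 8, 4
V = rng.normal(scale=0.1, size=(n_units, n_in))  # code-unit weights (sigmoid layer)
P = np.zeros((n_units, n_units))                 # linear predictor weights
mask = 1.0 - np.eye(n_units)                     # each unit is predicted from the others only
lr_code, lr_pred = 0.05, 0.1

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

for step in range(10000):
    x = rng.normal(size=n_in)        # stand-in for a training input
    y = sigmoid(V @ x)               # outputs bounded between zero and one
    y_hat = (P * mask) @ y           # prediction of each output from the others
    err = y - y_hat                  # prediction errors

    # Predictors: gradient descent on the squared prediction error.
    P += 2.0 * lr_pred * np.outer(err, y) * mask

    # Code units: gradient ascent on the same criterion ("escape the prediction").
    dE_dy = 2.0 * err - 2.0 * (P * mask).T @ err  # direct and indirect terms
    V += lr_code * np.outer(dE_dy * y * (1.0 - y), x)

Because the units maximise the criterion while their outputs are squashed into the range (0, 1), they are pushed towards the extremes of that range, which is why the outputs tend to become binary.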
Sirosh (1995) and Sirosh and Miikkulainen (1994) have developed a biologically motivated model of the cortex with two kinds of lateral connections between neurons: short-range lateral connections are assumed to be excitatory, and long-range connections inhibitory. Both types of connections are adapted during learning. The resulting algorithm is able to model many structures similar to those found in the primary visual cortex, such as receptive fields, topographic maps, ocular dominance, orientation and size preference columns, and patterned lateral connections between neurons. While the model gives very interesting results, it is computationally quite demanding, which precludes the large-scale simulations that would be important for understanding the massively parallel processing in the cortex.
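The lateral connectivity pattern of the model can be illustrated as below. Only the sign structure of the connections is shown; the grid layout, the distance threshold and the weight values are assumptions made for the illustration, and the actual model additionally adapts both sets of lateral weights during learning.

import numpy as np

def lateral_weights(grid_size, excitatory_radius=2.0, w_excit=0.1, w_inhib=-0.02):
    # Neurons on a 2-D grid: short-range lateral connections are excitatory,
    # long-range connections inhibitory, and there is no self-connection.
    coords = np.array([(i, j) for i in range(grid_size) for j in range(grid_size)], float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    W = np.where(dist <= excitatory_radius, w_excit, w_inhib)
    np.fill_diagonal(W, 0.0)
    return W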