An alternative approach to unsupervised learning of sparse codes is the minimisation of mutual predictability of the outputs of the neurons. In such a scheme, there is an associated predictor for each neuron, which tries to predict the output of that neuron on the basis of the outputs of the other neurons [Schmidhuber, 1992]. The neurons try to learn mappings $y_i = f_i(\mathbf{x})$ and the predictors try to learn mappings $\hat{y}_i = g_i(y_1, \ldots, y_{i-1}, y_{i+1}, \ldots, y_n)$. The error function for the network is based on the prediction errors $e_i = y_i - \hat{y}_i$. The predictors try to minimise the prediction error and the neurons try to escape the prediction, that is, to maximise the same error criterion. This forces the neurons to represent independent information about the inputs. If a neuron cannot escape the prediction, it is not allowed to be active. When the number of neurons in the network exceeds the number of independent features in the input, this scheme yields a sparse code. Since the neurons try to maximise the error function, care has to be taken to keep the outputs bounded, as otherwise they could grow without limit. In all the algorithms proposed so far, the outputs have been explicitly restricted to a finite range.
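As an illustration only, the following is a minimal NumPy sketch of this scheme under simplifying assumptions that are not part of any of the published algorithms: the outputs are bounded with a logistic function, the predictors are linear, the error criterion is the squared prediction error, and the neurons and the predictors adapt with alternating plain gradient steps.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 8))                  # toy data: 200 inputs of dimension 8

n_units = 5
W = 0.1 * rng.standard_normal((n_units, X.shape[1]))  # neuron weights (maximise the error)
V = np.zeros((n_units, n_units))                       # linear predictor weights (minimise the error)
lr = 0.01

for epoch in range(50):
    for x in X:
        y = 1.0 / (1.0 + np.exp(-W @ x))           # bounded outputs y_i = f_i(x)
        y_hat = V @ y                              # each unit predicted from the others
        e = y - y_hat                              # prediction errors e_i

        # Predictors: gradient descent on the squared prediction error.
        dV = np.outer(e, y)
        np.fill_diagonal(dV, 0.0)                  # a unit may not predict itself
        V += lr * dV

        # Neurons: gradient ascent on the same error ("escape the prediction").
        de_dy = (np.eye(n_units) - V).T @ e
        W += lr * np.outer(de_dy * y * (1.0 - y), x)
```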
One of the first algorithms based on mutual predictability minimisation was published by Földiák (1990). In his model, each neuron computes a weighted sum of its inputs, subtracts the prediction made on the basis of the activities of the other neurons, and subtracts an individual threshold. The activity of each neuron is then obtained by applying a squashing function. The activity of one neuron depends on the activities of the other neurons, which in turn depend on the activity of the first neuron, so the activities have to be solved iteratively. When a new input is presented to the network, all the activities are initially set to zero, and the system is then iterated until the activities have settled. The final outputs are obtained by rounding the activities to binary values. Sparsity is ensured by introducing a target average activity for the neurons: the individual thresholds are adapted during learning so that the target activity is reached.
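The sketch below illustrates the settling procedure and the threshold adaptation. The number of iterations, the slope of the squashing function, the target activity and the learning rate are assumed values for illustration, not Földiák's published parameters, and the learning of the feedforward and lateral weights is omitted.

```python
import numpy as np

def activities(x, Q, P, t, n_iter=50, beta=10.0):
    """Iteratively settle the activities for one input x.

    Q: feedforward weights, P: lateral prediction weights (zero diagonal),
    t: per-unit thresholds, beta: slope of the logistic squashing function.
    """
    y = np.zeros_like(t)                                     # activities start at zero
    for _ in range(n_iter):
        y = 1.0 / (1.0 + np.exp(-beta * (Q @ x - P @ y - t)))
    return np.round(y)                                       # final outputs are binary

def adapt_thresholds(t, y, p_target=0.1, rate=0.02):
    """Raise a unit's threshold when it is more active than the target
    average activity p_target, lower it when it is less active."""
    return t + rate * (y - p_target)
```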
Schmidhuber (1992) was probably the first to propose a general framework for mutual predictability minimisation. As an example he used multilayer perceptron (MLP) networks and the back-propagation algorithm to train the neurons and their corresponding predictors. The outputs of the neurons were bounded between zero and one, but due to the maximisation of the error criterion the outputs tend towards binary values. The strength of the proposed method is its ability to find complex nonlinear dependencies in the data. Moreover, the algorithm does not use an iterative procedure to find the activities of the neurons, but relies on the ability of the nonlinear MLPs to compute the correct outputs directly. However, learning is very slow and the structure is biologically implausible. The binary outputs can also be a disadvantage in some applications. It might be possible to make modifications that allow neurons with graded outputs, but this does not seem straightforward.
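A rough PyTorch sketch of this kind of setup is given below; the network sizes, optimisers and training schedule are arbitrary choices made for illustration, not those of Schmidhuber (1992). It shows the two points emphasised above: the activities are obtained with a single forward pass through an MLP, and the code units and the MLP predictors are trained against each other on the same squared prediction error.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n_in, n_code, n_hidden = 8, 5, 16

# Code network: one forward pass gives all bounded activities (no settling).
coder = nn.Sequential(nn.Linear(n_in, n_hidden), nn.Tanh(),
                      nn.Linear(n_hidden, n_code), nn.Sigmoid())

# One MLP predictor per code unit, fed the outputs of the other units.
predictors = nn.ModuleList(
    nn.Sequential(nn.Linear(n_code - 1, n_hidden), nn.Tanh(),
                  nn.Linear(n_hidden, 1), nn.Sigmoid())
    for _ in range(n_code))

opt_code = torch.optim.SGD(coder.parameters(), lr=0.01)
opt_pred = torch.optim.SGD(predictors.parameters(), lr=0.01)

X = torch.randn(200, n_in)                       # toy data

def prediction_error(y):
    err = 0.0
    for i, pred in enumerate(predictors):
        others = torch.cat([y[:, :i], y[:, i + 1:]], dim=1)
        err = err + ((y[:, i:i + 1] - pred(others)) ** 2).mean()
    return err

for epoch in range(100):
    # Predictors: minimise the prediction error (code held fixed).
    y = coder(X).detach()
    opt_pred.zero_grad()
    prediction_error(y).backward()
    opt_pred.step()

    # Code units: maximise the same error (predictors held fixed).
    y = coder(X)
    opt_code.zero_grad()
    (-prediction_error(y)).backward()
    opt_code.step()
```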
Sirosh (1995) and Sirosh and Miikkulainen (1994) have developed a biologically motivated model of the cortex with two kinds of lateral connections between neurons: short-range lateral connections are excitatory and long-range connections inhibitory. Both types of connections are adapted during learning. The resulting algorithm is able to model many structures similar to those found in the primary visual cortex, such as receptive fields, topographic maps, ocular dominance, orientation and size preference columns, and patterned lateral connections between neurons. While the model gives very interesting results, it is computationally quite demanding, which precludes the large-scale simulations that would be important for understanding the massively parallel processing in the cortex.
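The core architectural idea can be sketched as follows. The sheet size, connection radii, iteration count, learning rate and the simple row-wise normalisation are illustrative assumptions only, not the equations or parameters of the actual model.

```python
import numpy as np

n_units = 64                                     # a one-dimensional sheet of units
dist = np.abs(np.arange(n_units)[:, None] - np.arange(n_units)[None, :])

# Short-range connections excitatory, long-range inhibitory (assumed radii).
E = 0.05 * ((dist > 0) & (dist <= 3))            # excitatory lateral weights
I = 0.02 * ((dist > 3) & (dist <= 15))           # inhibitory lateral weights

def settle(afferent, E, I, n_iter=20):
    """Relax the lateral dynamics: excitation adds, inhibition subtracts."""
    y = np.clip(afferent, 0.0, 1.0)
    for _ in range(n_iter):
        y = np.clip(afferent + E @ y - I @ y, 0.0, 1.0)
    return y

def adapt_lateral(W, y, rate=0.01):
    """Hebbian adaptation of one set of lateral weights; each row is then
    rescaled so that a unit's total incoming lateral weight is preserved."""
    old = W.sum(axis=1, keepdims=True)
    W = W + rate * np.outer(y, y) * (W > 0)      # only existing connections adapt
    new = np.maximum(W.sum(axis=1, keepdims=True), 1e-12)
    return W * old / new
```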