Although sparse coding is recognised as an important characteristic of information processing in the brain, algorithms for finding sparse codes have not yet been studied extensively and only a handful of algorithms have been proposed [Földiák, 1990, Saund, 1995, Olshausen and Field, 1995, Dayan and Zemel, 1995, Hinton and Zemel, 1994]. These can be roughly divided into two groups. In the first group there are the algorithms which use some kind of scheme to minimise the reconstruction error, often combined with constraints for the outputs or error functions which promote sparsity. The other group of algorithms is based on minimising the mutual predictability of activities of neurons. The former approach seems to be computationally more efficient -- at least with modern computer technology -- but the latter seems to be biologically more plausible.

In sparse coding, finding the right units to represent the input is inherently difficult. Suppose we have one thousand neurons. If we want to represent the input with only one active unit at a time, as in local coding, there are only one thousand possibilities, and we can check all of them. If we wanted to use ten units, which would yield a fairly sparse code, there would be roughly different possibilities. It is obvious that we cannot simply go through all possible combinations to check which one would yield the best representation. Therefore one has to use an approximative approach to find the active neurons. Most algorithms have some kind of iterative procedure to calculate the activities.

Suppose we would like a neural network to learn a good representation
for its inputs. We would like the learning to be *unsupervised*,
which means that the learning is based on the inputs only. This is in
contrast with supervised learning, where an external ``teacher'' is
needed to label the data, that is, to tell the desired outputs for
each input. The goal for a supervised network is to learn to mimic
this given mapping. In order to learn without a teacher an
unsupervised network has to have a way to measure the goodness of its
mapping by itself. Reconstruction error and mutual predictability are
suitable measures because they both guarantee that the outputs contain
information about the inputs.

Thu May 9 14:06:29 DST 1996