We could go on formulating reconstruction mappings and solving the output mappings, but this would become increasingly difficult. Moreover, it is not at all clear that we would get a network with the kind of competition behaviour we sketched at the beginning of this section. Figure 3.3 shows an example where simple linear reconstruction does not yield a sparse code. The weight vector of neuron 2 is much closer to the input than the two other weight vectors, but according to a linear reconstruction it would not be used in the representation. This is because linear reconstruction does not promote sparsity in any way. If we use linear reconstruction and find the optimal outputs using equation 2.1, we obtain a dense code, because the reconstruction error is minimised when all the neurons are allowed to participate in the representation of the input. We therefore need constraints on the outputs which ensure the sparsity of the representation.
Figure 3.3: An example of a situation where reconstruction error minimisation does not yield a sparse code. All vectors are assumed to be nearly parallel and 3-dimensional; the figure shows a view from the top (cf. a map of a small area on the globe). The reconstruction is optimal when neurons 1 and 3 represent the input, but the code would be sparser if neuron 2 alone represented the input.
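To make the situation of figure 3.3 concrete, the following small numerical sketch (our own illustration in Python/NumPy, with hypothetical weight vectors and input) compares the dense code obtained by unconstrained minimisation of the linear reconstruction error with a sparse code in which only the best-matching neuron represents the input.

import numpy as np

# Three nearly parallel 3-dimensional weight vectors (hypothetical values
# chosen to mimic figure 3.3): vector 2 is almost aligned with the input,
# vectors 1 and 3 lie slightly to either side of it.
W = np.array([[0.95, 0.20, 0.24],    # weight vector 1
              [0.98, 0.02, 0.20],    # weight vector 2 (closest to the input)
              [0.95, -0.17, 0.26]])  # weight vector 3
x = np.array([0.99, 0.01, 0.14])     # input vector

# Unconstrained minimisation of the linear reconstruction error
# ||x - W^T y||^2 gives the least-squares outputs: all three neurons
# participate, i.e. the code is dense.
y_dense, *_ = np.linalg.lstsq(W.T, x, rcond=None)
print("dense code:", np.round(y_dense, 3))
print("dense-code error:", round(float(np.linalg.norm(x - W.T @ y_dense)), 4))

# A sparse alternative: let only the single best-matching neuron represent
# the input by projecting the input onto each weight vector in turn.
errors = [float(np.linalg.norm(x - (x @ w) / (w @ w) * w)) for w in W]
best = int(np.argmin(errors))
print("best single neuron:", best + 1, " error:", round(errors[best], 4))

With these hypothetical numbers the least-squares code uses all three neurons (with fairly large outputs of mixed signs), whereas the sparse alternative uses only neuron 2 at the cost of a small but nonzero reconstruction error.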
It should be recalled that we are using reconstruction error minimisation only because it allows easy derivation of algorithms and ensures that the representation generated by the network contains information about the input. Our ultimate goal is not to find an exact reconstruction mapping, but to find a mapping which can generate sparse codes. If we can find outputs that approximately minimise the reconstruction error in equation 2.1, we can use the result in equation 2.2 to derive the learning rule, and even an approximate minimisation will ensure that the outputs contain information about the inputs.
We shall now leave the reconstruction mapping for a moment and consider a way to formulate the constraints for the outputs. We have already found that the winning strengths in equation 3.5 can measure the competition between two neurons. If we have more than two neurons, we can still measure the competition between each pair of neurons and try to combine these pairwise competitions into a final winning strength. By combining equations 3.4 and 3.5 we can define a winning ratio between each pair of neurons i and j:
If there were only two neurons in the network, the winning strength of each neuron would simply be its winning ratio with the other neuron, and we would get the same solution as in equation 3.5. When we have more than two neurons, we need a way to combine the winning ratios in order to obtain the winning strengths.
One can think of the winning ratios as fuzzy truth values for the propositions ``Neuron i is a winner when it competes with neuron j''. This suggests using some kind of fuzzy and-operation to combine the winning ratios: neuron i is a winner if it wins the competition with neuron 1 and neuron 2 and ... and neuron n. A simple choice for the and-operation is the product of the truth values. We could simply calculate the product of the winning ratios of neuron i over the index j to obtain the winning strength of neuron i.
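As a minimal sketch of this product combination (our own; the pairwise winning ratios are simply assumed to be given as a matrix, since equations 3.4 and 3.5 are not reproduced here):

import numpy as np

def winning_strengths_product(M):
    """Combine pairwise winning ratios with a plain product (fuzzy and).

    M[i, j] is the winning ratio of neuron i against neuron j, assumed to
    lie in [0, 1]; the diagonal is ignored.  The strength of neuron i is
    the product of its winning ratios against every other neuron.
    """
    n = M.shape[0]
    strengths = np.ones(n)
    for i in range(n):
        for j in range(n):
            if i != j:
                strengths[i] *= M[i, j]
    return strengths

# A hypothetical three-neuron example in which neuron 1 has totally lost
# to neuron 2 (winning ratio 0) and the ratios of each pair sum to one.
M = np.array([[0.5, 0.0, 0.4],
              [1.0, 0.5, 0.9],
              [0.6, 0.1, 0.5]])
print(winning_strengths_product(M))

For this matrix the strengths come out as (0, 0.9, 0.06): neuron 1, which has totally lost to neuron 2, correctly gets strength zero, but its presence still scales down the product of neuron 3 (by the factor 0.6), which is exactly the problem discussed next.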
However, there is a small problem. Suppose that neuron 1 has totally lost the competition with neuron 2, so that its winning ratio with neuron 2 is zero. It follows that the product, and hence the winning strength of neuron 1, is zero as well. This is reasonable, since a neuron cannot be a winner if it has totally lost the competition with another neuron. The problem is that the winning ratios of neurons other than neuron 2 against neuron 1 may still differ from one. This means that the other neurons are still competing with a neuron that has already lost. Since the lost neuron does not convey any information, it would be reasonable that it should not affect the outputs of the other neurons. This can be fixed by weighting the winning ratios in the product by the winning strengths. We can now examine the solutions for the preliminary winning strengths defined by the following equation:
We have named these values preliminary winning strengths, since they do not necessarily satisfy the property required of the winning strengths, but we can use them to compute values which do satisfy it, as will be shown later on. The weights are placed in the exponent because we are dealing with a product of numbers; taking logarithms turns the product into a weighted sum. We have also introduced a parameter which can be used to control the amount of competition.
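Since equation 3.7 itself is not reproduced above, the following display is only our reading of the verbal description (the winning ratios, written here as m_{ij}, are multiplied together, each weighted in the exponent by the preliminary winning strength of the competing neuron, and a parameter, written here as gamma, controls the amount of competition); the notation and the exact form are ours and may differ from the original:

% An assumed form of the fixed-point equation for the preliminary
% winning strengths, reconstructed from the verbal description only.
\begin{equation*}
  \tilde{m}_i \;=\; \prod_{j \neq i} m_{ij}^{\;\gamma \tilde{m}_j}
\end{equation*}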
The preliminary winning strengths appear on both sides of equation 3.7, and it is impossible to find a closed-form solution. However, this time we can prune most of the neurons from the computation. If for some neuron i the winning ratio with another neuron j is zero, then we can omit neuron i from the computations, since its preliminary winning strength will be zero anyway and it does not affect the preliminary winning strengths of the other neurons. This allows us to find a set of winners: the set of neurons i for which the winning ratio is nonzero for all j. Only these neurons will have positive preliminary winning strengths; for all the other neurons the preliminary winning strength is zero.
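Under the same assumed form, a simple fixed-point iteration with the pruning step just described could look as follows (a sketch only; the function name, the parameter gamma and the iteration scheme are ours):

import numpy as np

def preliminary_winning_strengths(M, gamma=1.0, iters=50):
    """Iterate the assumed fixed-point equation
    m_i = prod_{j != i} M[i, j] ** (gamma * m_j)
    for the preliminary winning strengths (a sketch, not necessarily the
    exact algorithm behind equation 3.7)."""
    n = M.shape[0]
    # Pruning: a neuron whose winning ratio against some other neuron is
    # zero will have strength zero anyway, so drop it from the start.
    winners = [i for i in range(n)
               if all(M[i, j] > 0.0 for j in range(n) if j != i)]
    m = np.zeros(n)
    m[winners] = 1.0                       # initial guess for the winners
    for _ in range(iters):                 # plain fixed-point iteration
        new_m = np.zeros(n)
        for i in winners:
            ratios = np.array([M[i, j] for j in winners if j != i])
            weights = gamma * np.array([m[j] for j in winners if j != i])
            new_m[i] = np.prod(ratios ** weights)
        m = new_m
    return m

# The same hypothetical winning-ratio matrix as before: neuron 1 is pruned
# and the two remaining neurons compete only with each other.
M = np.array([[0.5, 0.0, 0.4],
              [1.0, 0.5, 0.9],
              [0.6, 0.1, 0.5]])
print(np.round(preliminary_winning_strengths(M, gamma=2.0), 3))

For this matrix the iteration settles to roughly (0, 0.998, 0.010): neuron 1 is pruned, neuron 2 emerges as a clear winner, and neuron 3 is almost completely suppressed; increasing gamma suppresses the weaker neurons even more strongly.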