It takes years for the human brain to mature. Although neural network researchers are usually not that patient with their own models, the process by which the connections in the brain grow and adapt can offer useful hints about how large hierarchical artificial neural networks can be learned.
Two basic principles have proven useful in the unsupervised learning of MLP networks. The first is that it is easier to start with a large network and then prune away unused parts. This resembles early development in the biological brain, where a significant fraction of the neurons die (see, e.g., [69,104,62]). There is evidence that the neurons which die are those that fail to find anything reasonable to represent. Ensemble learning implements the same behaviour automatically, as discussed, for example, in publication III.
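As a hypothetical illustration of this automatic pruning (a toy proxy, not the thesis' actual algorithm): in ensemble learning every weight carries a posterior mean and variance, and a hidden neuron whose outgoing weights' posteriors all concentrate around zero represents nothing and can be removed. A common proxy for this is the posterior signal-to-noise ratio |mean| / std:

```python
import numpy as np

# Illustrative posterior statistics for 4 hidden neurons, 3 outputs each.
# Neurons 1 and 3 are "dead": their outgoing-weight posteriors are tightly
# concentrated at zero, so they carry no information about the data.
post_mean = np.array([[ 0.8,  -1.2,    0.3],
                      [ 0.001, 0.0,   -0.001],   # dead neuron
                      [-0.5,   0.9,    0.0],
                      [ 0.0,   0.001,  0.001]])  # dead neuron
post_std = np.full_like(post_mean, 0.1)
post_std[[1, 3], :] = 0.01  # tight posteriors around zero

def survivors(mean, std, snr_threshold=1.0):
    """Keep a neuron if any outgoing weight has |mean|/std above threshold."""
    snr = np.abs(mean) / std
    return (snr > snr_threshold).any(axis=1)

keep = survivors(post_mean, post_std)
# keep -> [True, False, True, False]: the two dead neurons are pruned
```

The threshold and the Gaussian-posterior parameterization are assumptions for the sketch; the point is only that pruning falls out of the posterior statistics rather than being a separate heuristic.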
The second principle is a critical period of development during which the connections are established. This period starts earlier in the lower areas, close to the sensory areas, and then proceeds to higher levels. Experience with ensemble learning for unsupervised MLP networks showed that an algorithm capable of pruning connections and neurons needs a similar procedure for learning hierarchical representations: if the critical period came too early, neurons would be pruned before they had a chance to learn to represent anything useful, because the lower areas would not yet have established a sensible representation.
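Such a staged schedule can be sketched as follows (the function name and epoch counts are illustrative assumptions, not taken from the thesis): lower layers start learning first, and pruning in a layer is only enabled once the layers below it have had time to settle.

```python
# Hypothetical "critical period" schedule for an n-layer hierarchy:
# layer 0 is closest to the sensory input and learns first; each higher
# layer joins later, and a layer is never pruned during its own first
# stage of learning.
def training_schedule(n_layers, epochs_per_stage=50):
    """Return (start_epoch, trainable_layers, prunable_layers) per stage."""
    schedule = []
    for stage in range(n_layers):
        trainable = list(range(stage + 1))  # layers 0..stage are learning
        prunable = list(range(stage))       # prune only settled lower layers
        schedule.append((stage * epochs_per_stage, trainable, prunable))
    return schedule

# For a 3-layer network the top layer becomes trainable last, and no layer
# is exposed to pruning before the layers beneath it have a representation.
stages = training_schedule(3)
```

This captures the argument in the text: premature pruning at a high layer is avoided simply by delaying that layer's critical period.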
Although a neuron in the cortex can have 1,000-10,000 connections, each neuron connects to only a very small fraction of all neurons; that is, the connectivity is sparse. The MLP networks studied in this thesis had only tens of neurons, and it was therefore feasible to use fully connected layers. However, ensemble learning can also accommodate the pruning of connections in large networks. The dynamic model of the factors in publication VIII exhibited signs of sparse connectivity.
In ensemble learning, the pruning is caused by the pressure to make the posterior probabilities of the parameters and factors of the model as independent as possible. It can be argued that this is a useful strategy for the brain as well: it would be difficult to keep track of the posterior dependencies of the activities of all the neurons in the brain, and even more difficult to model the posterior dependencies of the strengths of the synapses.
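The tractability of this independence assumption can be made concrete with a small sketch. Assuming, for illustration, a fully factorized Gaussian posterior and a standard-normal prior (the standard mean-field setup, not necessarily the exact priors used in the thesis), the Kullback-Leibler cost decomposes into a sum of per-parameter terms, so no cross-parameter dependencies ever need to be stored:

```python
import math

# KL( prod_i N(m_i, v_i) || prod_i N(0, 1) ) for a fully factorized
# Gaussian posterior. Because the posterior is independent across
# parameters, the divergence is just a sum of scalar terms; storing
# a full covariance over all parameters is never required.
def kl_diag_gauss_to_std_normal(means, variances):
    return sum(0.5 * (v + m * m - 1.0 - math.log(v))
               for m, v in zip(means, variances))

# When the posterior equals the prior the cost is zero; any parameter
# whose posterior drifts away from the prior pays an additive penalty.
cost_at_prior = kl_diag_gauss_to_std_normal([0.0, 0.0], [1.0, 1.0])
```

This additive structure is what makes the scheme scale: the cost of each parameter can be evaluated and minimized locally, which is also the intuition behind the argument that the brain could not afford to track full posterior dependencies.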