
Pruning

Restricting the posterior approximation to a factorial form effectively means neglecting the posterior dependencies between variables. Taking those dependencies into account usually increases the computational complexity significantly, and the computer time is often better spent on a larger model with a simple posterior approximation. Moreover, latent variable models often exhibit rotational and other invariances which ensemble learning can exploit by choosing the solution where the factorial approximation is most accurate (see [63] for an example).
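Written out, the factorial restriction means that the approximation q of the posterior is a product of independent factors, one for each unknown variable. With theta denoting the unknowns and X the data (notation assumed here for illustration), the ensemble-learning cost function minimised over such q is the standard Kullback-Leibler cost

    q(\theta) = \prod_i q_i(\theta_i), \qquad
    C(q) = \int q(\theta) \ln \frac{q(\theta)}{p(X, \theta)} \, d\theta
         = D\bigl( q(\theta) \,\|\, p(\theta \mid X) \bigr) - \ln p(X),

where the last term does not depend on q.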

A factorial posterior approximation often leads to pruning of some of the connections in the model. When there is not enough data to estimate all the parameters, some directions in the parameter space remain ill-determined, and the posterior distribution along those directions is roughly equal to the prior distribution. In ensemble learning with a factorial posterior approximation, the ill-determined directions tend to become aligned with the axes of the parameter space, because that is where the factorial approximation is most accurate.
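As a worked illustration of the axis-alignment effect (an added example under simple Gaussian assumptions, not from the original text): suppose the true posterior of two parameters is Gaussian with unit variances and correlation coefficient $\rho$. The best factorial Gaussian approximation then has variances $1 - \rho^2$ and attains

    \min_q D( q \| p ) = -\tfrac{1}{2} \ln(1 - \rho^2),

which vanishes exactly when $\rho = 0$. Since a rotationally invariant model allows the posterior to be rotated, the misfit of the factorial approximation is smallest when the correlated, ill-determined directions coincide with the coordinate axes.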

The pruning tendency makes it easy to use, for instance, sparsely connected models, because the learning algorithm automatically selects a small number of well-determined parameters. In the early phases of learning, however, pruning can be harmful: large parts of the model can be pruned away before a sensible representation has emerged. This corresponds to the algorithm getting stuck in a local minimum. A posterior approximation that takes the posterior dependencies into account has far fewer local minima, but it would sacrifice the computational efficiency. It seems that learning algorithms running in linear time cannot avoid local minima in general, but suitable choices of model structure and learning scheme can ameliorate the problem considerably.

If a neuron is effectively pruned away, it will not become useful again: it is a local minimum of the cost function for a neuron to be dead. If the neuron does not model anything, its outputs should be damped towards zero, and once the outputs are damped, it is no longer worthwhile to model anything with the neuron. In practice, when a neuron is not useful, the weights that multiply its outputs diminish towards zero. For efficiency reasons these ``dead'' neurons are removed. They can be identified by estimating the cost function with and without them.
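The identification step can be sketched in code. The following Python fragment is a minimal illustration only: the cost function is an invented quadratic reconstruction error standing in for the ensemble-learning cost above, and a hidden neuron is kept only if silencing its outgoing weights would increase the estimated cost.

    import numpy as np

    def prune_dead_units(W_out, cost, tol=1e-12):
        # W_out: (n_outputs, n_hidden) outgoing weight matrix; a unit
        # whose column is (near) zero contributes nothing downstream.
        # cost: callable returning the scalar cost of a weight matrix
        # (a hypothetical stand-in for the full model cost).
        base = cost(W_out)
        keep = []
        for j in range(W_out.shape[1]):
            trial = W_out.copy()
            trial[:, j] = 0.0              # silence unit j
            if cost(trial) > base + tol:   # removal hurts: unit is alive
                keep.append(j)
        return W_out[:, keep], keep

    # Toy usage with a quadratic reconstruction cost.
    rng = np.random.default_rng(0)
    H = rng.normal(size=(5, 100))          # hidden activations
    W = rng.normal(size=(3, 5))
    W[:, 2] = 1e-8                         # unit 2 is effectively dead
    X = W @ H
    cost = lambda Wt: np.mean((X - Wt @ H) ** 2)
    W_pruned, kept = prune_dead_units(W, cost)
    print(kept)                            # -> [0, 1, 3, 4]

On the toy data, unit 2 carries only negligible outgoing weights, so zeroing its column changes the cost by less than the tolerance and the unit is dropped.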

