Helsinki University of Technology
Neural Networks Research Centre
P.O.Box 2200
FIN-02015 HUT, FINLAND
E-mail: Harri.Lappalainen@hut.fi
PostScript version (63 kB)
See also the comments I have added later.
The minimum description length (MDL) principle is an information-theoretic method for learning models from data. This paper shows how to use an MDL-based cost function efficiently with neural networks. As usual, the cost function can be used to adapt the parameters of the network, but it can also include terms that measure the complexity of the network structure, and can thus be used to determine the optimal structure. The basic idea is to convert a conventional neural network so that each parameter and each neuron output is assigned a mean and a variance. This greatly simplifies the computation of the description length and of its gradient with respect to the parameters, which can then be adapted using standard gradient descent.
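As a rough illustration of the mean-and-variance idea, the Python sketch below shows why such a representation keeps the cost and its gradient cheap to compute. It is not the cost function derived in the paper. It assumes that each parameter has a Gaussian distribution, that the prior is a zero-mean Gaussian with variance prior_var, and that weights and inputs are independent. The per-parameter cost is taken to be the Kullback-Leibler divergence between the two Gaussians, measured in nats. All function names are hypothetical.

```python
import math

def param_cost(mean, var, prior_var=1.0):
    """Assumed stand-in for the description length of one parameter:
    KL( N(mean, var) || N(0, prior_var) ), in nats."""
    return 0.5 * (math.log(prior_var / var)
                  + (var + mean * mean) / prior_var - 1.0)

def param_cost_grad(mean, var, prior_var=1.0):
    """Closed-form gradient of param_cost with respect to (mean, var)."""
    d_mean = mean / prior_var
    d_var = 0.5 * (1.0 / prior_var - 1.0 / var)
    return d_mean, d_var

def linear_output(w_means, w_vars, x_means, x_vars):
    """Mean and variance of y = sum_i w_i * x_i, assuming every w_i and
    x_i is independent of the others."""
    y_mean = sum(wm * xm for wm, xm in zip(w_means, x_means))
    # Var(w*x) = wm^2 * xv + wv * xm^2 + wv * xv for independent w and x.
    y_var = sum(wm * wm * xv + wv * xm * xm + wv * xv
                for wm, wv, xm, xv in zip(w_means, w_vars, x_means, x_vars))
    return y_mean, y_var

# Plain gradient descent on the parameter cost alone (no data term):
# the mean is pulled towards 0 and the variance towards prior_var.
m, v = 2.0, 0.5
for _ in range(100):
    dm, dv = param_cost_grad(m, v)
    m -= 0.1 * dm
    v = max(v - 0.1 * dv, 1e-6)  # keep the variance positive
```

In a complete cost function a data-fit term, computed from the output mean and variance, would be added to the parameter cost. The loop above only shows that the closed-form gradient can be fed to ordinary gradient descent.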