
Information-theoretic approach

In their original paper [23], Hinton and van Camp approached ensemble learning from an information-theoretic point of view, using the Minimum Description Length (MDL) principle [61]. They developed a new coding method for noisy parameter values which led to the cost function of Equation (3.11). This allows the cost $ C(\boldsymbol{\theta})$ in Equation (3.11) to be interpreted as a description length of the data under the chosen model.
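Equation (3.11) is not reproduced in this section; assuming it has the standard form of the ensemble learning cost, it reads

$\displaystyle C(\boldsymbol{\theta}) = \int q(\boldsymbol{\theta}) \ln \frac{q(\boldsymbol{\theta})}{p(X, \boldsymbol{\theta})} \, d\boldsymbol{\theta},$

where $ q(\boldsymbol{\theta})$ is the approximating ensemble and $ p(X, \boldsymbol{\theta})$ is the joint density of the data $ X$ and the parameters. The description length interpretation refers to this quantity measured in bits, i.e. with base-2 logarithms.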

The MDL principle asserts that the best model for given data is the one that attains the shortest description of the data. The description length can be measured in bits, and it represents the length of the message needed to transmit the data. The idea is that one builds a model for the data and then sends the description of that model together with the residual of the data that could not be modelled. Thus the total description length is $ L(\text{data}) = L(\text{model}) + L(\text{error})$.

The code length is related to probability: according to the coding theorem, an event $ x_1$ with probability $ p(x_1)$ can be coded using $ -\log_2 p(x_1)$ bits, assuming both the sender and the receiver know the distribution $ p$.
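As a concrete illustration (not part of the original text), the following short Python snippet computes these optimal code lengths for a small example distribution and checks the Kraft inequality, which guarantees that a prefix code with these lengths actually exists:

import math

# Example distribution, known to both the sender and the receiver
p = {'a': 0.5, 'b': 0.25, 'c': 0.125, 'd': 0.125}

# Optimal code length of each event: -log2 p(x) bits
lengths = {x: -math.log2(px) for x, px in p.items()}
for x in p:
    print(f"event {x}: p = {p[x]:.3f}, code length = {lengths[x]:.0f} bits")

# Kraft inequality: sum of 2^(-length) <= 1, so a prefix code with
# these lengths exists (here the sum is exactly 1)
assert sum(2.0 ** -l for l in lengths.values()) <= 1.0 + 1e-12

# The expected code length equals the entropy of the distribution
entropy = sum(-px * math.log2(px) for px in p.values())
print(f"expected code length = entropy = {entropy:.3f} bits")

Here, for instance, the event with probability $ 1/8$ receives a codeword of $ -\log_2(1/8) = 3$ bits.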

In their article, Hinton and van Camp developed a method for encoding the parameters of the model in such a way that the expected code length is the one given by Equation (3.11). A derivation of this result can be found in the original paper by Hinton and van Camp [23] or in the doctoral thesis of Harri Valpola [57].
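The essence of the coding argument, as usually presented, is that transmitting a noisy parameter drawn from the ensemble $ q(\boldsymbol{\theta})$ costs on average only the Kullback-Leibler divergence between $ q$ and the prior, because the receiver can recover the random bits used to pick the parameter value, while the error part costs the expected negative log-probability of the data. The following Python sketch (a minimal illustration assuming the standard form of the cost given above; the model and all numerical values are hypothetical, not from the original text) evaluates the two parts for a toy Gaussian model with an unknown mean and unit noise variance:

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 50 samples from N(1, 1)
data = rng.normal(loc=1.0, scale=1.0, size=50)

# Prior p(theta) = N(0, 10^2), ensemble q(theta) = N(m, s^2)
m0, s0 = 0.0, 10.0
m, s = data.mean(), 0.2

# Model part: KL(q || p) for two Gaussians, in nats (closed form)
kl = np.log(s0 / s) + (s**2 + (m - m0) ** 2) / (2 * s0**2) - 0.5

# Error part: expected negative log-likelihood under q, by Monte Carlo
thetas = rng.normal(m, s, size=10000)
nll = np.mean([-np.sum(-0.5 * np.log(2 * np.pi) - 0.5 * (data - t) ** 2)
               for t in thetas])

# Total expected description length (the ensemble learning cost);
# divide by ln 2 to convert nats to bits
cost = kl + nll
print(f"model = {kl:.2f} nats, error = {nll:.2f} nats, "
      f"C = {cost:.2f} nats = {cost / np.log(2):.2f} bits")

Minimising this sum over the parameters $ m$ and $ s$ of the ensemble trades off the accuracy of the fit against the cost of describing the parameters precisely.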

