Connection to coding

There is a close connection between coding and the probabilistic framework. The optimal code length for $\boldsymbol{X}$ is $L(\boldsymbol{X})=-\log_2 P(\boldsymbol{X})$ bits, where $P(\boldsymbol{X})$ is the probability of observing $\boldsymbol{X}$. In order to encode the data compactly, one has to find a good model for it. Suppose a sender and a receiver have agreed on a model structure $\mathcal{H}$, and the message consists of two parts: the parameters $\boldsymbol{\theta}$ and the data $\boldsymbol{X}$. It can be shown [25] that $L(\boldsymbol{X}) = C - \log \Vert d\boldsymbol{X}\Vert$, where $d\boldsymbol{X}$ is the required accuracy of the data. Therefore minimising the cost function $C$ of ensemble learning is equivalent to minimising the coding length of the data.
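A sketch of the underlying bits-back argument, assuming the ensemble-learning cost function has the usual form $C = \int q(\boldsymbol{\theta})\log\frac{q(\boldsymbol{\theta})}{P(\boldsymbol{X},\boldsymbol{\theta}\mid\mathcal{H})}\,d\boldsymbol{\theta}$ for an approximating distribution $q(\boldsymbol{\theta})$ (the notation here is an assumption, not taken from [25]): the expected length of the two-part message is

$L(\boldsymbol{X}) = \mathrm{E}_{q}\!\left[ \log\frac{q(\boldsymbol{\theta})}{P(\boldsymbol{\theta}\mid\mathcal{H})} - \log\!\left( P(\boldsymbol{X}\mid\boldsymbol{\theta},\mathcal{H})\,\Vert d\boldsymbol{X}\Vert \right) \right] = \int q(\boldsymbol{\theta})\log\frac{q(\boldsymbol{\theta})}{P(\boldsymbol{X},\boldsymbol{\theta}\mid\mathcal{H})}\,d\boldsymbol{\theta} - \log\Vert d\boldsymbol{X}\Vert = C - \log\Vert d\boldsymbol{X}\Vert ,$

where the first term is the length of the parameter part after the bits-back refund and the second term is the length of the data part encoded to accuracy $d\boldsymbol{X}$.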


