There is a close connection between coding and the probabilistic
framework. The optimal code length for the data $X$ is $-\log_2 P(X)$
bits, where $P(X)$ is the probability of observing $X$. In order to
encode the data compactly, one therefore has to find a good model for
it. Suppose a sender and a receiver have agreed on a model structure.
The message then consists of two parts: the parameters
$\boldsymbol{\theta}$ and the data $X$. It can be shown [25] that the
expected length of the message is
$$\mathrm{E}\{L(\boldsymbol{\theta}, X)\} = C - \log_2 dX,$$
where $dX$ is the required accuracy of the
data. Therefore minimising the cost function $C$ of ensemble learning
is equivalent to minimising the code length of the data.
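As a concrete illustration (not from the text), the identity can be checked numerically for a discrete toy model; all parameter values and data below are assumptions made for the sketch. For discrete data the accuracy term drops out ($dX = 1$), and choosing the approximating ensemble $q(\boldsymbol{\theta})$ equal to the true posterior makes the cost $C$ coincide exactly with the optimal code length $-\log_2 P(X)$, while any other $q$ gives a longer code:

```python
import math

# Toy model: i.i.d. binary data, three candidate Bernoulli parameters
# with a uniform prior (all values here are illustrative assumptions).
thetas = [0.2, 0.5, 0.8]
prior = [1 / 3, 1 / 3, 1 / 3]
X = [1, 1, 0, 1, 1, 1, 0, 1]  # observed binary sequence

def lik(theta, data):
    """P(X | theta) for i.i.d. Bernoulli observations."""
    p = 1.0
    for x in data:
        p *= theta if x == 1 else 1 - theta
    return p

# Optimal code length: -log2 P(X), with P(X) = sum_theta P(theta) P(X|theta).
pX = sum(pr * lik(th, X) for pr, th in zip(prior, thetas))
opt_len = -math.log2(pX)

def cost(q):
    """Ensemble cost C = sum_theta q(theta) log2 [q(theta) / (P(theta) P(X|theta))]."""
    return sum(qi * math.log2(qi / (pr * lik(th, X)))
               for qi, pr, th in zip(q, prior, thetas) if qi > 0)

# With q equal to the true posterior, C equals the optimal code length;
# with any other q (e.g. the uniform distribution), C is at least as large.
posterior = [pr * lik(th, X) / pX for pr, th in zip(prior, thetas)]
print(opt_len)                  # optimal code length -log2 P(X)
print(cost(posterior))          # equals opt_len
print(cost([1 / 3, 1 / 3, 1 / 3]))  # >= opt_len
```

The gap between $C$ and $-\log_2 P(X)$ is the Kullback-Leibler divergence between $q$ and the posterior, which is why minimising $C$ drives the code length towards the optimum.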