There is a close connection between coding and the probabilistic framework. The optimal code length for $X$ is $-\log_2 P(X)$ bits, where $P(X)$ is the probability of observing $X$. In order to encode the data compactly, one therefore has to find a good model for it. Suppose a sender and a receiver have agreed on a model structure; the message then consists of two parts: the parameters and the data $X$. It can be shown [25] that the expected length of this message is
$$L(X) = C - \log_2 dX,$$
where $C$ is the cost function of ensemble learning (measured in bits) and $dX$ is the required accuracy of the data. Since $-\log_2 dX$ does not depend on the model, minimising the cost function $C$ of ensemble learning is equivalent to minimising the coding length of the data.
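As a toy illustration of the first claim (this sketch and its distributions are not from the source), the expected code length $\sum_x P(x)\,(-\log_2 Q(x))$ achieved by a code optimal for a model $Q$ is shortest when $Q$ matches the true distribution $P$, which is why compact encoding requires a good model:

```python
import math

def expected_code_length(p, q):
    """Expected bits per symbol when data drawn from p is encoded
    with the code that is optimal for model q: sum_x p(x) * -log2 q(x)."""
    return sum(px * -math.log2(q[x]) for x, px in p.items())

# Hypothetical true source distribution.
p = {"a": 0.5, "b": 0.25, "c": 0.25}
# A mismatched (uniform) model.
q = {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}

matched = expected_code_length(p, p)     # entropy of p: 1.5 bits
mismatched = expected_code_length(p, q)  # cross-entropy: log2(3) bits

print(f"matched model:    {matched:.3f} bits/symbol")
print(f"mismatched model: {mismatched:.3f} bits/symbol")
```

The gap between the two lengths is the Kullback-Leibler divergence between $P$ and $Q$, the same quantity that appears in the ensemble learning cost function.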