There is a close connection between coding and the probabilistic framework. The optimal code length for X is $-\log_2 P(X)$ bits, where P(X) is the probability of observing X. In order
to be able to encode the data compactly, one has to find a good model for it. A sender and a receiver have agreed on a model structure $\mathcal{H}$, and the message will have two parts: the parameters $\theta$ and the data
X. It can be shown [25] that the expected length of the message is $C - \log_2 dX$ bits, where dX is the required accuracy of the data. Since dX is fixed, minimising the cost function C of ensemble learning is equivalent to minimising the coding length of the data.
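The two-part message can be sketched concretely. The following is a minimal illustration, not the bits-back scheme underlying the ensemble-learning equivalence of [25]: a hypothetical coin-flip data set is encoded by first sending a Bernoulli bias q, discretised to a grid with a uniform prior, and then the data given q, each part costing $-\log_2 P(\cdot)$ bits. The data and grid are invented for illustration.

```python
import math

# Hypothetical coin-flip data: 7 heads out of 10 flips.
data = [1, 1, 0, 1, 1, 1, 0, 1, 0, 1]
# Bernoulli bias q discretised to a grid of K values, uniform prior.
grid = [k / 10 for k in range(1, 10)]          # q in {0.1, ..., 0.9}

def message_length(q):
    # Part 1: the parameter costs -log2 P(q) = log2 K bits under the
    # uniform prior over the grid.
    part1 = math.log2(len(grid))
    # Part 2: the data cost -log2 P(data | q) bits given the parameter.
    part2 = sum(-math.log2(q if x == 1 else 1 - q) for x in data)
    return part1 + part2

# The shortest message picks the grid value closest to the empirical
# head frequency, since the prior term is constant here.
best_q = min(grid, key=message_length)
print(best_q)   # 0.7
```

Finding a good model thus shortens the second part of the message; a richer parameter grid would lengthen the first part, which is the trade-off that the cost function C balances in expectation.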