In the approximation of the means and variances, the inputs of each function are assumed to be uncorrelated. If there exist i and j such that the value of f_i is propagated along more than one separate route to the inputs of the function f_j, then those inputs are correlated and the computed mean and variance may be inaccurate. This restriction should not be very severe: in an MLP with one hidden layer, for example, no parameter (weight) affects any output neuron through more than one separate route. Should the approximation prove too inaccurate, some of the cross terms can be taken into account.
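The propagation under the uncorrelatedness assumption can be sketched as follows. This is an illustrative fragment, not the paper's implementation: the function names are hypothetical, and the nonlinear step uses the standard first-order Taylor (delta-method) approximation, which is consistent with treating inputs as uncorrelated so that variances simply add.

```python
import math

def propagate_linear(weights, means, variances):
    """Mean and variance of y = sum_i w_i * x_i, assuming the
    inputs x_i are mutually uncorrelated: variances add with
    squared weights, with no cross terms."""
    mean = sum(w * m for w, m in zip(weights, means))
    var = sum(w * w * v for w, v in zip(weights, variances))
    return mean, var

def propagate_tanh(mean, var):
    """First-order approximation of mean and variance through
    the nonlinearity g(x) = tanh(x)."""
    g = math.tanh(mean)
    dg = 1.0 - g * g  # derivative of tanh evaluated at the mean
    return g, dg * dg * var
```

If two of the inputs to `propagate_linear` were in fact correlated (e.g. both derived from the same f_i), the omitted covariance cross terms would make the returned variance inaccurate, which is exactly the situation the restriction above rules out.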
We have assumed that the errors made in the discretisation of the parameters are mutually uncorrelated. It can be argued that the estimate of the description length would be more accurate if we took the dependence between parameters into account: a change in the value of one parameter might be partially compensated by a suitable change in the values of the others. Assuming uncorrelated errors effectively penalises parametrisations with strong dependencies between parameters, since the description length could then be made shorter by a parametrisation that removes the dependencies. Since it is usually desirable to favour parametrisations with weak dependencies, the assumption of uncorrelated discretisation errors is reasonable.
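Under the assumption of uncorrelated discretisation errors, the total description length decomposes into a sum over parameters. The following sketch illustrates this with the standard approximation that encoding a parameter value truncated to precision delta under a prior density p costs about -log2(delta * p(theta)) bits; the function names and the zero-mean Gaussian prior are assumptions for the example, not taken from the text.

```python
import math

def code_length_bits(theta, delta, sigma):
    """Approximate bits needed to encode one parameter truncated to
    precision delta, under a zero-mean Gaussian prior with standard
    deviation sigma: roughly -log2(delta * p(theta))."""
    density = math.exp(-theta * theta / (2.0 * sigma * sigma)) \
        / (sigma * math.sqrt(2.0 * math.pi))
    return -math.log2(delta * density)

def total_length_bits(thetas, delta, sigma):
    """With mutually uncorrelated discretisation errors, the total
    description length is simply the sum over the parameters."""
    return sum(code_length_bits(t, delta, sigma) for t in thetas)
```

Note that halving the precision delta adds exactly one bit per parameter in this approximation, which is why the accuracies themselves also have to be chosen (and communicated) carefully.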
In order to have a decodable message, the accuracies of the parameter values should be encoded and sent before the truncated parameters themselves. Wallace and Freeman argue that the parameter values and their accuracies are not independent, and that one can construct a decodable message with almost the same code length as the one we have used.