Comments on the IJCNN'98 article

The algorithm presented in the paper has a Bayesian interpretation; see:

G. E. Hinton and D. van Camp. (1993) Keeping neural networks simple by minimizing the description length of the weights. In Proceedings of COLT-93.
D. MacKay. (1995) Probable networks and plausible predictions - a review of practical Bayesian methods for supervised neural networks. Network 6(3), pp. 469-505.
D. MacKay. (1995) Ensemble learning and evidence maximization. Available only as a PostScript version.
D. MacKay. (1995?) Developments in Probabilistic Modelling with Neural Networks - Ensemble Learning. Available only as a PostScript version.

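According to the Bayesian interpretation, the algorithm in my paper approximates the posterior density. If the approximating density is taken to be a diagonal Gaussian, the term 1/2 ln(12 v_i) in equation 15 should read 1/2 ln(2 pi e v_i). The constant 12 is what a uniform density gives: a uniform density with variance v_i has width sqrt(12 v_i) and entropy 1/2 ln(12 v_i), whereas a Gaussian density with variance v_i has entropy 1/2 ln(2 pi e v_i). I therefore recommend using the value 2 pi e (about 17.08) instead of 12.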
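The following Python sketch, written for illustration only and not taken from the paper, compares the two terms as differential entropies (in nats) of a one-dimensional density with variance v. The helper names gaussian_entropy and uniform_entropy are hypothetical; the code assumes only the standard closed-form entropies of the Gaussian and uniform densities.

```python
import math

# Constant recommended in place of 12.
TWO_PI_E = 2 * math.pi * math.e


def gaussian_entropy(v):
    """Differential entropy (nats) of a 1-D Gaussian density with variance v."""
    return 0.5 * math.log(TWO_PI_E * v)


def uniform_entropy(v):
    """Differential entropy (nats) of a 1-D uniform density with variance v.

    A uniform density of width w has variance w**2 / 12, so w = sqrt(12 v)
    and its entropy ln(w) equals 0.5 * ln(12 v).
    """
    width = math.sqrt(12 * v)
    return math.log(width)


if __name__ == "__main__":
    print(f"2*pi*e = {TWO_PI_E:.2f}")  # prints "2*pi*e = 17.08"

    # The gap between the two terms is the same for every variance v.
    for v in (0.01, 1.0, 100.0):
        gap = gaussian_entropy(v) - uniform_entropy(v)
        print(f"v = {v:g}: Gaussian {gaussian_entropy(v):.4f}, "
              f"uniform {uniform_entropy(v):.4f}, gap {gap:.4f}")
```

Because the two terms differ by the same constant, 1/2 ln(2 pi e / 12), roughly 0.18 nats per parameter, the choice does not change the derivative with respect to v_i. It changes the value of the cost, which matters when cost values are compared, for instance between models with different numbers of parameters.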