To the best of our knowledge, the combination of MLP networks, generative learning and full Bayesian analysis is novel, although the individual parts have been published earlier. For instance, MLP networks were used as generative models in [7]. There the model was inverted by gradient descent, as in this work, but Bayesian analysis was not applied.
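As an illustration of what inverting a generative MLP by gradient descent means, the following is a minimal sketch and not the procedure of [7] or of this work. It assumes a fixed single-hidden-layer network x = B tanh(A s + a) + b with randomly drawn weights, a plain squared reconstruction error, and a simple step-halving rule; all names (`mlp`, `reconstruction_error`, `invert_by_gradient_descent`) and all numerical settings are hypothetical choices made for the example.

```python
import numpy as np


def mlp(s, A, a, B, b):
    """Generative mapping from latent s to observation x (assumed form)."""
    return B @ np.tanh(A @ s + a) + b


def reconstruction_error(s, x, A, a, B, b):
    """Squared error 0.5 * ||f(s) - x||^2 between model output and data."""
    r = mlp(s, A, a, B, b) - x
    return 0.5 * float(r @ r)


def invert_by_gradient_descent(x, A, a, B, b, steps=200, lr=0.1):
    """Estimate the latent vector s for one observation x.

    The weights stay fixed; only s is adapted. A step is accepted only if it
    lowers the error, otherwise the step size is halved (illustrative rule).
    """
    s = np.zeros(A.shape[1])
    err = reconstruction_error(s, x, A, a, B, b)
    for _ in range(steps):
        h = np.tanh(A @ s + a)
        r = B @ h + b - x                      # residual f(s) - x
        # Chain rule: dE/ds = A^T [ (1 - h^2) * (B^T r) ]
        grad = A.T @ ((1.0 - h ** 2) * (B.T @ r))
        candidate = s - lr * grad
        new_err = reconstruction_error(candidate, x, A, a, B, b)
        if new_err < err:
            s, err = candidate, new_err
        else:
            lr *= 0.5
    return s


# Toy setup: 2 latent variables, 5 hidden units, 4 observed variables.
rng = np.random.default_rng(0)
A, a = rng.normal(size=(5, 2)), rng.normal(size=5)
B, b = rng.normal(size=(4, 5)), rng.normal(size=4)

s_true = np.array([0.5, -0.3])
x = mlp(s_true, A, a, B, b)

s_hat = invert_by_gradient_descent(x, A, a, B, b)
err_init = reconstruction_error(np.zeros(2), x, A, a, B, b)
err_final = reconstruction_error(s_hat, x, A, a, B, b)
print(f"error before: {err_init:.4f}, after: {err_final:.4f}")
```

The sketch yields only a point estimate of the latent variables and puts no distribution on either the latent variables or the weights, which is the sense in which Bayesian analysis is absent from such an inversion.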

Rectified Gaussian belief networks were used as generative models in [2], but the Bayesian analysis was restricted to the posterior distributions of the latent variables, and stochastic sampling was used instead of a parametric approximation. Similarly, [1] neglects the Bayesian treatment of the network parameters. With flexible models that have a large number of parameters, it is important also to take into account the complexity of the nonlinear mapping: restricting the Bayesian approach to the latent variables can lead to overlearning.

Methods based on minimising the description length of the model are closely related, and often equivalent, to Bayesian learning [3,4,6], since the description length is by definition the negative logarithm of the probability mass of the model. The description length of an auto-associative MLP model was minimised in [4]. The disadvantage of auto-associative models is that they must learn both the generative mapping and its inverse, so learning can be slow.
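The link between description length and Bayesian learning stated above can be written out in its simplest form. The notation here is an assumption introduced for illustration: M denotes the model (its structure and discretised parameters), X the data, P a probability mass, and L a code length in bits. This is only a two-part-code sketch, not the exact formulation used in [3,4,6].

```latex
\begin{align}
  L(M) &= -\log_2 P(M), \\
  L(X, M) &= L(M) + L(X \mid M)
           = -\log_2 P(M) - \log_2 P(X \mid M)
           = -\log_2 P(X, M).
\end{align}
```

Since P(X, M) = P(M | X) P(X) and P(X) does not depend on M, minimising the total description length L(X, M) over M is, in this simple form, the same as maximising the posterior probability mass P(M | X).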