Given the observed data, there is usually more than one way to explain it. With a flexible model family, such as MLP networks, there are always infinitely many explanations, and it can be difficult to choose among them. Choosing too complex a model results in overlearning, a situation where one not only finds the underlying causes of the observations but also makes up meaningless explanations for the noise that is always present in real signals. Choosing too simple a model results in underlearning, i.e., it leaves some of the true causes hidden.
The solution to the problem is that no single model should, in fact, be chosen. Probability theory tells us that all the explanations should be taken into account and weighted according to their posterior probabilities. This approach, known as Bayesian learning, optimally solves the tradeoff between under- and overlearning.
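The weighting can be written out as follows. This is only a sketch in notation that the text itself does not define: X stands for the observed data, \theta for the unknowns of an explanation (model structure and parameters), and f(\theta) for any quantity of interest, such as a prediction.

```latex
% Posterior over explanations, from Bayes' rule
p(\theta \mid X) = \frac{p(X \mid \theta)\, p(\theta)}{p(X)},
\qquad
p(X) = \int p(X \mid \theta)\, p(\theta)\, d\theta

% Expectation that takes every explanation into account,
% weighted by its posterior probability
\mathrm{E}\left[ f \mid X \right] = \int f(\theta)\, p(\theta \mid X)\, d\theta
```

Because the expectation is an integral over \theta, the contribution of a region depends on the probability mass it holds, not on the height of the density alone.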
The posterior probability densities of models that are too simple are low because they leave much of the data unexplained. The peaks of the posterior probability density function (pdf) of models that are too complex are high but also very narrow, because a complex model is very sensitive to changes in its parameters. Due to the narrow peaks, overly complex models occupy little probability mass and therefore contribute little to expectations weighted by the posterior probabilities.
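The effect of probability mass can be illustrated with a small numerical sketch. It is not the MLP setting discussed in the text: it swaps in polynomial regression with a Gaussian prior on the weights, because there the parameters can be integrated out in closed form. The function name `log_evidence`, the prior precision `alpha`, the noise level `sigma` and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def log_evidence(x, y, degree, alpha=1.0, sigma=0.1):
    """Log marginal likelihood of a polynomial model of the given degree.

    Assumed model (illustrative only): y = Phi w + noise, with
    w ~ N(0, I / alpha) and noise ~ N(0, sigma^2 I). Integrating out w
    gives y ~ N(0, C) with C = Phi Phi^T / alpha + sigma^2 I.
    """
    Phi = np.vander(x, degree + 1, increasing=True)  # columns 1, x, x^2, ...
    C = Phi @ Phi.T / alpha + sigma**2 * np.eye(len(x))
    _, logdet = np.linalg.slogdet(C)
    quad = y @ np.linalg.solve(C, y)
    return -0.5 * (len(x) * np.log(2.0 * np.pi) + logdet + quad)


# Synthetic data from a quadratic "true cause" plus noise (hypothetical).
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 30)
y = 0.5 - x + 1.5 * x**2 + rng.normal(0.0, 0.1, size=x.shape)

# Compare models of increasing complexity.
degrees = [0, 1, 2, 5, 9]
log_evs = np.array([log_evidence(x, y, d) for d in degrees])

# Posterior probabilities of the models under a uniform prior over degrees.
weights = np.exp(log_evs - log_evs.max())  # subtract max for stability
probs = weights / weights.sum()

for d, le, p in zip(degrees, log_evs, probs):
    print(f"degree {d}: log evidence = {le:9.2f}, posterior = {p:.3f}")
```

In this sketch the models of degree 0 and 1 receive a very low evidence because they cannot account for the curvature in the data. The higher-degree models can fit the data at least as well as the quadratic, so any difference in their evidence comes from integrating over the parameters rather than from the best fit alone.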