When constructing a learning algorithm which is based on approximations of the cost function, it is important to make sure that learning does not drive the network into areas of the parameter space where the approximations are no longer valid.
The approximations in (34) and (35) are based
on a roughly quadratic or linear behaviour of the nonlinearities.
This assumption is quite good if the posterior variance
of the inputs to the hidden neurons is not very large.
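As an illustration of how the accuracy of such Taylor-type approximations depends on the input variance, the following Python sketch compares second-order estimates of the posterior mean and variance of a hidden-neuron output with Monte Carlo references. The tanh nonlinearity, the exact forms assumed for (34) and (35), and all names below are assumptions made for this sketch only.

import numpy as np

rng = np.random.default_rng(0)

def g(x):                       # hidden-neuron nonlinearity (tanh assumed)
    return np.tanh(x)

def dg(x):                      # g'
    return 1.0 - np.tanh(x) ** 2

def ddg(x):                     # g''
    return -2.0 * np.tanh(x) * (1.0 - np.tanh(x) ** 2)

def taylor_moments(xi_mean, xi_var):
    # Second-order Taylor estimates of E[g(xi)] and Var[g(xi)],
    # in the spirit of approximations (34)-(35) (assumed form).
    g_mean = g(xi_mean) + 0.5 * ddg(xi_mean) * xi_var
    g_var = dg(xi_mean) ** 2 * xi_var
    return g_mean, g_var

def mc_moments(xi_mean, xi_var, n=200_000):
    # Monte Carlo reference for the same moments.
    samples = g(rng.normal(xi_mean, np.sqrt(xi_var), size=n))
    return samples.mean(), samples.var()

for xi_var in (0.05, 2.0):      # small vs. large posterior variance of the input
    print(xi_var, taylor_moments(1.0, xi_var), mc_moments(1.0, xi_var))

For a small input variance the two estimates agree closely, while for a large variance the Taylor estimates drift away from the Monte Carlo values, which is the regime the text warns against.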
Since the approximations take into account only the local behaviour of the nonlinearities $g_j$, and since MLP networks typically have multimodal posterior distributions, there must be areas of the parameter space where the second-order derivative of the logarithm of the posterior probability with respect to one of the parameters $\theta$ is positive. This means that the derivative $\partial C / \partial \tilde{\theta}$ of the cost function with respect to the posterior variance $\tilde{\theta}$ of that parameter is negative, which in turn means that it appears that the cost function can be made arbitrarily small by letting $\tilde{\theta}$ grow.
It is easy to see that the problem is due to the local estimate of $g_j$, since the logarithm of the posterior eventually has to go to negative infinity. The derivative $\partial C / \partial \tilde{\theta}$ will thus be positive for large $\tilde{\theta}$, but the local estimate of $g_j$ fails to account for this.
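This effect can be demonstrated numerically. The sketch below assumes a Gaussian approximation $q$ with mean $\bar{\theta}$ and variance $\tilde{\theta}$, a cost of the form $\mathrm{E}_q[\log q - \log p]$, and a bimodal log-posterior; all of these are illustrative assumptions, not the model of the text. Because the log-posterior has a positive second derivative at the posterior mean, the local quadratic estimate of the cost decreases without bound as $\tilde{\theta}$ grows, whereas the exact cost eventually increases because the log-posterior goes to negative infinity.

import numpy as np

rng = np.random.default_rng(1)

def log_p(theta):
    # Bimodal (unnormalised) log-posterior; it is locally convex near theta = 0,
    # i.e. its second derivative is positive there.
    return np.log(0.5 * np.exp(-0.5 * (theta + 2.0) ** 2)
                  + 0.5 * np.exp(-0.5 * (theta - 2.0) ** 2))

theta_mean, h = 0.0, 1e-3
# Numerical second derivative of log p at the posterior mean (positive here).
d2 = (log_p(theta_mean + h) - 2 * log_p(theta_mean) + log_p(theta_mean - h)) / h ** 2

for theta_var in (0.1, 1.0, 10.0, 100.0):
    # E_q[log q] term of the cost, identical in both estimates.
    log_q_term = -0.5 * np.log(2 * np.pi * np.e * theta_var)
    # Local quadratic estimate of E_q[-log p], as used by the approximate cost.
    local = -log_p(theta_mean) - 0.5 * d2 * theta_var
    # Monte Carlo estimate of the exact E_q[-log p].
    exact = -log_p(rng.normal(theta_mean, np.sqrt(theta_var), 200_000)).mean()
    print(f"var={theta_var:7.1f}  approximate cost={local + log_q_term:9.3f}"
          f"  exact cost={exact + log_q_term:9.3f}")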
In order to discourage the network from adapting itself into areas of parameter space where these problems might occur, and to deal with the problem if it nevertheless occurs, the terms in (34) which have a negative contribution to $\partial C / \partial \tilde{\theta}$ will be neglected in the computation of the gradients. As this can only make the estimate of $\tilde{\theta}$ in (41) smaller, it leads, in general, to increased accuracy of the approximations in (34) and (35).
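A minimal sketch of this stabilisation is given below, assuming that (41) is a fixed-point update of a posterior variance of the form $\tilde{\theta} = (2\,\partial C/\partial \tilde{\theta})^{-1}$; that form, and the example numbers, are assumptions made for illustration only. Dropping the negative contributions keeps the accumulated derivative positive, so the resulting variance estimate can only shrink relative to the unmodified update.

def stabilized_variance_update(grad_terms):
    # Accumulate dC/dtheta_var from the per-term contributions of the
    # approximate cost, neglecting terms with a negative contribution.
    grad = sum(term for term in grad_terms if term > 0.0)
    if grad <= 0.0:
        raise ValueError("no positive contributions; variance update undefined")
    # Assumed fixed-point form of (41): theta_var = 1 / (2 * dC/dtheta_var).
    return 1.0 / (2.0 * grad)

# Hypothetical per-term contributions to dC/dtheta_var for one parameter.
terms = [0.8, -0.3, 1.5, -2.5]
print(stabilized_variance_update(terms))   # uses only the positive terms 0.8 and 1.5
print(1.0 / (2.0 * sum(terms)))            # the unmodified update would give a negative "variance"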