Ghahramani and Roweis have shown (Ghahramani99NIPS) that for the nonlinear function in Eq. (9), the mean and variance have analytical expressions, to be presented shortly, provided that the input is Gaussian. In our graphical network structures this condition is fulfilled by requiring that the nonlinearity be inserted immediately after a Gaussian node. The same type of exponential function (9) is frequently used in standard radial-basis function networks (Bishop95, Haykin98, Ghahramani99NIPS), but in a different manner: there the exponential function depends on the Euclidean distance from a center point, whereas in our case it depends directly on the input variable.
The first and second moments of the function (9) with respect to the Gaussian distribution of its input $s$ are (Ghahramani99NIPS)
$$\mathrm{E}\{f(s)\} = \frac{1}{\sqrt{2\tilde{s}+1}}\exp\!\left(-\frac{\bar{s}^{2}}{2\tilde{s}+1}\right), \qquad \mathrm{E}\{f^{2}(s)\} = \frac{1}{\sqrt{4\tilde{s}+1}}\exp\!\left(-\frac{2\bar{s}^{2}}{4\tilde{s}+1}\right),$$
where $\bar{s}$ and $\tilde{s}$ denote the posterior mean and variance of $s$.
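As a concrete check, assuming the nonlinearity of Eq. (9) is the exponential $f(s) = \exp(-s^2)$ with Gaussian input $s \sim N(m, v)$ (our reading of the source; the function names below are illustrative), the analytical moments can be verified against Gauss-Hermite quadrature:

```python
import numpy as np

def moments_exp_sq(m, v):
    """Analytical moments of f(s) = exp(-s^2) for s ~ N(m, v) (assumed form of Eq. (9))."""
    Ef  = np.exp(-m**2 / (2*v + 1)) / np.sqrt(2*v + 1)    # E{f(s)}
    Ef2 = np.exp(-2*m**2 / (4*v + 1)) / np.sqrt(4*v + 1)  # E{f^2(s)}
    return Ef, Ef2

def gh_expect(g, m, v, n=40):
    """Gauss-Hermite approximation of E[g(s)] for s ~ N(m, v)."""
    x, w = np.polynomial.hermite.hermgauss(n)
    return np.sum(w * g(m + np.sqrt(2*v) * x)) / np.sqrt(np.pi)

m, v = 0.7, 0.3
Ef, Ef2 = moments_exp_sq(m, v)
assert abs(Ef  - gh_expect(lambda s: np.exp(-s**2),   m, v)) < 1e-9
assert abs(Ef2 - gh_expect(lambda s: np.exp(-2*s**2), m, v)) < 1e-9
```

The variance of the nonlinearity needed by the cost function then follows as $\mathrm{E}\{f^2(s)\} - \mathrm{E}\{f(s)\}^2$.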
The nonlinear node immediately following a Gaussian node is updated in the same manner as a plain Gaussian node. The gradients of the cost function with respect to the mean and variance of the input are evaluated assuming that they arise from a quadratic term. This assumption holds because the output of the nonlinearity can propagate only to the means of Gaussian nodes. The update formulas are given in Appendix C.
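For the same assumed nonlinearity $f(s) = \exp(-s^2)$, the gradients of the mean of $f$ with respect to the input mean and variance have simple closed forms obtained by differentiating the moment expression; a sketch with a finite-difference check (illustrative names, not the notation of Appendix C):

```python
import numpy as np

def Ef(m, v):
    # mean of the assumed nonlinearity f(s) = exp(-s^2) under s ~ N(m, v)
    return np.exp(-m**2 / (2*v + 1)) / np.sqrt(2*v + 1)

def grad_Ef(m, v):
    # closed-form partial derivatives of Ef with respect to m and v
    e = Ef(m, v)
    dm = -2*m / (2*v + 1) * e
    dv = (2*m**2 / (2*v + 1)**2 - 1/(2*v + 1)) * e
    return dm, dv

m, v, h = 0.5, 0.4, 1e-6
dm, dv = grad_Ef(m, v)
assert abs(dm - (Ef(m + h, v) - Ef(m - h, v)) / (2*h)) < 1e-8
assert abs(dv - (Ef(m, v + h) - Ef(m, v - h)) / (2*h)) < 1e-8
```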
Another possibility is to use as the nonlinearity the error function $\mathrm{erf}(s) = \frac{2}{\sqrt{\pi}}\int_{0}^{s} e^{-t^{2}}\,dt$, because its mean can be evaluated analytically and its variance can be approximated from above (Frey99NC). Since increasing the variance also increases the value of the cost function, it suffices to minimise an upper bound of the cost function to find a good solution. The authors of Frey99NC apply the error function in MLP (multilayer perceptron) networks (Bishop95, Haykin98), but in a manner different from ours.
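The analytic mean of the error function follows from the standard Gaussian-expectation identity $\mathrm{E}[\mathrm{erf}(s)] = \mathrm{erf}\!\left(m/\sqrt{1+2v}\right)$ for $s \sim N(m, v)$; a quick numerical confirmation (a sketch of the identity only, not Frey99NC's variance bound):

```python
import math
import numpy as np

def erf_mean(m, v):
    """E[erf(s)] for s ~ N(m, v), via the Gaussian expectation identity."""
    return math.erf(m / math.sqrt(1 + 2*v))

# numerical confirmation by Gauss-Hermite quadrature
x, w = np.polynomial.hermite.hermgauss(60)
m, v = 0.8, 0.5
s = m + math.sqrt(2*v) * x
numeric = sum(wi * math.erf(si) for wi, si in zip(w, s)) / math.sqrt(math.pi)
assert abs(erf_mean(m, v) - numeric) < 1e-8
```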
Finally, Murphy (Murphy99) has applied the hyperbolic tangent function $\tanh(s) = \frac{e^{s}-e^{-s}}{e^{s}+e^{-s}}$, approximating it iteratively with a Gaussian. Honkela and Valpola (Honkela05NIPS) approximate the same sigmoidal function with a Gauss-Hermite quadrature. This alternative could be considered here, too. A problem with it, however, is that the mean and variance required by the cost function cannot then be computed analytically.
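The Gauss-Hermite idea can be sketched as follows: the moments of $\tanh$ under a Gaussian are approximated by a weighted sum over quadrature nodes (a minimal illustration, not the implementation of Honkela05NIPS):

```python
import numpy as np

def tanh_moments_gh(m, v, n=64):
    """Gauss-Hermite approximation of E[tanh(s)] and Var[tanh(s)] for s ~ N(m, v)."""
    x, w = np.polynomial.hermite.hermgauss(n)
    t = np.tanh(m + np.sqrt(2*v) * x)
    mean = np.sum(w * t) / np.sqrt(np.pi)
    var = np.sum(w * t**2) / np.sqrt(np.pi) - mean**2
    return mean, var

# sanity check against brute-force trapezoidal integration
m, v = 0.3, 1.0
s = np.linspace(m - 10*np.sqrt(v), m + 10*np.sqrt(v), 200001)
p = np.exp(-(s - m)**2 / (2*v)) / np.sqrt(2*np.pi*v)
ds = s[1] - s[0]
f1 = np.tanh(s) * p
f2 = np.tanh(s)**2 * p
mean_ref = np.sum((f1[:-1] + f1[1:]) / 2) * ds
var_ref = np.sum((f2[:-1] + f2[1:]) / 2) * ds - mean_ref**2
mean, var = tanh_moments_gh(m, v)
assert abs(mean - mean_ref) < 1e-4 and abs(var - var_ref) < 1e-4
```

In practice only a handful of quadrature nodes are needed, but the resulting moments remain approximations rather than exact values, which is the drawback noted above.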