Ghahramani and Roweis (1999) have shown
that for the nonlinear function
in Eq. (9),
the mean and variance have analytical expressions, to be presented shortly, provided that
the input is Gaussian. In our graphical network structures this condition is
fulfilled by requiring that the nonlinearity be inserted
immediately after a Gaussian node.
The same type of exponential function (9) is frequently used
in standard radial-basis function networks (Bishop, 1995; Haykin, 1998; Ghahramani and Roweis, 1999),
but in a different manner. There the exponential function depends on
the Euclidean distance from a center point, whereas in our case it depends
directly on the input variable.
The first and second moments of the function (9) with respect
to the Gaussian distribution of its input $s$, with mean $\overline{s}$ and variance $\widetilde{s}$, are (Ghahramani and Roweis, 1999)
$$
\left\langle \exp(-s^2) \right\rangle = \frac{1}{\sqrt{2\widetilde{s}+1}} \exp\left( -\frac{\overline{s}^{\,2}}{2\widetilde{s}+1} \right) , \qquad
\left\langle \exp(-2s^2) \right\rangle = \frac{1}{\sqrt{4\widetilde{s}+1}} \exp\left( -\frac{2\overline{s}^{\,2}}{4\widetilde{s}+1} \right) .
$$
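These closed-form moments can be checked numerically. Assuming the nonlinearity of Eq. (9) is $f(s) = \exp(-s^2)$ with a Gaussian input of mean $m$ and variance $v$, the following sketch (function name illustrative, not from the original) compares the analytical expressions against Monte Carlo estimates:

```python
import numpy as np

def exp_sq_moments(m, v):
    """First and second moments of f(s) = exp(-s^2) for s ~ N(m, v),
    using the closed forms of Ghahramani and Roweis (1999)."""
    first = np.exp(-m**2 / (2.0 * v + 1.0)) / np.sqrt(2.0 * v + 1.0)
    second = np.exp(-2.0 * m**2 / (4.0 * v + 1.0)) / np.sqrt(4.0 * v + 1.0)
    return first, second

# Monte Carlo check of the closed forms.
rng = np.random.default_rng(0)
m, v = 0.7, 0.3
s = rng.normal(m, np.sqrt(v), size=1_000_000)
mc_first = np.exp(-s**2).mean()
mc_second = np.exp(-2.0 * s**2).mean()
a_first, a_second = exp_sq_moments(m, v)
```

The variance of $f(s)$ then follows as $\langle f^2 \rangle - \langle f \rangle^2$, which is what the cost function requires.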
A nonlinear node following directly after a Gaussian node is updated
much like a plain Gaussian node. The
gradients of the cost function with respect to the mean
and variance of the node
are evaluated assuming that they arise from a
quadratic term. This assumption holds because the output of the nonlinearity can only
propagate to the mean of Gaussian nodes. The update formulas are given
in Appendix C.
Another possibility is to use as the nonlinearity
the error function $f(s) = \operatorname{erf}(s) = \frac{2}{\sqrt{\pi}} \int_0^s e^{-t^2} \, dt$,
because its mean can be evaluated analytically and its variance can be approximated
from above (Frey and Hinton, 1999). Since increasing the variance also increases the value
of the cost function, it suffices to minimise the upper bound
of the cost function to find a good solution.
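The analytically tractable mean is what makes the error function attractive here. It follows from the standard Gaussian-smoothing identity $\mathrm{E}[\operatorname{erf}(s)] = \operatorname{erf}(m / \sqrt{1 + 2v})$ for $s \sim N(m, v)$; a minimal sketch verifying it (names are illustrative, not from the original):

```python
import math
import random

def erf_mean(m, v):
    """E[erf(s)] for s ~ N(m, v), via the Gaussian-smoothing
    identity E[erf(s)] = erf(m / sqrt(1 + 2 v))."""
    return math.erf(m / math.sqrt(1.0 + 2.0 * v))

# Monte Carlo check of the identity.
random.seed(0)
m, v = 0.5, 0.4
mc = sum(math.erf(random.gauss(m, math.sqrt(v)))
         for _ in range(200_000)) / 200_000
analytic = erf_mean(m, v)
```

No such closed form exists for the variance of $\operatorname{erf}(s)$, which is why it must be bounded from above.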
Frey and Hinton (1999) apply the error function in multilayer perceptron (MLP)
networks (Bishop, 1995; Haykin, 1998), but in a manner different from ours.
Finally, Murphy (1999) has applied the hyperbolic tangent
function $f(s) = \tanh(s)$, approximating it iteratively with a Gaussian.
Honkela and Valpola (2005) approximate the same sigmoidal function with Gauss-Hermite quadrature.
This alternative could be considered
here, too. A problem with it is, however, that the cost function (mean
and variance) cannot be computed analytically.
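The quadrature alternative can be sketched as follows: for $s \sim N(m, v)$, the change of variables $s = m + \sqrt{2v}\,x$ turns $\mathrm{E}[f(s)]$ into an integral of the Gauss-Hermite form $\int e^{-x^2} g(x)\,dx$. A minimal illustration (not the authors' implementation):

```python
import numpy as np

def gauss_hermite_mean(f, m, v, n=20):
    """Approximate E[f(s)] for s ~ N(m, v) with n-point Gauss-Hermite
    quadrature, using the change of variables s = m + sqrt(2 v) x."""
    x, w = np.polynomial.hermite.hermgauss(n)
    return float(np.sum(w * f(m + np.sqrt(2.0 * v) * x)) / np.sqrt(np.pi))

# Example: posterior mean of tanh(s) for s ~ N(1.0, 0.5),
# checked against plain Monte Carlo sampling.
gh = gauss_hermite_mean(np.tanh, 1.0, 0.5)
rng = np.random.default_rng(0)
mc = np.tanh(rng.normal(1.0, np.sqrt(0.5), size=1_000_000)).mean()
```

The second moment is obtained the same way with $f^2$ in place of $f$; the point of the remark above is that both values remain numerical approximations, so the cost function no longer has an analytical form.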