Next: Other possible nodes Up: Variational Bayesian inference in Previous: Addition and multiplication nodes


Nonlinearity node

A serious problem arising here is that for most nonlinear functions the required expectations cannot be computed analytically. Here we describe one particular nonlinearity in detail and discuss the options for extending the approach to other nonlinearities, whose implementation is underway.

Ghahramani and Roweis (Ghahramani99NIPS) have shown that for the nonlinear function $ f(s) = \exp(-s^2)$ in Eq. (9), the mean and variance have analytical expressions, presented below, provided that the input is Gaussian. In our graphical network structures this condition is fulfilled by requiring that the nonlinearity be inserted immediately after a Gaussian node. The same type of exponential function (9) is frequently used in standard radial-basis function networks (Bishop95, Haykin98, Ghahramani99NIPS), but in a different manner: there the exponential function depends on the Euclidean distance from a center point, whereas in our case it depends on the input variable $ s$ directly.

The first and second moments of the function (9) with respect to the distribution $ q(s)$ are Ghahramani99NIPS

$\displaystyle \left< f(s) \right> = \exp\left(-\frac{\overline{s}^2}{2\widetilde{s}+1}\right) (2\widetilde{s}+1)^{-\frac{1}{2}}$ (30)
$\displaystyle \left< [f(s)]^2 \right> = \exp\left(-\frac{2\overline{s}^2}{4\widetilde{s}+1}\right) (4\widetilde{s}+1)^{-\frac{1}{2}}$ (31)

Formula (30) directly provides the mean $ \left< f(s) \right>$, and the variance is obtained from (30) and (31) by applying the familiar identity $ \mathrm{Var}\left\{f(s)\right\} = \left< [f(s)]^2 \right> - \left< f(s) \right>^2$. The expected exponential $ \left< \exp f(s) \right>$ cannot be evaluated analytically, which somewhat limits the use of the nonlinear node.
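As a sanity check, the closed forms (30)-(31) can be verified numerically. The sketch below (plain NumPy, with hypothetical helper names) compares them against brute-force quadrature over a Gaussian $ q(s)$ with mean $ \overline{s}$ and variance $ \widetilde{s}$.

```python
import numpy as np

def moments_exp_sq(s_mean, s_var):
    """Mean and variance of f(s) = exp(-s**2) for s ~ N(s_mean, s_var),
    using the closed forms of Eqs. (30)-(31)."""
    m1 = np.exp(-s_mean**2 / (2.0 * s_var + 1.0)) / np.sqrt(2.0 * s_var + 1.0)
    m2 = np.exp(-2.0 * s_mean**2 / (4.0 * s_var + 1.0)) / np.sqrt(4.0 * s_var + 1.0)
    return m1, m2 - m1**2  # Var{f(s)} = <f(s)^2> - <f(s)>^2

def moments_numeric(s_mean, s_var, n=200001, width=12.0):
    """Brute-force Riemann-sum reference for the same moments."""
    s = np.linspace(s_mean - width, s_mean + width, n)
    ds = s[1] - s[0]
    q = np.exp(-(s - s_mean)**2 / (2.0 * s_var)) / np.sqrt(2.0 * np.pi * s_var)
    f = np.exp(-s**2)
    m1 = np.sum(f * q) * ds
    m2 = np.sum(f**2 * q) * ds
    return m1, m2 - m1**2

mean_cf, var_cf = moments_exp_sq(0.7, 0.3)
mean_nu, var_nu = moments_numeric(0.7, 0.3)
```

The two evaluations agree to numerical precision, confirming the closed forms.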

A nonlinear node that directly follows a Gaussian node is updated much like a plain Gaussian node. The gradients of $ {\cal C}_p$ with respect to $ \left< f(s) \right>$ and $ \mathrm{Var}\left\{f(s)\right\}$ are evaluated assuming that they arise from a quadratic term. This assumption holds because the nonlinearity can only propagate to the mean of Gaussian nodes. The update formulas are given in Appendix C.

Another possibility is to use the error function $ f(s) = \int_{-\infty}^{s} \exp(-r^2)dr$ as the nonlinearity, because its mean can be evaluated analytically and its variance can be bounded from above (Frey99NC). Since increasing the variance also increases the value of the cost function, it suffices to minimise this upper bound of the cost function to find a good solution. Frey99NC apply the error function in MLP (multilayer perceptron) networks (Bishop95, Haykin98), but in a manner different from ours.
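For this error-function nonlinearity the analytical mean follows from the standard Gaussian identity $ \left<\operatorname{erf}(s)\right> = \operatorname{erf}\bigl(\overline{s}/\sqrt{1+2\widetilde{s}}\bigr)$ for $ s \sim N(\overline{s},\widetilde{s})$, after rewriting $ f(s) = \frac{\sqrt{\pi}}{2}(1+\operatorname{erf}(s))$. The sketch below (hypothetical helper names, not from the paper) checks this numerically.

```python
import math
import numpy as np

def erf_nonlin_mean(s_mean, s_var):
    """<f(s)> for f(s) = int_{-inf}^{s} exp(-r^2) dr with s ~ N(s_mean, s_var).
    Uses f(s) = (sqrt(pi)/2)*(1 + erf(s)) and the Gaussian erf identity."""
    return 0.5 * math.sqrt(math.pi) * (
        1.0 + math.erf(s_mean / math.sqrt(1.0 + 2.0 * s_var)))

def erf_nonlin_mean_numeric(s_mean, s_var, n=400001, width=12.0):
    """Riemann-sum reference for the same expectation."""
    s = np.linspace(s_mean - width, s_mean + width, n)
    ds = s[1] - s[0]
    q = np.exp(-(s - s_mean)**2 / (2.0 * s_var)) / np.sqrt(2.0 * np.pi * s_var)
    f = 0.5 * np.sqrt(np.pi) * (1.0 + np.vectorize(math.erf)(s))
    return np.sum(f * q) * ds

closed = erf_nonlin_mean(0.4, 0.5)
numeric = erf_nonlin_mean_numeric(0.4, 0.5)
```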

Finally, Murphy99 has applied the hyperbolic tangent function $ f(s) = \tanh(s)$, approximating it iteratively with a Gaussian, and Honkela05NIPS approximate the same sigmoidal function with a Gauss-Hermite quadrature. These alternatives could be considered here, too. A problem with them, however, is that the cost function (mean and variance) cannot be computed analytically.
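A Gauss-Hermite treatment of such a sigmoid can be sketched as follows (a minimal illustration under assumed function names, not the authors' implementation): substituting $ s = \overline{s} + \sqrt{2\widetilde{s}}\,x$ turns $ \left<g(s)\right>$ for $ s \sim N(\overline{s},\widetilde{s})$ into a Hermite-weighted integral that the quadrature evaluates directly.

```python
import numpy as np

def gauss_hermite_moments(g, s_mean, s_var, n=40):
    """Approximate <g(s)> and Var{g(s)} for s ~ N(s_mean, s_var)
    with n-point Gauss-Hermite quadrature."""
    # Nodes/weights for the weight function exp(-x^2).
    x, w = np.polynomial.hermite.hermgauss(n)
    vals = g(s_mean + np.sqrt(2.0 * s_var) * x)
    m1 = np.sum(w * vals) / np.sqrt(np.pi)
    m2 = np.sum(w * vals**2) / np.sqrt(np.pi)
    return m1, m2 - m1**2

# For tanh, symmetry implies <tanh(s)> = 0 when the input mean is zero.
m_sym, _ = gauss_hermite_moments(np.tanh, 0.0, 0.3)
m_pos, v_pos = gauss_hermite_moments(np.tanh, 0.5, 0.2)
```

The quadrature is cheap and accurate for smooth sigmoids, but unlike (30)-(31) it yields only a numerical approximation of the cost function, not a closed form.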


Tapani Raiko 2006-08-28