Update Rule of the Nonlinear Node

The update rule for the nonlinear node is considered in Section [*]. The update is a minimisation problem for the cost function $C_f = C_{f,p} + C_{f,q}$. Below is a proof that each iteration decreases the cost function (or keeps it unchanged if the partial derivatives vanish).

A different notation is used here. The quadratic term $a \left< f(s) \right> + b \left[\left(\left< f(s) \right> - f(s_{\text{current}})\right)^2 + \mathrm{Var}\left\{f(s)\right\}\right] + d$ is taken into account by first finding the optimal expected value of $f$, denoted by $f_\text{opt}$, and the variance $\sigma_{f}^{2}$, which measures how costly a deviation from it is.

$\displaystyle \sigma_{f}^{2} = (2b)^{-1}$ (11.1)
$\displaystyle f_\text{opt} = f(s_\text{current}) - \sigma_{f}^{2} a$ (11.2)


$\displaystyle a \left< f(s) \right> + b \left[\left(\left< f(s) \right> - f(s_{\text{current}})\right)^2 + \mathrm{Var}\left\{f(s)\right\}\right] + d$     (11.3)
$\displaystyle = \left< \frac{1}{2}\frac{\left(f(s)-f_\text{opt}\right)^2}{\sigma_f^2} \right> + d_2$     (11.4)
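
The identity can be verified by expanding the right-hand side with the definitions (11.1)-(11.2), using $1/(2\sigma_f^2) = b$ and $\left< f(s)^2 \right> = \left< f(s) \right>^2 + \mathrm{Var}\left\{f(s)\right\}$ (a verification step added here for completeness; $d_2$ absorbs all terms that do not depend on the posterior of $s$):

$\displaystyle \left< \frac{1}{2}\frac{\left(f(s)-f_\text{opt}\right)^2}{\sigma_f^2} \right> = b\left[\left< f(s)^2 \right> - 2 f_\text{opt}\left< f(s) \right> + f_\text{opt}^2\right]$
$\displaystyle = b\left[\left< f(s) \right>^2 - 2 f(s_\text{current})\left< f(s) \right> + \mathrm{Var}\left\{f(s)\right\}\right] + a\left< f(s) \right> + b f_\text{opt}^2$
$\displaystyle = a\left< f(s) \right> + b\left[\left(\left< f(s) \right> - f(s_{\text{current}})\right)^2 + \mathrm{Var}\left\{f(s)\right\}\right] + \text{const.}$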

Now all the affected terms of C can be written as

  
$\displaystyle C_{f,p} = \frac{1}{2}\left\{ \left< \exp v \right> \left[\left(\overline{s} - \left< m \right>\right)^2 + \mathrm{Var}\left\{m\right\} + \widetilde{s}\right] - \left< v \right>\right\} + \frac{f_\text{opt}^{2} - 2 f_\text{opt}\left< f(s) \right> + \left< f(s)^2 \right>}{2\sigma_{f}^{2}}$ (11.5)
$\displaystyle C_{f,q} = -\frac{1}{2}\ln\left(2 e \pi \widetilde{s}\right)$ (11.6)

disregarding constants. The expectations $\left< f(s) \right>$ and $\left< f(s)^2 \right>$ are
$\displaystyle \left< f(s) \right> = \exp\left(-\frac{\overline{s}^2}{2\widetilde{s}+1}\right) \left(2\widetilde{s}+1\right)^{-\frac{1}{2}}$ (11.7)
$\displaystyle \left< f(s)^2 \right> = \exp\left(-\frac{2\overline{s}^2}{4\widetilde{s}+1}\right) \left(4\widetilde{s}+1\right)^{-\frac{1}{2}}$ (11.8)

and these expressions are used as abbreviations in what follows.
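
These expressions follow from the nonlinearity $f(s) = \exp(-s^2)$ (as the forms of (11.7)-(11.8) indicate) and the Gaussian posterior approximation of $s$ with mean $\overline{s}$ and variance $\widetilde{s}$, through the standard Gaussian integral (an intermediate step added here for completeness)

$\displaystyle \int \mathcal{N}\!\left(s;\, \overline{s}, \widetilde{s}\right) \exp\left(-\alpha s^2\right) ds = \left(2\alpha\widetilde{s}+1\right)^{-\frac{1}{2}} \exp\left(-\frac{\alpha\overline{s}^2}{2\alpha\widetilde{s}+1}\right),$

evaluated at $\alpha = 1$ for $\left< f(s) \right>$ and at $\alpha = 2$ for $\left< f(s)^2 \right> = \left< \exp(-2s^2) \right>$.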

Now we have to find $\overline{s}$ and $\widetilde{s}$ such that $C_f = C_{f,p} + C_{f,q}$ is minimised. The partial derivatives of $C_f$ are

   
$\displaystyle \frac{\partial C_{f,p}}{\partial \overline{s}} = \left< \exp v \right> \left(\overline{s} - \left< m \right>\right) + \frac{\overline{s}}{\sigma_{f}^{2}}\left(\frac{2 f_\text{opt}\left< f(s) \right>}{2\widetilde{s}+1} - \frac{2\left< f(s)^2 \right>}{4\widetilde{s}+1}\right)$ (11.9)
$\displaystyle \frac{\partial C_{f,q}}{\partial \overline{s}} = 0$ (11.10)
$\displaystyle \frac{\partial C_{f,p}}{\partial \widetilde{s}} = \frac{\left< \exp v \right>}{2} + \frac{\left(1-2\overline{s}^{2}+2\widetilde{s}\right) f_\text{opt}\left< f(s) \right>}{\sigma_{f}^{2}\left(2\widetilde{s}+1\right)^{2}} - \frac{\left(1-4\overline{s}^2+4\widetilde{s}\right)\left< f(s)^2 \right>}{\sigma_{f}^{2}\left(4\widetilde{s}+1\right)^{2}}$ (11.11)
$\displaystyle \frac{\partial C_{f,q}}{\partial \widetilde{s}} = -\frac{1}{2\widetilde{s}},$ (11.12)

from which we get a fixed-point iteration for the update candidate of $\widetilde{s}$:
 
$\displaystyle \frac{\partial C_f}{\partial \widetilde{s}} = -\frac{1}{2\widetilde{s}} + \frac{\partial C_{f,p}}{\partial \widetilde{s}}$     (11.13)
$\displaystyle \widetilde{s}_{new} = \left(2 \frac{\partial C_{f,p}}{\partial \widetilde{s}} \right)^{-1}.$     (11.14)

This means that $\widetilde{s}$ is adjusted so that, if $\partial C_{f,p}/\partial \widetilde{s}$ stayed constant, the whole partial derivative would become zero in one step. It is easy to see that if $\partial C_f/\partial \widetilde{s}$ is positive, the iteration decreases $\widetilde{s}$, and vice versa. The adjustments are therefore always in the direction of gradient descent.
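
To spell out the sign argument (a step added here for clarity): the candidate (11.14) is well defined only when $\partial C_{f,p}/\partial \widetilde{s} > 0$, and if $\partial C_f/\partial \widetilde{s} > 0$, then by (11.13)

$\displaystyle \frac{\partial C_{f,p}}{\partial \widetilde{s}} > \frac{1}{2\widetilde{s}} > 0 \qquad\Longrightarrow\qquad \widetilde{s}_{new} = \left(2\,\frac{\partial C_{f,p}}{\partial \widetilde{s}}\right)^{-1} < \widetilde{s},$

while $\partial C_f/\partial \widetilde{s} < 0$ gives $0 < \partial C_{f,p}/\partial \widetilde{s} < 1/(2\widetilde{s})$ and hence $\widetilde{s}_{new} > \widetilde{s}$.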

It is worth noting the following connection, derived from ([*]) and ([*]):

 
$\displaystyle \frac{\partial^2 C_{f}}{\partial \left< m \right>^2}$ = $\displaystyle \left< \exp v \right> = 2 \frac{\partial C_{f}}{\partial \mathrm{Var}\left\{m\right\}}.$ (11.15)
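
Although ([*]) and ([*]) refer to the main text, the same relation can also be read directly from (11.5)-(11.6) here (a short verification added for completeness): $C_{f,q}$ does not depend on $m$, and the only term of $C_{f,p}$ involving $\left< m \right>$ or $\mathrm{Var}\left\{m\right\}$ is $\frac{1}{2}\left< \exp v \right>\left[\left(\overline{s}-\left< m \right>\right)^2 + \mathrm{Var}\left\{m\right\} + \widetilde{s}\right]$, so

$\displaystyle \frac{\partial^2 C_{f}}{\partial \left< m \right>^2} = \left< \exp v \right> \qquad\text{and}\qquad \frac{\partial C_{f}}{\partial \mathrm{Var}\left\{m\right\}} = \frac{\left< \exp v \right>}{2}.$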

For $\overline{s}$, gradient descent is used, with the step length approximated from Newton's method. The approximation composed from ([*])

  
$\displaystyle \frac{\partial^{2} C_{f}}{\partial \overline{s}^{2}}$ = $\displaystyle \frac{\partial^{2} C_{f,p}}{\partial \overline{s}^{2}} \approx 2 \frac{\partial C_{f,p}}{\partial \widetilde{s}} = \frac{1}{\widetilde{s}_{new}}$ (11.16)
$\displaystyle \overline{s}_{new}$ = $\displaystyle \overline{s} - \widetilde{s}_{new} \frac{\partial C_{f}}{\partial \overline{s}}$ (11.17)

is accurate if the true posterior is Gaussian. Since the step direction is derived directly from the derivative, it is always locally correct.

As was shown, these steps give a direction in which the cost function decreases locally. If the derivatives are zero, the iteration has converged. To also guarantee that the cost function does not increase because of an overly long step, the update candidates are verified: the step is halved as long as the cost would otherwise rise.
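
A minimal numerical sketch of one such update is given below (an illustration added here, not part of the derivation). It assumes the nonlinearity $f(s) = \exp(-s^2)$ and takes the quantities $a$, $b$, $f(s_\text{current})$, $\left< m \right>$, $\mathrm{Var}\left\{m\right\}$, $\left< v \right>$ and $\left< \exp v \right>$ as given inputs; the function name nonlinear_node_update and the guard against an ill-defined fixed point are hypothetical choices of this sketch.

import numpy as np

def nonlinear_node_update(s_mean, s_var, a, b, f_current,
                          m_mean, m_var, v_mean, exp_v_mean,
                          max_halvings=20):
    # One update of q(s) = N(s_mean, s_var) for the nonlinear node,
    # following (11.1)-(11.17).  Illustrative sketch with f(s) = exp(-s^2).

    # (11.1)-(11.2): reparametrise the quadratic term (assumes b > 0).
    sigma_f2 = 1.0 / (2.0 * b)
    f_opt = f_current - sigma_f2 * a

    def expectations(sm, sv):
        # (11.7)-(11.8): <f(s)> and <f(s)^2> under N(sm, sv).
        Ef = np.exp(-sm**2 / (2.0*sv + 1.0)) / np.sqrt(2.0*sv + 1.0)
        Ef2 = np.exp(-2.0*sm**2 / (4.0*sv + 1.0)) / np.sqrt(4.0*sv + 1.0)
        return Ef, Ef2

    def cost(sm, sv):
        # (11.5)-(11.6), disregarding the same constants as in the text.
        Ef, Ef2 = expectations(sm, sv)
        c_p = 0.5 * (exp_v_mean * ((sm - m_mean)**2 + m_var + sv) - v_mean) \
              + (f_opt**2 - 2.0*f_opt*Ef + Ef2) / (2.0*sigma_f2)
        c_q = -0.5 * np.log(2.0 * np.e * np.pi * sv)
        return c_p + c_q

    Ef, Ef2 = expectations(s_mean, s_var)

    # (11.9)-(11.10): derivative of C_f with respect to the mean.
    dC_dmean = exp_v_mean * (s_mean - m_mean) \
        + s_mean / sigma_f2 * (2.0*f_opt*Ef/(2.0*s_var + 1.0)
                               - 2.0*Ef2/(4.0*s_var + 1.0))

    # (11.11): derivative of C_{f,p} with respect to the variance.
    dCp_dvar = 0.5 * exp_v_mean \
        + (1.0 - 2.0*s_mean**2 + 2.0*s_var) * f_opt * Ef \
          / (sigma_f2 * (2.0*s_var + 1.0)**2) \
        - (1.0 - 4.0*s_mean**2 + 4.0*s_var) * Ef2 \
          / (sigma_f2 * (4.0*s_var + 1.0)**2)

    # (11.14): fixed-point candidate for the variance; keep the old value
    # if the fixed point is not well defined (guard added in this sketch).
    s_var_new = 1.0 / (2.0 * dCp_dvar) if dCp_dvar > 0.0 else s_var
    # (11.16)-(11.17): approximate Newton step for the mean, with
    # 1/s_var_new used as the Hessian approximation.
    s_mean_new = s_mean - s_var_new * dC_dmean

    # Verify the candidates: halve the step while the cost would rise.
    c_old = cost(s_mean, s_var)
    for _ in range(max_halvings):
        if cost(s_mean_new, s_var_new) <= c_old:
            return s_mean_new, s_var_new
        s_mean_new = 0.5 * (s_mean_new + s_mean)
        s_var_new = 0.5 * (s_var_new + s_var)
    return s_mean, s_var  # no improving step found; keep the old values

The halving loop implements the verification described above: a candidate is accepted only if it does not increase the cost (11.5)-(11.6); the fallback to the old values after repeated halving is an implementation detail of this sketch.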

