Update Rule of the Nonlinear Node

The update rule for the nonlinear node is considered in Section [*]. The update is a minimisation problem for the cost function $C_f = C_{f,p} + C_{f,q}$. Below is a proof that each iteration decreases the cost function (or keeps it unchanged if the partial derivatives vanish).

A different notation is used here. The quadratic term $a \left< f(s) \right> + b \left[\left(\left< f(s) \right> - f(s_{\text{current}})\right)^2 + \mathrm{Var}\left\{f(s)\right\}\right] + d$ is taken into account by first finding the optimal expected value of $f$, denoted by $f_\text{opt}$, and the variance $\sigma_{f}^{2}$, which measures how costly a deviation from it is.

$\displaystyle \sigma_{f}^{2} = (2b)^{-1}$ (11.1)
$\displaystyle f_\text{opt} = f(s_\text{current}) - \sigma_{f}^{2} a$ (11.2)


$\displaystyle a \left< f(s) \right> + b \left[\left(\left< f(s) \right> - f(s_{\text{current}})\right)^2 + \mathrm{Var}\left\{f(s)\right\}\right] + d$     (11.3)
$\displaystyle = \left< \frac{1}{2}\frac{\left(f(s)-f_\text{opt}\right)^2}{\sigma_f^2} \right> + d_2$     (11.4)
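
The identity can be verified by expanding the right-hand side with the definitions (11.1)-(11.2), using $1/(2\sigma_f^2) = b$ and $\left< f(s)^2 \right> = \left< f(s) \right>^2 + \mathrm{Var}\left\{f(s)\right\}$ (a verification step added here for completeness; $d_2$ absorbs all terms that do not depend on the posterior of $s$):

$\displaystyle \left< \frac{1}{2}\frac{\left(f(s)-f_\text{opt}\right)^2}{\sigma_f^2} \right> = b\left[\left< f(s)^2 \right> - 2 f_\text{opt}\left< f(s) \right> + f_\text{opt}^2\right]$
$\displaystyle = b\left[\left< f(s) \right>^2 - 2 f(s_\text{current})\left< f(s) \right> + \mathrm{Var}\left\{f(s)\right\}\right] + a\left< f(s) \right> + b f_\text{opt}^2$
$\displaystyle = a\left< f(s) \right> + b\left[\left(\left< f(s) \right> - f(s_{\text{current}})\right)^2 + \mathrm{Var}\left\{f(s)\right\}\right] + \text{const.}$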

Now all the affected terms of C can be written as

  
$\displaystyle C_{f,p} = \frac{1}{2}\left\{ \left< \exp v \right> \left[\left(\overline{s} - \left< m \right>\right)^2 + \mathrm{Var}\left\{m\right\} + \widetilde{s}\right] - \left< v \right>\right\} + \frac{f_\text{opt}^{2} - 2 f_\text{opt}\left< f(s) \right> + \left< f(s)^2 \right>}{2\sigma_{f}^{2}}$ (11.5)
$\displaystyle C_{f,q} = -\frac{1}{2}\ln\left(2 e \pi \widetilde{s}\right)$ (11.6)

disregarding constants. The expectations $\left< f(s) \right>$ and $\left< f(s)^2 \right>$ are
$\displaystyle \left< f(s) \right> = \exp\left(-\frac{\overline{s}^2}{2\widetilde{s}+1}\right) \left(2\widetilde{s}+1\right)^{-\frac{1}{2}}$ (11.7)
$\displaystyle \left< f(s)^2 \right> = \exp\left(-\frac{2\overline{s}^2}{4\widetilde{s}+1}\right) \left(4\widetilde{s}+1\right)^{-\frac{1}{2}}$ (11.8)

and these expressions are used as abbreviations in what follows.
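
These expressions follow from the nonlinearity $f(s) = \exp(-s^2)$ (as the forms of (11.7)-(11.8) indicate) and the Gaussian posterior approximation of $s$ with mean $\overline{s}$ and variance $\widetilde{s}$, through the standard Gaussian integral (an intermediate step added here for completeness)

$\displaystyle \int \mathcal{N}\!\left(s;\, \overline{s}, \widetilde{s}\right) \exp\left(-\alpha s^2\right) ds = \left(2\alpha\widetilde{s}+1\right)^{-\frac{1}{2}} \exp\left(-\frac{\alpha\overline{s}^2}{2\alpha\widetilde{s}+1}\right),$

evaluated at $\alpha = 1$ for $\left< f(s) \right>$ and at $\alpha = 2$ for $\left< f(s)^2 \right> = \left< \exp(-2s^2) \right>$.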

Now we have to find $\overline{s}$ and $\widetilde{s}$ such that $C_f = C_{f,p} + C_{f,q}$ is minimised. The partial derivatives of $C_f$ are

   
$\displaystyle \frac{\partial C_{f,p}}{\partial \overline{s}} = \left< \exp v \right> \left(\overline{s} - \left< m \right>\right) + \frac{\overline{s}}{\sigma_{f}^{2}}\left(\frac{2 f_\text{opt}\left< f(s) \right>}{2\widetilde{s}+1} - \frac{2\left< f(s)^2 \right>}{4\widetilde{s}+1}\right)$ (11.9)
$\displaystyle \frac{\partial C_{f,q}}{\partial \overline{s}} = 0$ (11.10)
$\displaystyle \frac{\partial C_{f,p}}{\partial \widetilde{s}} = \frac{\left< \exp v \right>}{2} + \frac{\left(1-2\overline{s}^{2}+2\widetilde{s}\right) f_\text{opt}\left< f(s) \right>}{\sigma_{f}^{2}\left(2\widetilde{s}+1\right)^{2}} - \frac{\left(1-4\overline{s}^2+4\widetilde{s}\right)\left< f(s)^2 \right>}{\sigma_{f}^{2}\left(4\widetilde{s}+1\right)^{2}}$ (11.11)
$\displaystyle \frac{\partial C_{f,q}}{\partial \widetilde{s}} = -\frac{1}{2\widetilde{s}},$ (11.12)

from which we get a fixed-point iteration for the update candidate of $\widetilde{s}$:
 
$\displaystyle \frac{\partial C_f}{\partial \widetilde{s}} = -\frac{1}{2\widetilde{s}} + \frac{\partial C_{f,p}}{\partial \widetilde{s}}$     (11.13)
$\displaystyle \widetilde{s}_{new} = \left(2 \frac{\partial C_{f,p}}{\partial \widetilde{s}} \right)^{-1}.$     (11.14)

This means that $\widetilde{s}$ is adjusted so that, if $\partial C_{f,p}/\partial \widetilde{s}$ stayed constant, the whole partial derivative would become zero in one step. It is easy to see that if $\partial C_f/\partial \widetilde{s}$ is positive, the iteration decreases $\widetilde{s}$, and vice versa. The adjustments are therefore always in the direction of gradient descent.
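
To spell out the sign argument (a step added here for clarity): the candidate (11.14) is well defined only when $\partial C_{f,p}/\partial \widetilde{s} > 0$, and if $\partial C_f/\partial \widetilde{s} > 0$, then by (11.13)

$\displaystyle \frac{\partial C_{f,p}}{\partial \widetilde{s}} > \frac{1}{2\widetilde{s}} > 0 \qquad\Longrightarrow\qquad \widetilde{s}_{new} = \left(2\,\frac{\partial C_{f,p}}{\partial \widetilde{s}}\right)^{-1} < \widetilde{s},$

while $\partial C_f/\partial \widetilde{s} < 0$ gives $0 < \partial C_{f,p}/\partial \widetilde{s} < 1/(2\widetilde{s})$ and hence $\widetilde{s}_{new} > \widetilde{s}$.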

It is worth noting the following connection, derived from ([*]) and ([*]):

 
$\displaystyle \frac{\partial^2 C_{f}}{\partial \left< m \right>^2}$ = $\displaystyle \left< \exp v \right> = 2 \frac{\partial C_{f}}{\partial \mathrm{Var}\left\{m\right\}}.$ (11.15)
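
Although ([*]) and ([*]) refer to the main text, the same relation can also be read directly from (11.5)-(11.6) here (a short verification added for completeness): $C_{f,q}$ does not depend on $m$, and the only term of $C_{f,p}$ involving $\left< m \right>$ or $\mathrm{Var}\left\{m\right\}$ is $\frac{1}{2}\left< \exp v \right>\left[\left(\overline{s}-\left< m \right>\right)^2 + \mathrm{Var}\left\{m\right\} + \widetilde{s}\right]$, so

$\displaystyle \frac{\partial^2 C_{f}}{\partial \left< m \right>^2} = \left< \exp v \right> \qquad\text{and}\qquad \frac{\partial C_{f}}{\partial \mathrm{Var}\left\{m\right\}} = \frac{\left< \exp v \right>}{2}.$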

For $\overline{s}$, gradient descent is used, with the step length approximated from Newton's method. The approximation composed from ([*])

  
$\displaystyle \frac{\partial^{2} C_{f}}{\partial \overline{s}^{2}}$ = $\displaystyle \frac{\partial^{2} C_{f,p}}{\partial \overline{s}^{2}} \approx 2 \frac{\partial C_{f,p}}{\partial \widetilde{s}} = \frac{1}{\widetilde{s}_{new}}$ (11.16)
$\displaystyle \overline{s}_{new}$ = $\displaystyle \overline{s} - \widetilde{s}_{new} \frac{\partial C_{f}}{\partial \overline{s}}$ (11.17)

is accurate if the true posterior is Gaussian. Since the step direction is derived directly from the derivative, it is always locally correct.

As was shown, these steps give a direction in which the cost function decreases locally. If the derivatives are zero, the iteration has converged. To also guarantee that the cost function does not increase because of an overly long step, the update candidates are verified: the step is halved as long as the cost would otherwise rise.
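
A minimal numerical sketch of one such update is given below (an illustration added here, not part of the derivation). It assumes the nonlinearity $f(s) = \exp(-s^2)$ and takes the quantities $a$, $b$, $f(s_\text{current})$, $\left< m \right>$, $\mathrm{Var}\left\{m\right\}$, $\left< v \right>$ and $\left< \exp v \right>$ as given inputs; the function name nonlinear_node_update and the guard against an ill-defined fixed point are hypothetical choices of this sketch.

import numpy as np

def nonlinear_node_update(s_mean, s_var, a, b, f_current,
                          m_mean, m_var, v_mean, exp_v_mean,
                          max_halvings=20):
    # One update of q(s) = N(s_mean, s_var) for the nonlinear node,
    # following (11.1)-(11.17).  Illustrative sketch with f(s) = exp(-s^2).

    # (11.1)-(11.2): reparametrise the quadratic term (assumes b > 0).
    sigma_f2 = 1.0 / (2.0 * b)
    f_opt = f_current - sigma_f2 * a

    def expectations(sm, sv):
        # (11.7)-(11.8): <f(s)> and <f(s)^2> under N(sm, sv).
        Ef = np.exp(-sm**2 / (2.0*sv + 1.0)) / np.sqrt(2.0*sv + 1.0)
        Ef2 = np.exp(-2.0*sm**2 / (4.0*sv + 1.0)) / np.sqrt(4.0*sv + 1.0)
        return Ef, Ef2

    def cost(sm, sv):
        # (11.5)-(11.6), disregarding the same constants as in the text.
        Ef, Ef2 = expectations(sm, sv)
        c_p = 0.5 * (exp_v_mean * ((sm - m_mean)**2 + m_var + sv) - v_mean) \
              + (f_opt**2 - 2.0*f_opt*Ef + Ef2) / (2.0*sigma_f2)
        c_q = -0.5 * np.log(2.0 * np.e * np.pi * sv)
        return c_p + c_q

    Ef, Ef2 = expectations(s_mean, s_var)

    # (11.9)-(11.10): derivative of C_f with respect to the mean.
    dC_dmean = exp_v_mean * (s_mean - m_mean) \
        + s_mean / sigma_f2 * (2.0*f_opt*Ef/(2.0*s_var + 1.0)
                               - 2.0*Ef2/(4.0*s_var + 1.0))

    # (11.11): derivative of C_{f,p} with respect to the variance.
    dCp_dvar = 0.5 * exp_v_mean \
        + (1.0 - 2.0*s_mean**2 + 2.0*s_var) * f_opt * Ef \
          / (sigma_f2 * (2.0*s_var + 1.0)**2) \
        - (1.0 - 4.0*s_mean**2 + 4.0*s_var) * Ef2 \
          / (sigma_f2 * (4.0*s_var + 1.0)**2)

    # (11.14): fixed-point candidate for the variance; keep the old value
    # if the fixed point is not well defined (guard added in this sketch).
    s_var_new = 1.0 / (2.0 * dCp_dvar) if dCp_dvar > 0.0 else s_var
    # (11.16)-(11.17): approximate Newton step for the mean, with
    # 1/s_var_new used as the Hessian approximation.
    s_mean_new = s_mean - s_var_new * dC_dmean

    # Verify the candidates: halve the step while the cost would rise.
    c_old = cost(s_mean, s_var)
    for _ in range(max_halvings):
        if cost(s_mean_new, s_var_new) <= c_old:
            return s_mean_new, s_var_new
        s_mean_new = 0.5 * (s_mean_new + s_mean)
        s_var_new = 0.5 * (s_var_new + s_var)
    return s_mean, s_var  # no improving step found; keep the old values

The halving loop implements the verification described above: a candidate is accepted only if it does not increase the cost (11.5)-(11.6); the fallback to the old values after repeated halving is an implementation detail of this sketch.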

