next up previous
Next: Update Rules for Posterior Up: Nonlinear Independent Component Analysis Previous: Posterior Mean and Variance

   
Update Rules

In the previous section we derived all the equations needed for computing the cost function. Given the posterior means $\boldsymbol{\bar{\theta}}$ and variances $\boldsymbol{\tilde{\theta}}$ and the discrete posterior probabilities $\dot{s}_{il}(t)$, we can evaluate the cost function, which measures the quality of the approximation of the posterior pdf of the unknown variables. Any standard optimisation algorithm could be used for minimising the cost function, but it is sensible to exploit the particular form of the function. Due to lack of space, we only outline the update rules here; a more detailed description can be found in [4].

Let us denote C = Cq + Cp, where Cq is the part originating from the expectation of $\ln q(\boldsymbol{\theta})$ and Cp is the part originating from the expectation of $-\ln p(X, \boldsymbol{\theta})$. We shall show how efficient fixed-point algorithms for $\bar{\theta}$ and $\tilde{\theta}$ can be derived, assuming that the gradients of Cp with respect to the current estimates of $\bar{\theta}$ and $\tilde{\theta}$ have been computed.

Since Cq contains a term $-\frac{1}{2} \ln 2 \pi e \tilde{\theta}$ for each parameter $\theta$ whose posterior is approximated by a Gaussian $q(\theta)$, solving $\partial C / \partial \tilde{\theta} = 0$ yields an update rule for $\tilde{\theta}$:

\begin{displaymath}
0 = \frac{\partial C_p}{\partial \tilde{\theta}} + \frac{\partial C_q}{\partial \tilde{\theta}} =
\frac{\partial C_p}{\partial \tilde{\theta}} - \frac{1}{2 \tilde{\theta}}
\quad \Rightarrow \quad
\tilde{\theta} = \frac{1}{2 \frac{\partial C_p}{\partial \tilde{\theta}}}
\, .
\end{displaymath} (32)

Now suppose $\ln p(X, \boldsymbol{\theta})$ is roughly quadratic with respect to $\theta$:

\begin{displaymath}-\ln p(X, \boldsymbol{\theta}) \approx \alpha + (\theta -
\theta_\mathrm{opt})^2 \beta \, .
\end{displaymath} (33)

Then Cp would be

\begin{displaymath}C_p \approx \alpha + [(\bar{\theta} - \theta_\mathrm{opt})^2 +
\tilde{\theta}] \beta
\end{displaymath} (34)
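The variance term in (34) arises from taking the expectation of the quadratic approximation (33) over the Gaussian posterior approximation $q(\theta)$ with mean $\bar{\theta}$ and variance $\tilde{\theta}$:

\begin{displaymath}
\mathrm{E}_q\left[ (\theta - \theta_\mathrm{opt})^2 \right] =
\mathrm{E}_q\left[ \left( (\theta - \bar{\theta}) + (\bar{\theta} - \theta_\mathrm{opt}) \right)^2 \right] =
\tilde{\theta} + (\bar{\theta} - \theta_\mathrm{opt})^2 \, ,
\end{displaymath}

since $\mathrm{E}_q[(\theta - \bar{\theta})^2] = \tilde{\theta}$ and the cross term vanishes because $\mathrm{E}_q[\theta - \bar{\theta}] = 0$.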

and hence the derivatives with respect to $\bar{\theta}$ and $\tilde{\theta}$ would be
\begin{displaymath}
\frac{\partial C_p}{\partial \bar{\theta}} = 2 (\bar{\theta} - \theta_\mathrm{opt}) \beta
\end{displaymath} (35)

\begin{displaymath}
\frac{\partial C_p}{\partial \tilde{\theta}} = \beta \, .
\end{displaymath} (36)

Since Cq does not depend on $\bar{\theta}$, the optimal value of $\bar{\theta}$ is evidently $\theta_\mathrm{opt}$, and solving for it we obtain an update rule for $\bar{\theta}$:

\begin{displaymath}
\bar{\theta}_\mathrm{new} = \theta_\mathrm{opt} =
\bar{\theta} - \frac{1}{2 \beta} \frac{\partial C_p}{\partial \bar{\theta}} =
\bar{\theta} - \frac{\partial C_p}{\partial \bar{\theta}}
\tilde{\theta} \, .
\end{displaymath} (37)
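As a minimal numerical sketch of the two fixed-point rules, consider a toy quadratic cost of the form (33); the constants alpha, beta and theta_opt below are illustrative assumptions, not values from the paper. For such a cost, $C_p \approx \alpha + [(\bar{\theta} - \theta_\mathrm{opt})^2 + \tilde{\theta}] \beta$ as in (34), and the updates (32) and (37) reach the optimum immediately:

```python
# Toy quadratic cost: -ln p(X, theta) = alpha + (theta - theta_opt)^2 * beta,
# so C_p = alpha + [(mean - theta_opt)^2 + var] * beta. Illustrative constants:
alpha, beta, theta_opt = 0.0, 2.0, 1.5

def dCp_dmean(mean):
    # Gradient (35): dC_p/dmean = 2 (mean - theta_opt) beta.
    return 2.0 * (mean - theta_opt) * beta

def dCp_dvar(var):
    # Gradient (36): dC_p/dvar = beta (constant for a quadratic cost).
    return beta

mean, var = 0.0, 1.0  # initial posterior mean and variance
for _ in range(10):
    var = 1.0 / (2.0 * dCp_dvar(var))    # update rule (32)
    mean = mean - dCp_dmean(mean) * var  # update rule (37)

print(mean, var)  # 1.5 0.25, i.e. theta_opt and 1/(2 beta)
```

For a genuinely quadratic cost the iteration converges in a single step; for the actual cost function the gradients change between iterations, so the rules are applied repeatedly.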

Since this update rule is based on a quadratic approximation of the cost function C, it can be viewed as a Newton iteration which assumes that $\bar{\theta}$ is the only variable that changes, because the quadratic approximation does not take into account the cross terms $\partial^2
C/\partial \theta_i \partial \theta_j$. In practice all the weights A and B, for instance, are adapted simultaneously, and each weight affects the optimal values of the other weights. In [4] it is explained how it is possible to compensate for the error which results from this invalid assumption of independent adaptation.
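A toy illustration of why such compensation matters (this is only a sketch of the failure mode, not the compensation scheme of [4]): on a coupled quadratic cost, independent Newton-style steps that ignore the cross term $\partial^2 C/\partial \theta_1 \partial \theta_2$ converge when the coupling is weak but can diverge when it is strong.

```python
# Toy coupled quadratic cost C(t1, t2) = t1^2 + t2^2 + c*t1*t2, whose
# minimum is at t1 = t2 = 0. Each parameter takes its own Newton step
# assuming the other stays fixed, ignoring the cross term d2C/dt1 dt2.
def independent_newton_step(t1, t2, c):
    # dC/dt1 = 2*t1 + c*t2 and d2C/dt1^2 = 2 (likewise for t2).
    return t1 - (2.0 * t1 + c * t2) / 2.0, t2 - (2.0 * t2 + c * t1) / 2.0

for c in (1.0, 2.2):  # weak vs. strong coupling
    t1, t2 = 1.0, 1.0
    for _ in range(30):
        t1, t2 = independent_newton_step(t1, t2, c)
    print(c, abs(t1))  # c=1.0 shrinks toward 0; c=2.2 blows up
```

With c = 1.0 each simultaneous sweep contracts the error by a factor of 0.5; with c = 2.2 it grows by a factor of 1.1 per sweep, so the joint iteration diverges even though each individual step is locally optimal.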



 
Harri Lappalainen
2000-03-03