Next: Application to General ICA Up: Fast Algorithms for Bayesian Previous: Fast EM-algorithm by Filtering

FastICA as EM-Algorithm with Filtering of Gaussian Noise

The FastICA algorithm [5] can be interpreted as performing the noise removal described above. FastICA also requires that the sources be whitened, and therefore $\mathbf R_{ss}=\mathbf I$ and $(\mathbf A^T\mathbf A)^{-1}=\mathbf I$. Then the sources can be found one by one, and we can consider a single column $\mathbf a$ of the mixing matrix $\mathbf A$.

To derive the FastICA algorithm from the modified EM-algorithm, it is sufficient to note that the term $\mathbf X_G\mathbf F(\mathbf s_{0G})^T/M=\hat{\mathbf a}\,\mathbf s_{0G}\mathbf F(\mathbf s_{0G})^T/M$ is $C_f\hat{\mathbf a}$, where $C_f$ is a constant that depends only on the nonlinear function $f(\cdot)$. Then the update rule is
\begin{align*}
\hat{\mathbf a} - \hat{\mathbf a}_G &= \mathbf X\mathbf F(\mathbf s_0)^T/M - C_f\,\hat{\mathbf a}\\
\hat{\mathbf a}_{\mathrm{new}} &= \frac{\hat{\mathbf a} - \hat{\mathbf a}_G}{\Vert\hat{\mathbf a} - \hat{\mathbf a}_G\Vert}
\end{align*}
which is the FastICA algorithm, where the constant $C_f$ is the expectation $\operatorname{E}\{s_{0G}f(s_{0G})\}$.
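To make the update concrete, here is a minimal numerical sketch (my own illustration, not code from the paper) of the resulting one-unit fixed-point iteration on whitened data. It uses the common choice $f=\tanh$ and estimates $C_f=\operatorname{E}\{s_{0G}f(s_{0G})\}$ on Gaussian samples, since $s_{0G}$ is the Gaussian-filtered source; all variable names and the specific test sources are assumptions made for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
M = 50_000

# Two zero-mean, unit-variance sources: Laplace (super-Gaussian)
# and uniform (sub-Gaussian).
s = np.vstack([rng.laplace(size=M), rng.uniform(-1.0, 1.0, size=M)])
s = (s - s.mean(axis=1, keepdims=True)) / s.std(axis=1, keepdims=True)

# An orthogonal mixing matrix keeps the data whitened, as the
# derivation assumes.
A, _ = np.linalg.qr(rng.standard_normal((2, 2)))
X = A @ s

f = np.tanh
# C_f = E{ s_0G f(s_0G) } estimated on unit-variance Gaussian samples.
z = rng.standard_normal(M)
C_f = np.mean(z * f(z))

a = rng.standard_normal(2)
a /= np.linalg.norm(a)
for _ in range(100):
    s0 = a @ X                        # current source estimate s0 = a^T X
    a_new = X @ f(s0) / M - C_f * a   # subtract the Gaussian part C_f a
    a_new /= np.linalg.norm(a_new)    # renormalize to the unit sphere
    a = a_new

# a should now align, up to sign, with one column of the mixing matrix
alignment = np.abs(A.T @ a).max()
```

The estimate typically converges in a handful of iterations; the surplus iterations here are just for safety in a sketch.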

The choice of a fixed nonlinearity $f(\cdot)$ is implicitly connected to the distribution of the sources $s$. The derivation of the EM-algorithm required that

\begin{displaymath}f(s)=\frac{\partial \log p(s)}{\partial s}
\end{displaymath}

However, $f(\cdot)$ has certain degrees of freedom due to taking the difference $\mathbf X\mathbf F(\mathbf s_0)^T-\mathbf X_G\mathbf F(\mathbf s_{0G})^T$. Expanding $f$ polynomially, we obtain $p(s)=\exp(a+bs+cs^2+dg(s))$, where $g'(s)=f(s)$ and $g(s)$ contains all the terms of degree higher than two (and possibly lower-order terms as well). This representation follows because the constant and linear terms of $f(\cdot)$ cancel out in the update rule; after integration they appear in the exponent of the distribution $p(s)$ with the power raised by one. Since $p(s)$ must be a probability density, the constant $a$ is fixed by the requirement $\int p(s)\,ds=1$. The mean and variance of $s$ determine the constants $b$ and $c$, since the sources are required to be zero-mean and whitened (the variance is fixed to unity). One free parameter $d$ remains, which means that there is not a single distribution corresponding to $f(\cdot)$ but a family of densities $p(s)$. Typically the family includes both super- and sub-Gaussian densities, which is why the same $f(\cdot)$ can be used in both cases.
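The effect of the free parameter $d$ can be checked numerically. The following sketch (my own illustration; the function name, grid, and parameter values are assumptions) takes the common FastICA nonlinearity $\tanh$, for which $g(s)=\log\cosh(s)$ up to the cancelling lower-order terms, and computes the kurtosis of members of the family $p(s)\propto\exp(cs^2+d\log\cosh(s))$ by numerical integration:

```python
import numpy as np

def family_kurtosis(d, c=-0.5):
    """Kurtosis E{s^4}/(E{s^2})^2 of p(s) ~ exp(c*s^2 + d*log(cosh(s))),
    estimated by numerical integration on a fine grid."""
    s = np.linspace(-12.0, 12.0, 200_001)
    log_p = c * s**2 + d * np.log(np.cosh(s))
    p = np.exp(log_p - log_p.max())   # unnormalized density, overflow-safe
    m0 = p.sum()                      # the common grid spacing cancels
    m2 = (s**2 * p).sum() / m0        # variance (the mean is 0 by symmetry)
    m4 = (s**4 * p).sum() / m0
    return m4 / m2**2                 # scale-invariant, so no rescaling needed

# The same f (hence the same g) yields both kinds of densities:
k_peaked = family_kurtosis(d=-2.0)    # super-Gaussian member (kurtosis > 3)
k_flat = family_kurtosis(d=+2.0)      # sub-Gaussian member (kurtosis < 3)
```

A Gaussian has kurtosis exactly 3, so the two members sit on opposite sides of it, illustrating why a single fixed nonlinearity covers both cases.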


Harri Lappalainen
2000-03-09