Next: Application to General ICA Up: Fast Algorithms for Bayesian Previous: Fast EM-algorithm by Filtering

FastICA as EM-Algorithm with Filtering of Gaussian Noise

The FastICA algorithm [5] can be interpreted as performing the noise removal described above. FastICA also requires the sources to be whitened, and therefore $\mathbf R_{ss}=\mathbf I$ and $(\mathbf A^T\mathbf A)^{-1}=\mathbf I$. The sources can then be found one by one, and we can consider a single column $\mathbf a$ of the mixing matrix $\mathbf A$.

To derive the FastICA algorithm from the modified EM-algorithm, it is sufficient to note that the term $\mathbf X_G\mathbf F(\mathbf s_{0G})^T/M=\mathbf a\mathbf s_{0G}\mathbf F(\mathbf s_{0G})^T/M$ is $C_f\mathbf a$, where $C_f$ is a constant that depends only on the nonlinear function $f(\cdot)$. The update rule is then
$\begin{align*}\hat{\mathbf a} - \hat{\mathbf a}_G&=\mathbf X\mathbf F(\mathbf s_0^T)/M - C_f\mathbf a \\ \mathbf a_{\mathrm{new}}&=\frac{\hat{\mathbf a} - \hat{\mathbf a}_G}{\Vert\hat{\mathbf a} - \hat{\mathbf a}_G\Vert} \end{align*}$
which is the FastICA algorithm, where the constant $C_f$ is the expectation $\operatorname{E}\{s_{0G}f(s_{0G})\}$.
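The update rule above can be sketched numerically. The following is a minimal illustration, not the authors' implementation: the function names, the toy data, and the cubic nonlinearity $f(s)=s^3$ are all assumptions made for the example. With that choice $C_f=\operatorname{E}\{s_{0G}^4\}=3$ for a unit-variance Gaussian, so the step reduces to the familiar kurtosis-based fixed-point iteration. The Gaussian expectation is evaluated by Gauss-Hermite quadrature, and the data are assumed to be already white.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def gaussian_constant(f, order=40):
    """C_f = E{s f(s)} for s ~ N(0, 1), via Gauss-Hermite quadrature."""
    nodes, weights = hermegauss(order)  # weight function exp(-s^2 / 2)
    return float(np.sum(weights * nodes * f(nodes)) / np.sqrt(2.0 * np.pi))


def one_unit_step(X, a, f, C_f):
    """Return the normalized difference X F(s_0^T)/M - C_f a."""
    M = X.shape[1]
    s0 = a @ X                       # current source estimate, shape (M,)
    diff = X @ f(s0) / M - C_f * a   # corresponds to a_hat - a_hat_G
    return diff / np.linalg.norm(diff)


def estimate_column(X, f, n_iter=100, tol=1e-10, seed=0):
    """Iterate the update from a random unit vector until the direction settles."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal(X.shape[0])
    a /= np.linalg.norm(a)
    C_f = gaussian_constant(f)
    for _ in range(n_iter):
        a_new = one_unit_step(X, a, f, C_f)
        # The sign of a may flip between iterations, so compare directions only.
        done = abs(a_new @ a) > 1.0 - tol
        a = a_new
        if done:
            break
    return a


def cubic(s):
    """Illustrative nonlinearity (an assumption, not prescribed by the text)."""
    return s ** 3


# Toy data (hypothetical): two unit-variance uniform sources mixed by a
# rotation, so X is approximately white and the columns of A are orthonormal.
rng = np.random.default_rng(1)
S = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(2, 20000))
theta = 0.6
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])
X = A @ S

a_est = estimate_column(X, cubic)
alignment = np.max(np.abs(A.T @ a_est))  # close to 1 if a column was found
```

Because of the normalization, replacing $f$ by $-f$ only changes the sign of the estimate, which is why the convergence check compares directions rather than vectors.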

The choice of a fixed nonlinearity $f(\cdot)$ is implicitly connected to the distribution of the sources $s$. The derivation of the EM-algorithm required that

$\begin{displaymath}f(s)=\frac{\partial \log p(s)}{\partial s} \end{displaymath}$

However, $f(\cdot)$ has certain degrees of freedom, because the update rule uses the difference $\mathbf X\mathbf F(\mathbf s_0^T)-\mathbf X_G\mathbf F(\mathbf s_{0G}^T)$. Expanding $f$ polynomially, we obtain $p(s)=\exp(a+bs+cs^2+dg(s))$, where $g'(s)=f(s)$ and $g(s)$ contains all the terms of order higher than two and possibly lower-order terms too. This representation follows because constants and linear terms of $f(\cdot)$ cancel out in the update rule. They therefore appear in the exponent of $p(s)$ with the power raised by one due to integration. Since $p(s)$ must be a probability density, the constant $a$ is fixed by the requirement $\int p(s)\,ds=1$. The mean and variance of $s$ determine the constants $b$ and $c$, since the sources are required to be zero-mean and whitened (variance fixed to unity). One free parameter, $d$, is left, which means that $f(\cdot)$ corresponds not to a single distribution but to a family of densities $p(s)$. Typically the family includes both super- and sub-Gaussian densities, which is why the same $f(\cdot)$ can be used for both cases.
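The one-parameter family can be explored numerically. The sketch below is only an illustration under stated assumptions: it takes $g(s)=\log\cosh(s)$, so that $g'(s)=\tanh(s)$, evaluates the density on a finite grid, and uses hypothetical helper names. Since $g$ is even, $b=0$ by symmetry. For a given $d$, the constant $c$ is found by bisection so that the variance is one, and $a$ is fixed implicitly by normalization. The sign of the excess kurtosis then indicates whether that member of the family is super- or sub-Gaussian.

```python
import numpy as np

GRID = np.linspace(-12.0, 12.0, 24001)
STEP = GRID[1] - GRID[0]
G = np.log(np.cosh(GRID))  # g(s) with g'(s) = tanh(s): an illustrative choice


def density(c, d):
    """Normalized p(s) = exp(a + c s^2 + d g(s)) on the grid (b = 0 by symmetry)."""
    w = np.exp(c * GRID ** 2 + d * G)
    return w / (w.sum() * STEP)  # dividing by the integral fixes the constant a


def variance(c, d):
    """Second moment of the zero-mean density for the given c and d."""
    return float(np.sum(GRID ** 2 * density(c, d)) * STEP)


def unit_variance_member(d, lo=-20.0, hi=0.0, n_bisect=80):
    """Bisect on c (the variance increases with c) until the variance is one."""
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        if variance(mid, d) > 1.0:
            hi = mid
        else:
            lo = mid
    c = 0.5 * (lo + hi)
    p = density(c, d)
    excess_kurtosis = float(np.sum(GRID ** 4 * p) * STEP) - 3.0
    return c, excess_kurtosis


c_super, kurt_super = unit_variance_member(d=-1.0)  # p ~ exp(c s^2) / cosh(s)
c_sub, kurt_sub = unit_variance_member(d=+1.0)      # p ~ exp(c s^2) * cosh(s)
```

In this example the member with $d=-1$ has positive excess kurtosis and the member with $d=+1$, which is an equal mixture of two Gaussians, has negative excess kurtosis, so a single $\tanh$ nonlinearity covers densities of both types.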


Harri Lappalainen
2000-03-09