We studied a number of different methods for PCA with sparse data, and a simple gradient descent approach turned out to work best due to its minimal computational complexity per iteration. We could speed it up further, by more than a factor of ten, using an approximated Newton's method. We found empirically that the chosen setting of the parameter seems to work well for our problem; whether this generalizes to other problem settings is left for future work. There are also many other ways to speed up the gradient descent algorithm. The natural gradient did not help here, but we expect that the conjugate gradient method would. The modification to the gradient proposed in this paper could be used together with the conjugate gradient speed-up; this is another topic for future research.
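As a minimal sketch of the basic approach, the squared reconstruction error over the observed entries can be minimized by plain gradient descent on the loading matrix A and the component matrix S. The function name, learning rate, and initialization below are illustrative assumptions, not the paper's exact algorithm:

import numpy as np

def pca_missing_gd(X, mask, c, lr=1e-3, n_iter=1000, seed=0):
    """Fit c-component PCA to data with missing values by gradient descent.

    X    : (d, n) data matrix; entries where mask == 0 are ignored
    mask : (d, n) binary array, 1 where X is observed
    Minimizes the squared error over the observed entries only.
    """
    rng = np.random.default_rng(seed)
    d, n = X.shape
    A = 0.1 * rng.standard_normal((d, c))   # loading matrix
    S = 0.1 * rng.standard_normal((c, n))   # principal components
    for _ in range(n_iter):
        E = mask * (X - A @ S)   # residual, nonzero only at observed entries
        gA = E @ S.T             # gradient direction for A
        gS = A.T @ E             # gradient direction for S
        A += lr * gA
        S += lr * gS
    return A, S

This dense form computes the residual over the full matrix; for large sparse data the gradients would instead be accumulated over the observed entries only, which is what makes the per-iteration cost proportional to the number of measured values (a sparse variant is sketched further below).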
There are also other benefits to solving the PCA problem by gradient descent. Algorithms that minimize an explicit cost function are relatively easy to extend. The case of variational Bayesian learning applied to PCA was considered in Section 4, but there are many other extensions of PCA, such as those using non-Gaussianity, non-linearity, mixture models, and dynamics.
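Concretely, the explicit cost in the basic case is the squared reconstruction error over the observed entries; the notation below is ours, chosen for illustration:

\[
C(A, S) = \sum_{(i,j) \in O} \Bigl( x_{ij} - \sum_{k=1}^{c} a_{ik} s_{kj} \Bigr)^{2},
\]

where O is the set of observed index pairs, A = (a_{ik}) is the loading matrix, and S = (s_{kj}) contains the principal components. Extensions typically augment or replace this cost, for instance with prior terms in variational Bayesian learning, while the gradient-based optimization machinery carries over.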
The developed algorithms can prove useful in many applications, such as bioinformatics, speech processing, and meteorology, in which large-scale datasets with missing values are very common. The required computational burden scales linearly with the number of measured values. Note also that the proposed techniques provide an analogue of confidence regions, indicating the reliability of the estimated quantities.
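To make the linear-scaling claim concrete, the gradients can be accumulated entry by entry over the observed values. The following sketch (names and conventions are ours, not taken from the paper) runs in time linear in the number of measured values:

import numpy as np

def sparse_gradients(rows, cols, vals, A, S):
    """Ascent directions for A and S using only the observed entries.

    rows, cols, vals : arrays of length |O| listing the observations
    x[rows[t], cols[t]] = vals[t].  Runtime is O(|O| * c) for c
    components, i.e. linear in the number of measured values.
    """
    # residuals at the observed positions only
    e = vals - np.einsum('tc,ct->t', A[rows], S[:, cols])
    gA = np.zeros_like(A)
    gS = np.zeros_like(S)
    # scatter-add each entry's contribution to its row of A ...
    np.add.at(gA, rows, e[:, None] * S[:, cols].T)
    # ... and to its column of S (gS.T is a writable view of gS)
    np.add.at(gS.T, cols, e[:, None] * A[rows])
    return gA, gS

Moving A and S a small step along gA and gS (as in A += lr * gA) decreases the squared error, matching the dense sketch above.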
Acknowledgments
This work was supported in part by the Academy of Finland under its Centers
for Excellence in Research Program, and the IST Program of the European
Community, under the PASCAL Network of Excellence, IST-2002-506778. This
publication only reflects the authors' views. We would like to thank Antti
Honkela for useful comments.