
Introduction

Principal component analysis (PCA) [1,2,3,4,5,6] is a classic technique in data analysis. It reduces high-dimensional data sets to lower-dimensional ones for purposes such as data analysis, visualization, feature extraction, and data compression. PCA can be derived from a number of starting points and optimization criteria [2,3,4]. The most important of these are minimizing the mean-square error in data compression, finding mutually orthogonal directions in the data with maximal variances, and decorrelating the data using orthogonal transformations [5].
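The three criteria can be checked numerically on a small example. The following is a minimal sketch, assuming fully observed data and the textbook eigendecomposition of the sample covariance matrix; the function name pca, the synthetic data, and the choice of NumPy are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def pca(X, k):
    """Textbook PCA sketch. X has shape (n_samples, n_features).

    Returns the data mean, the k leading orthonormal directions W,
    the projected data Z, and the variances along those directions.
    """
    mean = X.mean(axis=0)
    Xc = X - mean                                # center the data
    cov = Xc.T @ Xc / (X.shape[0] - 1)           # sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]        # indices of the k largest
    W = eigvecs[:, order]
    Z = Xc @ W                                   # principal components
    return mean, W, Z, eigvals[order]

# Synthetic, correlated 3-dimensional data (illustrative only).
rng = np.random.default_rng(0)
mixing = np.array([[3.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.5, 0.2, 0.1]])
X = rng.normal(size=(500, 3)) @ mixing

mean, W, Z, variances = pca(X, k=2)
X_hat = mean + Z @ W.T                           # rank-2 reconstruction

# Mutually orthogonal directions: W^T W is the identity.
orthonormal = np.allclose(W.T @ W, np.eye(2))
# Decorrelation: the covariance of Z is diagonal, with the
# maximal variances on the diagonal.
decorrelated = np.allclose(np.cov(Z, rowvar=False), np.diag(variances))
# Mean-square error: the residual equals the variance that was discarded.
residual = np.sum((X - X_hat) ** 2) / (X.shape[0] - 1)
discarded = np.trace(np.cov(X, rowvar=False)) - variances.sum()
```

In this sketch the same matrix W satisfies all three criteria at once, which is why the different derivations lead to the same solution for complete data.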

While standard PCA is a well-established linear statistical technique based on second-order statistics (covariances), it has recently been extended in various directions and considered from novel viewpoints. For example, various adaptive algorithms for PCA have been considered and reviewed in [4,6]. Fairly recently, several authors independently showed that PCA emerges as the maximum likelihood solution of a probabilistic latent variable model; see [3] for a discussion and references.

In this paper, we study PCA in the case where most of the data values are missing (or unknown). Common algorithms for solving PCA prove to be inadequate in this case, and we thus propose a new algorithm. The problem of overfitting and possible solutions are also outlined.
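To make the missing-value setting concrete, one common way to pose it is to measure the reconstruction error over the observed entries only. The sketch below assumes that missing values are encoded as NaN; the helper observed_mse and the toy matrices are hypothetical illustrations and are not the algorithm proposed in this paper.

```python
import numpy as np

def observed_mse(X, X_hat):
    """Mean-square error computed over the observed (non-NaN) entries of X."""
    mask = ~np.isnan(X)                  # True where a value was observed
    return np.mean((X[mask] - X_hat[mask]) ** 2)

# Toy data matrix with half of the entries missing (illustrative only).
X = np.array([[1.0, np.nan, 3.0],
              [np.nan, 2.0, np.nan]])
# A hypothetical reconstruction; entries at missing positions do not
# contribute to the error.
X_hat = np.array([[1.0, 5.0, 2.0],
                  [0.0, 2.0, 9.0]])

err = observed_mse(X, X_hat)             # squared errors 0, 1, 0 over 3 observed entries
```

Because the unobserved entries are unconstrained by this error, a model can fit the few observed values closely while predicting the missing ones poorly, which is one way to picture the overfitting problem mentioned above.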


Tapani Raiko 2007-09-11