One can use the SVD approach (4) to find an approximate solution to the PCA problem. However, estimating the covariance matrix becomes very difficult when there are many missing values. If we estimate it by leaving out the terms with missing values from the average, we get for the estimate of the covariance matrix
(9) |
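As an illustration of this kind of estimate, the following minimal sketch averages each product of two variables only over the samples in which both are observed. It assumes that the data matrix is stored as variables by samples, that missing entries are marked with `np.nan`, and that the data are already zero-mean. The function name `pairwise_covariance` and the toy matrix `X_toy` are hypothetical and chosen only for this example.

```python
import numpy as np

def pairwise_covariance(X):
    """Covariance estimate that skips terms with missing values.

    X has shape (variables, samples); np.nan marks a missing entry.
    Assumption: the data are already zero-mean.
    """
    observed = ~np.isnan(X)
    X0 = np.where(observed, X, 0.0)           # missing terms add nothing to the sums
    sums = X0 @ X0.T                          # sums of products over jointly observed samples
    counts = observed.astype(float) @ observed.astype(float).T
    with np.errstate(divide="ignore", invalid="ignore"):
        return sums / counts                  # nan if two variables are never observed together

# Hypothetical toy data: 2 variables, 5 samples, 3 missing entries.
X_toy = np.array([[-1.0, 1.0, 0.0, 0.0, np.nan],
                  [-1.0, 1.0, np.nan, np.nan, 0.0]])

C_toy = pairwise_covariance(X_toy)
eigenvalues = np.linalg.eigvalsh(C_toy)       # ascending order
print(C_toy)
print(eigenvalues)
```

For this toy matrix the smallest eigenvalue is negative, so the estimate is not positive semidefinite. This is one way in which such an estimate can fail when every entry is averaged over a different subset of samples.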
Another option is to complete the data matrix by iteratively imputing the missing values (see, e.g., [2]). Initially, the missing values can be replaced by zeros. The covariance matrix of the completed data can then be estimated without the problems mentioned above. The product of the resulting PCA factor matrices, i.e., the low-rank reconstruction of the data, can then be used as a better estimate for the missing values, and this process is iterated until convergence. This approach requires the use of the complete data matrix, and it is therefore computationally very expensive if a large part of the data matrix is missing. The time complexity of computing the sample covariance matrix explicitly is linear in the number of samples and quadratic in the data dimensionality. We will refer to this approach as the imputation algorithm.
Note that after convergence, the missing values do not contribute to the reconstruction error (2): each imputed value equals its own reconstruction, so its residual is zero. This means that the imputation algorithm leads to a solution that minimizes the reconstruction error of the observed values only.
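The imputation algorithm described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the data are taken to be zero-mean and stored as variables by samples, missing entries are marked with `np.nan`, and the principal directions are obtained from an eigendecomposition of the covariance matrix of the completed data. The names `impute_pca`, `n_components`, `n_iter` and `tol` are hypothetical, and the orthonormal scaling of the factors may differ from the convention used in (4).

```python
import numpy as np

def impute_pca(X, n_components, n_iter=200, tol=1e-9):
    """Iterative imputation PCA (sketch).

    X has shape (variables, samples); np.nan marks a missing entry.
    Assumption: the data are zero-mean, so no bias term is estimated.
    Returns the factors A, S and the completed data matrix.
    """
    observed = ~np.isnan(X)
    X_filled = np.where(observed, X, 0.0)        # initial imputation with zeros
    n_samples = X.shape[1]

    for _ in range(n_iter):
        # Covariance of the completed data: every entry uses all samples.
        C = X_filled @ X_filled.T / n_samples
        _, eigvecs = np.linalg.eigh(C)           # eigenvalues in ascending order
        A = eigvecs[:, ::-1][:, :n_components]   # leading principal directions
        S = A.T @ X_filled                       # principal components
        X_hat = A @ S                            # low-rank reconstruction

        # Keep observed values, replace missing ones by the reconstruction.
        X_new = np.where(observed, X, X_hat)
        change = np.max(np.abs(X_new - X_filled))
        X_filled = X_new
        if change < tol:
            break

    return A, S, X_filled
```

In this sketch the imputed entries of the returned matrix coincide with the corresponding entries of `A @ S`, so the residual `X_filled - A @ S` is zero wherever a value was missing and only the observed values contribute to the error. Each iteration forms a dense covariance matrix from the full completed data, which is the cost that makes the approach expensive when most of the matrix is missing.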