Next: Introduction
Principal Component Analysis for Sparse High-Dimensional Data
Tapani Raiko - Alexander Ilin - Juha Karhunen
Principal Component Analysis
for Sparse High-Dimensional Data
Tapani Raiko - Alexander Ilin - Juha Karhunen
Abstract:
Principal component analysis (PCA) is a widely used technique for data
analysis and dimensionality reduction. Eigenvalue decomposition is
the standard algorithm for solving PCA, but a number of other
algorithms have been proposed. For instance, the EM algorithm is much
more efficient in case of high dimensionality and a small number of
principal components. We study a case where the data are
high-dimensional and a majority of the values are missing. In this
case, both of these algorithms turn out to be inadequate. We propose using a
gradient descent algorithm inspired by Oja's rule, and speeding it up
by an approximate Newton's method. The computational complexity of
the proposed method is linear with respect to the number of observed values in the
data and to the number of principal components.
In the experiments with
Netflix data, the proposed algorithm is about ten times
faster than any of the four comparison methods.
Tapani Raiko
2007-09-11