

Principal Component Analysis with Missing Values

Let us consider the same problem when the data matrix has missing entries. We would like to find $\mathbf{A}$ and $\mathbf{S}$ such that $\mathbf{X}\approx \mathbf{A}\mathbf{S}$ for the observed entries. The rest of the product $\mathbf{A}\mathbf{S}$ represents the reconstruction of the missing values.

The subspace learning algorithm extends in a straightforward manner to the case of missing values: we simply take the sum over only those indices $i$ and $j$ for which the data entry $x_{ij}$ (the $ij$th element of $\mathbf{X}$) is observed, in short $(i,j)\in O$. The cost function is

$\displaystyle C = \sum_{(i,j) \in O} e_{ij}^2 \,, \qquad \text{with} \qquad e_{ij} = x_{ij} - \sum_{k=1}^c a_{ik} s_{kj} \,,$ (3)
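
For concreteness, the following is a minimal NumPy sketch of the cost (3); the names X, A, S and the boolean mask observed are our own notation for this illustration, not part of the algorithm description above.

import numpy as np

def cost(X, A, S, observed):
    """Squared reconstruction error of (3), summed over observed entries only.

    X        : (d, n) data matrix; values at missing positions are ignored
    A, S     : (d, c) and (c, n) factor matrices, so that X ~ A @ S
    observed : (d, n) boolean mask, True where x_ij is observed, i.e. (i, j) in O
    """
    E = X - A @ S                    # residuals e_ij for all entries
    return np.sum(E[observed] ** 2)  # keep only the entries in O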

The partial derivatives of this cost function are

$\displaystyle \frac{\partial C}{\partial a_{il}} = -2 \sum_{j\mid (i,j)\in O} e_{ij} s_{lj} \,, \qquad \frac{\partial C}{\partial s_{lj}} = -2 \sum_{i\mid (i,j)\in O} e_{ij} a_{il} \, .$ (4)
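
The sums in (4) can be evaluated for all elements at once by zeroing the residuals at unobserved positions; the sketch below uses the same (assumed) names as above.

import numpy as np

def gradients(X, A, S, observed):
    """Partial derivatives of C with respect to A and S, as in (4).

    Residuals at unobserved positions are set to zero, so the matrix
    products effectively sum only over (i, j) in O.
    """
    E = np.where(observed, X - A @ S, 0.0)  # e_ij on O, zero elsewhere
    grad_A = -2.0 * E @ S.T                 # dC/da_il = -2 sum_j e_ij s_lj
    grad_S = -2.0 * A.T @ E                 # dC/ds_lj = -2 sum_i e_ij a_il
    return grad_A, grad_S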

We propose a speed-up to the gradient descent algorithm. In Newton's method for optimization, the gradient is multiplied by the inverse of the Hessian matrix. Newton's method is known to converge fast, but using the full Hessian is computationally costly in high-dimensional problems ($d\gg 1$). Here we use only the diagonal part of the Hessian matrix and include a control parameter $\alpha$ that allows the learning algorithm to vary from standard gradient descent ($\alpha=0$) to the diagonal Newton's method ($\alpha=1$). The final learning rules then take the form

$\displaystyle a_{il} \leftarrow a_{il} - \gamma' \left(\frac{\partial^2 C}{\partial a_{il}^2}\right)^{-\alpha} \frac{\partial C}{\partial a_{il}} = a_{il} + \gamma \, \frac{\sum_{j\mid (i,j)\in O} e_{ij} s_{lj}}{\left(\sum_{j\mid (i,j)\in O} s_{lj}^2\right)^\alpha} \, ,$ (5)

$\displaystyle s_{lj} \leftarrow s_{lj} - \gamma' \left(\frac{\partial^2 C}{\partial s_{lj}^2}\right)^{-\alpha} \frac{\partial C}{\partial s_{lj}} = s_{lj} + \gamma \, \frac{\sum_{i\mid (i,j)\in O} e_{ij} a_{il}}{\left(\sum_{i\mid (i,j)\in O} a_{il}^2\right)^\alpha} \, .$ (6)
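
As an illustration, one sweep of the rules (5)-(6) could be implemented as sketched below; the constant factor 2 from the gradient is absorbed into the learning rate, and gamma and the small constant eps (guarding against rows or columns with no observed entries) are our additions, not prescribed by the rules themselves.

import numpy as np

def update(X, A, S, observed, gamma=0.1, alpha=1.0, eps=1e-12):
    """One sweep of (5)-(6): alpha=0 gives plain gradient descent,
    alpha=1 the diagonal Newton's method."""
    M = observed.astype(float)                 # 1 where (i, j) in O, else 0
    E = np.where(observed, X - A @ S, 0.0)     # residuals on observed entries

    # Update A: numerator sum_j e_ij s_lj, denominator (sum_j s_lj^2)^alpha over observed j
    denom_A = (M @ (S ** 2).T + eps) ** alpha  # shape (d, c)
    A = A + gamma * (E @ S.T) / denom_A

    # Recompute residuals with the new A, then update S analogously
    E = np.where(observed, X - A @ S, 0.0)
    denom_S = ((A ** 2).T @ M + eps) ** alpha  # shape (c, n)
    S = S + gamma * (A.T @ E) / denom_S
    return A, S

Iterating such sweeps until the cost (3) stops decreasing is one possible stopping rule.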

For comparison, we also consider two alternative PCA methods that can be adapted for missing values.


