

Principal Component Analysis with Missing Values

Let us consider the same problem when the data matrix has missing entries. We would like to find $\mathbf{A}$ and $\mathbf{S}$ such that $\mathbf{X}\approx \mathbf{A}\mathbf{S}$ for the observed entries. The rest of the product $\mathbf{A}\mathbf{S}$ represents the reconstruction of the missing values.

The subspace learning algorithm extends in a straightforward manner to the case of missing values: we simply take the sum over only those indices $i$ and $j$ for which the data entry $x_{ij}$ (the $ij$th element of $\mathbf{X}$) is observed, in short $(i,j)\in O$. The cost function is

$\displaystyle C = \sum_{(i,j) \in O} e_{ij}^2 \,, \qquad \text{with} \qquad e_{ij} = x_{ij} - \sum_{k=1}^c a_{ik} s_{kj} \,,$ (3)
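
For concreteness, the following is a minimal NumPy sketch of the cost (3); the names X, A, S and the boolean mask observed are our own notation for this illustration, not part of the algorithm description above.

import numpy as np

def cost(X, A, S, observed):
    """Squared reconstruction error of (3), summed over observed entries only.

    X        : (d, n) data matrix; values at missing positions are ignored
    A, S     : (d, c) and (c, n) factor matrices, so that X ~ A @ S
    observed : (d, n) boolean mask, True where x_ij is observed, i.e. (i, j) in O
    """
    E = X - A @ S                    # residuals e_ij for all entries
    return np.sum(E[observed] ** 2)  # keep only the entries in O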

The partial derivatives of this cost function are

$\displaystyle \frac{\partial C}{\partial a_{il}} = -2 \sum_{j\mid (i,j)\in O} e_{ij} s_{lj} \,, \qquad \frac{\partial C}{\partial s_{lj}} = -2 \sum_{i\mid (i,j)\in O} e_{ij} a_{il} \, .$ (4)
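
The sums in (4) can be evaluated for all elements at once by zeroing the residuals at unobserved positions; the sketch below uses the same (assumed) names as above.

import numpy as np

def gradients(X, A, S, observed):
    """Partial derivatives of C with respect to A and S, as in (4).

    Residuals at unobserved positions are set to zero, so the matrix
    products effectively sum only over (i, j) in O.
    """
    E = np.where(observed, X - A @ S, 0.0)  # e_ij on O, zero elsewhere
    grad_A = -2.0 * E @ S.T                 # dC/da_il = -2 sum_j e_ij s_lj
    grad_S = -2.0 * A.T @ E                 # dC/ds_lj = -2 sum_i e_ij a_il
    return grad_A, grad_S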

We propose a speed-up to the gradient descent algorithm. In Newton's method for optimization, the gradient is multiplied by the inverse of the Hessian matrix. Newton's method is known to converge fast, but using the full Hessian is computationally costly in high-dimensional problems ($d\gg 1$). Here we use only the diagonal part of the Hessian matrix and include a control parameter $\alpha$ that allows the learning algorithm to vary from standard gradient descent ($\alpha=0$) to the diagonal Newton's method ($\alpha=1$). The final learning rules then take the form

$\displaystyle a_{il} \leftarrow a_{il} - \gamma' \left(\frac{\partial^2 C}{\partial a_{il}^2}\right)^{-\alpha} \frac{\partial C}{\partial a_{il}} = a_{il} + \gamma \, \frac{\sum_{j\mid (i,j)\in O} e_{ij} s_{lj}}{\left(\sum_{j\mid (i,j)\in O} s_{lj}^2\right)^\alpha} \, ,$ (5)

$\displaystyle s_{lj} \leftarrow s_{lj} - \gamma' \left(\frac{\partial^2 C}{\partial s_{lj}^2}\right)^{-\alpha} \frac{\partial C}{\partial s_{lj}} = s_{lj} + \gamma \, \frac{\sum_{i\mid (i,j)\in O} e_{ij} a_{il}}{\left(\sum_{i\mid (i,j)\in O} a_{il}^2\right)^\alpha} \, .$ (6)
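
As an illustration, one sweep of the rules (5)-(6) could be implemented as sketched below; the constant factor 2 from the gradient is absorbed into the learning rate, and gamma and the small constant eps (guarding against rows or columns with no observed entries) are our additions, not prescribed by the rules themselves.

import numpy as np

def update(X, A, S, observed, gamma=0.1, alpha=1.0, eps=1e-12):
    """One sweep of (5)-(6): alpha=0 gives plain gradient descent,
    alpha=1 the diagonal Newton's method."""
    M = observed.astype(float)                 # 1 where (i, j) in O, else 0
    E = np.where(observed, X - A @ S, 0.0)     # residuals on observed entries

    # Update A: numerator sum_j e_ij s_lj, denominator (sum_j s_lj^2)^alpha over observed j
    denom_A = (M @ (S ** 2).T + eps) ** alpha  # shape (d, c)
    A = A + gamma * (E @ S.T) / denom_A

    # Recompute residuals with the new A, then update S analogously
    E = np.where(observed, X - A @ S, 0.0)
    denom_S = ((A ** 2).T @ M + eps) ** alpha  # shape (c, n)
    S = S + gamma * (A.T @ E) / denom_S
    return A, S

Iterating such sweeps until the cost (3) stops decreasing is one possible stopping rule.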

For comparison, we also consider two alternative PCA methods that can be adapted for missing values.


