

Adapting the Subspace Learning Algorithm

The subspace learning algorithm also works in a straightforward manner in the presence of missing values. We simply restrict the sums to those indices $i$ and $j$ for which the data entry $x_{ij}$ (the $ij$th element of $\mathbf{X}$) is observed, in short $(i,j)\in O$. The cost function is

$\displaystyle C = \sum_{(i,j) \in O} e_{ij}^2 \,, \qquad \textrm{with} \qquad e_{ij} = x_{ij} - \sum_{k=1}^c a_{ik} s_{kj} \,,$   (10)

and its partial derivatives are

$\displaystyle \frac{\partial C}{\partial a_{il}} = -2 \sum_{j\mid (i,j)\in O} e_{ij} s_{lj} \,, \qquad \frac{\partial C}{\partial s_{lj}} = -2 \sum_{i\mid (i,j)\in O} e_{ij} a_{il} \, .$   (11)
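To make the masked sums concrete, here is a minimal NumPy sketch (illustrative only, not the authors' code; the array names X, A, S, M and the helper cost_and_gradients are my own) that evaluates the cost (10) and the gradients (11) with a boolean mask M marking the observed entries $(i,j)\in O$:

import numpy as np

def cost_and_gradients(X, M, A, S):
    # X : (n, N) data matrix; unobserved entries may hold any placeholder value
    # M : (n, N) boolean mask, True where x_ij is observed, i.e. (i,j) in O
    # A : (n, c) loading matrix,  S : (c, N) component matrix
    E = np.where(M, X - A @ S, 0.0)   # e_ij on observed entries, 0 elsewhere
    C = np.sum(E**2)                  # Eq. (10): squared error summed over O
    dC_dA = -2.0 * E @ S.T            # Eq. (11): -2 sum_{j|(i,j) in O} e_ij s_lj
    dC_dS = -2.0 * A.T @ E            # Eq. (11): -2 sum_{i|(i,j) in O} e_ij a_il
    return C, dC_dA, dC_dS

Zeroing the residuals outside $O$ lets ordinary matrix products reproduce the restricted sums.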

The update rules for gradient descent are

$\displaystyle \mathbf{A} \leftarrow \mathbf{A} - \gamma \frac{\partial C}{\partial \mathbf{A}} \,, \qquad \mathbf{S} \leftarrow \mathbf{S} - \gamma \frac{\partial C}{\partial \mathbf{S}} \,,$   (12)

and the update rules for natural gradient descent are

$\displaystyle \mathbf{A} \leftarrow \mathbf{A} - \gamma \frac{\partial C}{\partial \mathbf{A}} \mathbf{A}^T \mathbf{A} \,, \qquad \mathbf{S} \leftarrow \mathbf{S} - \gamma \, \mathbf{S} \mathbf{S}^T \frac{\partial C}{\partial \mathbf{S}} \, .$   (13)
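As a sketch of how these matrix updates could be iterated (again an assumption of this illustration, with a hypothetical step size gamma; the right-multiplication by $\mathbf{A}^T\mathbf{A}$ in the rule for $\mathbf{A}$ follows the symmetric reading of (13)), reusing cost_and_gradients from above:

def gradient_step(X, M, A, S, gamma=1e-3):
    # Plain gradient descent, Eq. (12)
    _, dC_dA, dC_dS = cost_and_gradients(X, M, A, S)
    return A - gamma * dC_dA, S - gamma * dC_dS

def natural_gradient_step(X, M, A, S, gamma=1e-3):
    # Natural gradient descent, Eq. (13)
    _, dC_dA, dC_dS = cost_and_gradients(X, M, A, S)
    A_new = A - gamma * dC_dA @ (A.T @ A)
    S_new = S - gamma * (S @ S.T) @ dC_dS
    return A_new, S_new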

We propose a novel speed-up to the original simple gradient descent algorithm. In Newton's method for optimization, the gradient is multiplied by the inverse of the Hessian matrix. Newton's method is known to converge quickly, especially in the vicinity of the optimum, but using the full Hessian is computationally too demanding in truly high-dimensional problems. Here we use only the diagonal part of the Hessian matrix. We also include a control parameter $\alpha$ that allows the learning algorithm to interpolate between standard gradient descent ($\alpha=0$) and the diagonal Newton's method ($\alpha=1$), much like the Levenberg-Marquardt algorithm. The learning rules then take the form

$\displaystyle a_{il} \leftarrow a_{il} - \gamma' \left(\frac{\partial^2 C}{\partial a_{il}^2}\right)^{-\alpha} \frac{\partial C}{\partial a_{il}} = a_{il} + \gamma \frac{\sum_{j\mid (i,j)\in O} e_{ij} s_{lj}}{\left(\sum_{j\mid (i,j)\in O} s_{lj}^2\right)^\alpha} \, ,$   (14)

$\displaystyle s_{lj} \leftarrow s_{lj} - \gamma' \left(\frac{\partial^2 C}{\partial s_{lj}^2}\right)^{-\alpha} \frac{\partial C}{\partial s_{lj}} = s_{lj} + \gamma \frac{\sum_{i\mid (i,j)\in O} e_{ij} a_{il}}{\left(\sum_{i\mid (i,j)\in O} a_{il}^2\right)^\alpha} \, .$   (15)

The second forms follow because the diagonal Hessian entries are $\partial^2 C/\partial a_{il}^2 = 2\sum_{j\mid (i,j)\in O} s_{lj}^2$ and $\partial^2 C/\partial s_{lj}^2 = 2\sum_{i\mid (i,j)\in O} a_{il}^2$, with the constant factors absorbed into the learning rate $\gamma = 2^{1-\alpha}\gamma'$.
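The element-wise rules (14)-(15) vectorize over whole matrices. Continuing the NumPy sketch above, here is a hedged implementation of one sweep; the function name, the step size gamma, and the small constant eps guarding rows or columns with no observed entries are my own additions:

def speedup_sweep(X, M, A, S, gamma=0.01, alpha=0.5, eps=1e-12):
    Mf = M.astype(float)
    # Eq. (14): a_il <- a_il + gamma * sum_j e_ij s_lj / (sum_j s_lj^2)^alpha
    E = np.where(M, X - A @ S, 0.0)        # residuals on observed entries
    num_A = E @ S.T                        # sum over j with (i,j) in O
    den_A = Mf @ (S**2).T                  # sum_{j|(i,j) in O} s_lj^2
    A = A + gamma * num_A / (den_A + eps)**alpha
    # Eq. (15): s_lj <- s_lj + gamma * sum_i e_ij a_il / (sum_i a_il^2)^alpha
    E = np.where(M, X - A @ S, 0.0)        # refresh residuals with the new A
    num_S = A.T @ E                        # sum over i with (i,j) in O
    den_S = (A**2).T @ Mf                  # sum_{i|(i,j) in O} a_il^2
    S = S + gamma * num_S / (den_S + eps)**alpha
    return A, S

With alpha=0 each element takes a plain gradient step, and with alpha=1 it is scaled by its inverse diagonal Hessian entry, matching the interpolation described above.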

The computational complexity is $O(Nc+nc)$ per iteration.

