

Adapting the Subspace Learning Algorithm

The subspace learning algorithm also works in a straightforward manner in the presence of missing values. We simply restrict the sums to those indices $i$ and $j$ for which the data entry $x_{ij}$ (the $ij$th element of $\mathbf{X}$) is observed, in short $(i,j)\in O$. The cost function is

$\displaystyle C = \sum_{(i,j) \in O} e_{ij}^2 \,, \qquad \textrm{with} \qquad e_{ij} = x_{ij} - \sum_{k=1}^c a_{ik} s_{kj} \,,$   (10)

and its partial derivatives are

$\displaystyle \frac{\partial C}{\partial a_{il}} = -2 \sum_{j\mid (i,j)\in O} e_{ij} s_{lj} \,, \qquad \frac{\partial C}{\partial s_{lj}} = -2 \sum_{i\mid (i,j)\in O} e_{ij} a_{il} \, .$   (11)
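To make the masked sums concrete, here is a minimal NumPy sketch (illustrative only, not the authors' code; the array names X, A, S, M and the helper cost_and_gradients are my own) that evaluates the cost (10) and the gradients (11) with a boolean mask M marking the observed entries $(i,j)\in O$:

import numpy as np

def cost_and_gradients(X, M, A, S):
    # X : (n, N) data matrix; unobserved entries may hold any placeholder value
    # M : (n, N) boolean mask, True where x_ij is observed, i.e. (i,j) in O
    # A : (n, c) loading matrix,  S : (c, N) component matrix
    E = np.where(M, X - A @ S, 0.0)   # e_ij on observed entries, 0 elsewhere
    C = np.sum(E**2)                  # Eq. (10): squared error summed over O
    dC_dA = -2.0 * E @ S.T            # Eq. (11): -2 sum_{j|(i,j) in O} e_ij s_lj
    dC_dS = -2.0 * A.T @ E            # Eq. (11): -2 sum_{i|(i,j) in O} e_ij a_il
    return C, dC_dA, dC_dS

Zeroing the residuals outside $O$ lets ordinary matrix products reproduce the restricted sums.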

The update rules for gradient descent are

$\displaystyle \mathbf{A} \leftarrow \mathbf{A} - \gamma \frac{\partial C}{\partial \mathbf{A}} \,, \qquad \mathbf{S} \leftarrow \mathbf{S} - \gamma \frac{\partial C}{\partial \mathbf{S}} \,,$   (12)

and the update rules for natural gradient descent are

$\displaystyle \mathbf{A} \leftarrow \mathbf{A} - \gamma \frac{\partial C}{\partial \mathbf{A}} \mathbf{A}^T \mathbf{A} \,, \qquad \mathbf{S} \leftarrow \mathbf{S} - \gamma \, \mathbf{S} \mathbf{S}^T \frac{\partial C}{\partial \mathbf{S}} \, .$   (13)
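As a sketch of how these matrix updates could be iterated (again an assumption of this illustration, with a hypothetical step size gamma; the right-multiplication by $\mathbf{A}^T\mathbf{A}$ in the rule for $\mathbf{A}$ follows the symmetric reading of (13)), reusing cost_and_gradients from above:

def gradient_step(X, M, A, S, gamma=1e-3):
    # Plain gradient descent, Eq. (12)
    _, dC_dA, dC_dS = cost_and_gradients(X, M, A, S)
    return A - gamma * dC_dA, S - gamma * dC_dS

def natural_gradient_step(X, M, A, S, gamma=1e-3):
    # Natural gradient descent, Eq. (13)
    _, dC_dA, dC_dS = cost_and_gradients(X, M, A, S)
    A_new = A - gamma * dC_dA @ (A.T @ A)
    S_new = S - gamma * (S @ S.T) @ dC_dS
    return A_new, S_new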

We propose a novel speed-up to the original simple gradient descent algorithm. In Newton's method for optimization, the gradient is multiplied by the inverse of the Hessian matrix. Newton's method is known to converge quickly, especially in the vicinity of the optimum, but using the full Hessian is computationally too demanding in truly high-dimensional problems. Here we use only the diagonal part of the Hessian matrix. We also include a control parameter $\alpha$ that allows the learning algorithm to interpolate between standard gradient descent ($\alpha=0$) and the diagonal Newton's method ($\alpha=1$), much like the Levenberg-Marquardt algorithm. The learning rules then take the form

$\displaystyle a_{il} \leftarrow a_{il} - \gamma' \left(\frac{\partial^2 C}{\partial a_{il}^2}\right)^{-\alpha} \frac{\partial C}{\partial a_{il}} = a_{il} + \gamma \frac{\sum_{j\mid (i,j)\in O} e_{ij} s_{lj}}{\left(\sum_{j\mid (i,j)\in O} s_{lj}^2\right)^\alpha} \, ,$   (14)

$\displaystyle s_{lj} \leftarrow s_{lj} - \gamma' \left(\frac{\partial^2 C}{\partial s_{lj}^2}\right)^{-\alpha} \frac{\partial C}{\partial s_{lj}} = s_{lj} + \gamma \frac{\sum_{i\mid (i,j)\in O} e_{ij} a_{il}}{\left(\sum_{i\mid (i,j)\in O} a_{il}^2\right)^\alpha} \, .$   (15)

The second forms follow because the diagonal Hessian entries are $\partial^2 C/\partial a_{il}^2 = 2\sum_{j\mid (i,j)\in O} s_{lj}^2$ and $\partial^2 C/\partial s_{lj}^2 = 2\sum_{i\mid (i,j)\in O} a_{il}^2$, with the constant factors absorbed into the learning rate $\gamma = 2^{1-\alpha}\gamma'$.
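The element-wise rules (14)-(15) vectorize over whole matrices. Continuing the NumPy sketch above, here is a hedged implementation of one sweep; the function name, the step size gamma, and the small constant eps guarding rows or columns with no observed entries are my own additions:

def speedup_sweep(X, M, A, S, gamma=0.01, alpha=0.5, eps=1e-12):
    Mf = M.astype(float)
    # Eq. (14): a_il <- a_il + gamma * sum_j e_ij s_lj / (sum_j s_lj^2)^alpha
    E = np.where(M, X - A @ S, 0.0)        # residuals on observed entries
    num_A = E @ S.T                        # sum over j with (i,j) in O
    den_A = Mf @ (S**2).T                  # sum_{j|(i,j) in O} s_lj^2
    A = A + gamma * num_A / (den_A + eps)**alpha
    # Eq. (15): s_lj <- s_lj + gamma * sum_i e_ij a_il / (sum_i a_il^2)^alpha
    E = np.where(M, X - A @ S, 0.0)        # refresh residuals with the new A
    num_S = A.T @ E                        # sum over i with (i,j) in O
    den_S = (A**2).T @ Mf                  # sum_{i|(i,j) in O} a_il^2
    S = S + gamma * num_S / (den_S + eps)**alpha
    return A, S

With alpha=0 each element takes a plain gradient step, and with alpha=1 it is scaled by its inverse diagonal Hessian entry, matching the interpolation described above.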

The computational complexity is $O(Nc+nc)$ per iteration.

