
Principal subspace and components

Assume that we have $ n$ $ d$-dimensional data vectors $ {\bf x}_1,{\bf x}_2,\ldots,{\bf x}_n$, which form the $ d\times n$ data matrix $ \mathbf{X}= [{\bf x}_1,{\bf x}_2,\ldots,{\bf x}_n]$. The matrix $ \mathbf{X}$ is decomposed into

$\displaystyle \mathbf{X}\approx \mathbf{A}\mathbf{S},$ (1)

where $ \mathbf{A}$ is a $ d\times c$ matrix, $ \mathbf{S}$ is a $ c\times n$ matrix and $ c\leq d \leq n$. Principal subspace methods [6,4] find $ \mathbf{A}$ and $ \mathbf{S}$ such that the reconstruction error

$\displaystyle C = \lVert \mathbf{X}- \mathbf{A}\mathbf{S}\rVert _F^2 = \sum_{i=1}^d \sum_{j=1}^n \left( x_{ij} - \sum_{k=1}^c a_{ik} s_{kj} \right)^2\, ,$ (2)

is minimized. Here $ F$ denotes the Frobenius norm, and $ x_{ij}$, $ a_{ik}$, and $ s_{kj}$ denote the elements of the matrices $ \mathbf{X}$, $ \mathbf{A}$, and $ \mathbf{S}$, respectively. Typically the row-wise mean is removed from $ \mathbf{X}$ as a preprocessing step.
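As an illustration, here is a minimal numerical sketch, assuming NumPy and a synthetic data matrix (the variable names mirror the notation above). It uses the truncated singular value decomposition, which by the Eckart-Young theorem yields the $ \mathbf{A}$ and $ \mathbf{S}$ minimizing the reconstruction error (2):

```python
import numpy as np

# Synthetic example: n = 100 data vectors of dimension d = 5.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 100))

# Preprocessing: remove the row-wise mean, as described above.
X = X - X.mean(axis=1, keepdims=True)

c = 2  # number of principal components, c <= d <= n

# Truncated SVD: keeping the c largest singular values gives the
# rank-c decomposition X ~= A S with minimal Frobenius error.
U, sigma, Vt = np.linalg.svd(X, full_matrices=False)
A = U[:, :c]                     # d x c, orthonormal columns
S = np.diag(sigma[:c]) @ Vt[:c]  # c x n

C = np.linalg.norm(X - A @ S, "fro") ** 2  # reconstruction error (2)
print(f"reconstruction error C = {C:.4f}")
```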

Without any further constraints, there exist infinitely many ways to perform such a decomposition. However, the subspace spanned by the column vectors of the matrix $ \mathbf{A}$, called the principal subspace, is unique. In PCA, these vectors are mutually orthogonal and have unit length. Further, for each $ k=1,\dots ,c$, the first $ k$ vectors span the $ k$-dimensional principal subspace. This makes the solution practically unique; see [4,2,5] for details.
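Continuing the sketch above (same assumed NumPy setup), these properties can be checked directly:

```python
# Columns of A are mutually orthogonal with unit length: A^T A = I.
assert np.allclose(A.T @ A, np.eye(c))

# Nesting: for each k <= c, the first k columns span the k-dimensional
# principal subspace, so a rank-1 fit reuses the first column of the
# rank-2 fit up to sign (assuming distinct singular values).
A1 = np.linalg.svd(X, full_matrices=False)[0][:, :1]
assert np.allclose(np.abs(A1.T @ A[:, :1]), 1.0)
```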

There are many ways to determine the principal subspace and components [6,4,2]. We will discuss three common methods that can be adapted for the case of missing values.

