Factor analysis (FA) (Hyvärinen et al., 2001; Harman, 1967; Kendall, 1975) can be seen as a Bayesian network consisting of two layers, depicted in Figure 3.2. The top layer contains the latent variables $\mathbf{s}(t)$, called factors, and the bottom layer contains the observations $\mathbf{x}(t)$. The two layers are fully connected, that is, each observation has all of the latent variables as its parents. The index $t$ denotes the data case.
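The mapping from factors to data is linear:

$$\mathbf{x}(t) = \mathbf{A}\mathbf{s}(t) + \mathbf{n}(t), \tag{3.14}$$

where $\mathbf{A}$ is the weight matrix and $\mathbf{n}(t)$ is observation noise.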
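The generative model $\mathbf{x}(t) = \mathbf{A}\mathbf{s}(t) + \mathbf{n}(t)$ can be simulated directly. The sketch below is a minimal illustration, assuming the usual FA choices of standard Gaussian factors $\mathbf{s}(t) \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$ and Gaussian noise with diagonal covariance $\operatorname{diag}(\boldsymbol{\psi})$; the function names and the example values of `A` and `psi` are hypothetical. It draws samples and compares their empirical covariance with the model covariance $\mathbf{A}\mathbf{A}^\top + \operatorname{diag}(\boldsymbol{\psi})$. Plain Python is used to avoid external dependencies.

```python
import random


def sample_fa(A, psi, n_samples, seed=0):
    """Draw x(t) = A s(t) + n(t) with s(t) ~ N(0, I) and n(t) ~ N(0, diag(psi)).

    Assumption: Gaussian factors and diagonal Gaussian noise (standard FA choices).
    """
    rng = random.Random(seed)
    d, k = len(A), len(A[0])
    data = []
    for _ in range(n_samples):
        s = [rng.gauss(0.0, 1.0) for _ in range(k)]  # latent factors s(t)
        x = [sum(A[i][j] * s[j] for j in range(k))    # linear mapping A s(t)
             + rng.gauss(0.0, psi[i] ** 0.5)          # noise n_i(t), variance psi[i]
             for i in range(d)]
        data.append(x)
    return data


def empirical_cov(data):
    """Sample covariance (divided by n) of a list of equal-length vectors."""
    n, d = len(data), len(data[0])
    mean = [sum(x[i] for x in data) / n for i in range(d)]
    return [[sum((x[i] - mean[i]) * (x[j] - mean[j]) for x in data) / n
             for j in range(d)] for i in range(d)]


def model_cov(A, psi):
    """Covariance implied by the model: A A^T + diag(psi)."""
    d, k = len(A), len(A[0])
    return [[sum(A[i][m] * A[j][m] for m in range(k)) + (psi[i] if i == j else 0.0)
             for j in range(d)] for i in range(d)]


# Hypothetical example: 3 observed variables, 2 factors.
A = [[1.0, 0.0], [0.5, 1.0], [0.0, 2.0]]
psi = [0.1, 0.2, 0.3]
X = sample_fa(A, psi, 20000)
C_emp, C_model = empirical_cov(X), model_cov(A, psi)
print(max(abs(C_emp[i][j] - C_model[i][j]) for i in range(3) for j in range(3)))
```

With enough samples the printed discrepancy becomes small, since only the covariance (not the individual factors) is determined by the data.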
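Equation (3.14) does not fix the matrix $\mathbf{A}$, since there is a group of rotations that yields identical observation distributions: for any orthogonal matrix $\mathbf{R}$, replacing $\mathbf{A}$ with $\mathbf{A}\mathbf{R}$ and $\mathbf{s}(t)$ with $\mathbf{R}^\top\mathbf{s}(t)$ leaves (3.14) unchanged, and if the factors are standard Gaussian, the rotated factors have the same distribution. Several criteria have been suggested for determining the rotation. One is parsimony, which roughly means that most of the values in $\mathbf{A}$ are close to zero. Another leads to independent component analysis (ICA), described in Section 3.1.4. Sections 4.2 and 4.4.2 describe extensions of factor analysis that relax the assumption of a linear dependency between factors and observations.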
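The rotation ambiguity can be checked numerically. This minimal sketch assumes standard Gaussian factors and noise covariance $\operatorname{diag}(\boldsymbol{\psi})$, so that the observations have covariance $\mathbf{A}\mathbf{A}^\top + \operatorname{diag}(\boldsymbol{\psi})$; the helper names and example values are hypothetical. Since $(\mathbf{A}\mathbf{R})(\mathbf{A}\mathbf{R})^\top = \mathbf{A}\mathbf{R}\mathbf{R}^\top\mathbf{A}^\top = \mathbf{A}\mathbf{A}^\top$, the rotated weight matrix gives the same covariance.

```python
import math


def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][m] * Q[m][j] for m in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def transpose(P):
    return [list(row) for row in zip(*P)]


def fa_covariance(A, psi):
    """Observation covariance A A^T + diag(psi), assuming s ~ N(0, I)."""
    C = matmul(A, transpose(A))
    for i, p in enumerate(psi):
        C[i][i] += p
    return C


def rotation(theta):
    """2 x 2 orthogonal (rotation) matrix R with R R^T = I."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


# Hypothetical weight matrix (3 observations, 2 factors) and noise variances.
A = [[1.0, 0.0], [0.5, 1.0], [0.0, 2.0]]
psi = [0.1, 0.2, 0.3]
A_rot = matmul(A, rotation(0.7))  # a different weight matrix A R

C1, C2 = fa_covariance(A, psi), fa_covariance(A_rot, psi)
diff = max(abs(a - b) for r1, r2 in zip(C1, C2) for a, b in zip(r1, r2))
print(diff < 1e-12)  # True: same covariance, up to floating-point error
```

Because a Gaussian distribution is fully determined by its mean and covariance, equal covariances mean identical observation distributions. This argument relies on the factors being Gaussian; with non-Gaussian factors the rotation can be identified, which is what ICA exploits.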
Principal component analysis (PCA) (Jolliffe, 1986; Hyvärinen et al., 2001; Kendall, 1975), also known as the Hotelling transform or the Karhunen-Loève transform and closely related to the singular value decomposition, is a widely used method for finding the most important directions in the data in the mean-square sense. It can be obtained as the solution of the FA problem in the limit of low noise (see Bishop, 2006), with orthogonal principal components (the columns of the weight matrix $\mathbf{A}$).
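The first principal component $\mathbf{a}_1$ corresponds to the line on which the projection of the (zero-mean) data has the greatest variance:

$$\mathbf{a}_1 = \arg\max_{\|\mathbf{w}\|=1} \sum_t \left(\mathbf{w}^\top \mathbf{x}(t)\right)^2. \tag{3.15}$$

The remaining components are found recursively, each maximizing the variance that is left after the projections onto the previous components have been removed:

$$\mathbf{a}_k = \arg\max_{\|\mathbf{w}\|=1} \sum_t \left[\mathbf{w}^\top \left(\mathbf{x}(t) - \sum_{i=1}^{k-1} \mathbf{a}_i \mathbf{a}_i^\top \mathbf{x}(t)\right)\right]^2. \tag{3.16}$$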
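The variance-maximization definition of the first principal component can be made concrete by brute force in two dimensions. The sketch below, with hypothetical toy data and function names, scans unit vectors $\mathbf{w} = (\cos\theta, \sin\theta)$, keeps the one with the largest sum of squared projections, and compares it with the closed-form angle of the leading eigenvector of the 2 × 2 scatter matrix $\begin{pmatrix} a & b \\ b & c \end{pmatrix}$, namely $\theta = \tfrac{1}{2}\operatorname{atan2}(2b, a - c)$.

```python
import math


def center(data):
    """Subtract the mean so that projected power equals projected variance."""
    n = len(data)
    mean = [sum(x[i] for x in data) / n for i in range(2)]
    return [[x[0] - mean[0], x[1] - mean[1]] for x in data]


def projected_power(data, theta):
    """Sum over t of (w^T x(t))^2 for the unit vector w = (cos theta, sin theta)."""
    w = (math.cos(theta), math.sin(theta))
    return sum((w[0] * x[0] + w[1] * x[1]) ** 2 for x in data)


def first_pc_by_search(data, steps=10000):
    """Grid search over angles in [0, pi), which covers every line through the origin once."""
    thetas = [math.pi * k / steps for k in range(steps)]
    return max(thetas, key=lambda th: projected_power(data, th))


def first_pc_closed_form(data):
    """Angle of the leading eigenvector of the scatter matrix [[a, b], [b, c]]."""
    a = sum(x[0] * x[0] for x in data)
    b = sum(x[0] * x[1] for x in data)
    c = sum(x[1] * x[1] for x in data)
    return (0.5 * math.atan2(2 * b, a - c)) % math.pi


# Hypothetical toy data scattered around the direction (2, 1).
raw = [[2 * u + 0.1 * (-1) ** u, u] for u in range(-5, 6)]
data = center(raw)
theta_search = first_pc_by_search(data)
theta_closed = first_pc_closed_form(data)
print(theta_search, theta_closed)
```

The grid search only scales to very low dimensions; it is included to show that the maximizer of (3.15) coincides with the leading eigenvector of the data covariance.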
There are many other ways to formulate PCA, including probabilistic PCA (Bishop, 1999). In practice, the principal components are found by calculating the eigenvectors of the covariance matrix of the data, ordered by decreasing eigenvalue.
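One simple way to compute these eigenvectors is power iteration with deflation, shown here as a sketch rather than as what numerical libraries do (they typically call dedicated symmetric eigensolvers, exposed for example as `numpy.linalg.eigh`). After each component $\mathbf{v}$ with eigenvalue $\lambda$ is found, subtracting $\lambda\mathbf{v}\mathbf{v}^\top$ removes the variance along it, so the next power iteration converges to the next eigenvector. The function names, the iteration count and the example matrix are hypothetical, and the sketch assumes distinct eigenvalues.

```python
import math
import random


def covariance(data):
    """Sample covariance (divided by n) of a list of equal-length vectors."""
    n, d = len(data), len(data[0])
    mean = [sum(x[i] for x in data) / n for i in range(d)]
    return [[sum((x[i] - mean[i]) * (x[j] - mean[j]) for x in data) / n
             for j in range(d)] for i in range(d)]


def matvec(M, v):
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]


def normalize(v):
    norm = math.sqrt(sum(vi * vi for vi in v))
    return [vi / norm for vi in v]


def principal_components(C, k, iters=1000, seed=0):
    """Top-k eigenpairs (eigenvalue, unit eigenvector) of a symmetric PSD matrix C."""
    rng = random.Random(seed)
    C = [row[:] for row in C]  # work on a copy; the caller's matrix is untouched
    d = len(C)
    components = []
    for _ in range(k):
        v = normalize([rng.gauss(0.0, 1.0) for _ in range(d)])  # random start vector
        for _ in range(iters):
            v = normalize(matvec(C, v))  # power iteration step
        lam = sum(vi * wi for vi, wi in zip(v, matvec(C, v)))  # Rayleigh quotient
        components.append((lam, v))
        # Deflation: remove the variance along v before seeking the next component.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return components


# Hypothetical covariance matrix with distinct eigenvalues.
C = [[4.0, 2.0, 0.0], [2.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
for lam, v in principal_components(C, k=3):
    print(round(lam, 4), [round(x, 4) for x in v])
```

For real data, `C` would be `covariance(data)` computed from the centered observations; the eigenvectors are determined only up to sign.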