Factor analysis (FA) (Hyvärinen et al., 2001; Harman, 1967; Kendall, 1975) can be seen as a Bayesian
network consisting of two layers, depicted in Figure 3.2. The top layer contains the latent
variables $\mathbf{s}(t)$ and the bottom layer contains the observations $\mathbf{x}(t)$. The two layers are fully connected, that is, each
observation has all of the latent variables as its parents. The index $t$ stands for the data case.
[Figure 3.2: Factor analysis depicted as a two-layer Bayesian network, with the latent variables on top and the observations below.]
The mapping from factors to data is linear:

$$\mathbf{x}(t) = \mathbf{A}\mathbf{s}(t) + \mathbf{n}(t), \tag{3.14}$$

where $\mathbf{A}$ is the weight matrix and $\mathbf{n}(t)$ is the noise.
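As a concrete illustration of the model in Equation (3.14), the following minimal sketch generates data from a linear factor analysis model and fits it with scikit-learn's `FactorAnalysis`. The sketch is not from the original text; the dimensionalities and noise level are arbitrary choices for the example.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n_factors, n_obs, n_cases = 3, 10, 5000   # arbitrary example sizes

# Generate data from the linear FA model x(t) = A s(t) + n(t)
A = rng.normal(size=(n_obs, n_factors))          # weight matrix A
S = rng.normal(size=(n_cases, n_factors))        # latent factors s(t), one row per data case
noise = 0.1 * rng.normal(size=(n_cases, n_obs))  # observation noise n(t)
X = S @ A.T + noise

# Fit a factor analysis model with the same number of factors
fa = FactorAnalysis(n_components=n_factors).fit(X)

# The estimated loading matrix matches A only up to a rotation of the factor space
print(fa.components_.shape)   # (n_factors, n_obs)
```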
Equation (3.14) does not fix the matrix $\mathbf{A}$,
since there is a group of rotations that yields identical observation
distributions. Several criteria have been suggested for determining
the rotation. One is parsimony, which roughly means that
most of the values in $\mathbf{A}$ are close to zero. Another one leads
to independent component analysis described in Section 3.1.4.
Sections 4.2 and 4.4.2 describe extensions of
factor analysis that relax the assumption of a linear
dependency between factors and observations.
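The rotation indeterminacy mentioned above can be checked numerically: with unit-variance Gaussian factors, replacing $\mathbf{A}$ by $\mathbf{A}\mathbf{R}$ for any orthogonal $\mathbf{R}$ leaves the observation covariance, and hence the Gaussian observation distribution, unchanged. The snippet below is only an illustrative sketch with arbitrary matrix sizes.

```python
import numpy as np
from scipy.stats import ortho_group

rng = np.random.default_rng(1)

A = rng.normal(size=(10, 3))             # weight matrix A
R = ortho_group.rvs(3, random_state=1)   # a random orthogonal rotation
noise_cov = 0.1 * np.eye(10)             # diagonal noise covariance

# With unit-variance Gaussian factors, x = A s + n has covariance A A^T + noise_cov,
# which is unchanged when A is replaced by A R
cov_original = A @ A.T + noise_cov
cov_rotated = (A @ R) @ (A @ R).T + noise_cov
print(np.allclose(cov_original, cov_rotated))   # True
```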
Principal component analysis (PCA) (Jolliffe, 1986; Hyvärinen et al., 2001; Kendall, 1975), equivalent to the Hotelling
transform, the Karhunen-Loève transform, and the singular value decomposition, is a widely used method
for finding the most important directions in the data in the
mean-square sense. It is the solution of the FA problem under low
noise (see Bishop, 2006) with orthogonal principal components (the
columns of the weight matrix $\mathbf{A}$).
The first principal component $\mathbf{a}_1$ corresponds to the line on
which the projection of the (zero-mean) data has the greatest variance, and each
subsequent component maximises the variance of the projection of the residual left
after the preceding components have been removed:

$$\mathbf{a}_1 = \arg\max_{\|\mathbf{a}\|=1} \mathrm{E}\left\{ \left[ \mathbf{a}^{T} \mathbf{x}(t) \right]^{2} \right\} \tag{3.15}$$

$$\mathbf{a}_k = \arg\max_{\|\mathbf{a}\|=1} \mathrm{E}\left\{ \left[ \mathbf{a}^{T} \left( \mathbf{x}(t) - \sum_{i=1}^{k-1} \mathbf{a}_i \mathbf{a}_i^{T} \mathbf{x}(t) \right) \right]^{2} \right\} \tag{3.16}$$
There are many other ways to formulate PCA, including probabilistic PCA (Bishop, 1999).
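Equations (3.15) and (3.16) can also be checked numerically. The sketch below (illustrative only, with made-up toy data) computes the principal components as eigenvectors of the sample covariance matrix, as described next, and verifies that no random unit-norm direction exceeds the projection variance of the first component.

```python
import numpy as np

rng = np.random.default_rng(2)

# Zero-mean toy data with correlated dimensions
X = rng.normal(size=(5000, 2)) @ rng.normal(size=(5, 2)).T
X -= X.mean(axis=0)

# Principal components as eigenvectors of the sample covariance matrix,
# sorted by decreasing eigenvalue (projection variance)
C = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
a1 = eigvecs[:, order[0]]   # first principal component

# Equation (3.15): no random unit vector gives a larger projection variance than a1
directions = rng.normal(size=(1000, 5))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
print(np.var(X @ a1) >= np.var(X @ directions.T, axis=0).max())   # True
```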
In practice, the principal components are found by calculating the
eigenvectors of the covariance matrix
of the data