Problems involving complex models cannot be solved analytically. Instead, one can formulate a learning criterion, typically in the form of a cost function. Learning can then be done by adjusting the parameters iteratively such that the cost function is minimised. Once the criterion is determined, one can apply methods of optimisation theory, such as gradient descent.
The first candidate for a cost function could be the mean square reconstruction error

\begin{equation}
E = \frac{1}{T} \sum_{t=1}^{T} \left\| \mathbf{x}(t) - \hat{\mathbf{x}}(t) \right\|^2 .
\tag{2.18}
\end{equation}
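To make the idea concrete, the following sketch minimises a mean square reconstruction error of the form (2.18) by plain gradient descent for a linear model \(\mathbf{x}(t) \approx \mathbf{A}\mathbf{s}(t)\). The toy data, variable names and learning rate are illustrative assumptions, not taken from the text.

```python
import numpy as np

# Toy data: 5-dimensional observations generated from 2 underlying factors.
rng = np.random.default_rng(0)
T, d, k = 200, 5, 2
A_true = rng.normal(size=(d, k))
S_true = rng.normal(size=(k, T))
X = A_true @ S_true + 0.05 * rng.normal(size=(d, T))

# Parameters to learn: mixing matrix A and factor values S.
A = rng.normal(size=(d, k))
S = rng.normal(size=(k, T))

lr = 0.01
for step in range(2000):
    X_hat = A @ S                             # reconstruction of the data
    err = X_hat - X                           # reconstruction residual
    cost = np.mean(np.sum(err**2, axis=0))    # mean square reconstruction error
    # Gradients of the cost with respect to A and S.
    grad_A = 2 * err @ S.T / T
    grad_S = 2 * A.T @ err / T
    A -= lr * grad_A                          # gradient descent update
    S -= lr * grad_S
```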
In basic ICA, the dimensionality of the model is the same as that of the data, so the reconstructions are perfect and the learning criterion cannot be based on the reconstruction error. Instead, one can maximise the nongaussianity of the components. Similarly, when the curvature of a nonlinear model is allowed to be high enough, the data can be reconstructed perfectly, but this does not guarantee that the model is meaningful or that it generalises to new samples. This problem is called overfitting, and a better criterion is needed for learning something meaningful.
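One common nongaussianity measure is the excess kurtosis, which vanishes for Gaussian signals. The sketch below is a minimal illustration of this measure; the text itself does not specify which nongaussianity criterion is used.

```python
import numpy as np

def excess_kurtosis(y):
    """Excess kurtosis of a normalised signal; close to zero for a Gaussian."""
    y = (y - y.mean()) / y.std()
    return np.mean(y**4) - 3.0

rng = np.random.default_rng(1)
gaussian = rng.normal(size=10_000)
laplacian = rng.laplace(size=10_000)   # super-Gaussian source

print(excess_kurtosis(gaussian))    # approximately 0
print(excess_kurtosis(laplacian))   # clearly positive
```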
With simple criteria, the amount of data required for avoiding overfitting is directly proportional to the number of free parameters [24]. In supervised learning, increasing the number of data points will always overcome the problem. In unsupervised learning, however, the number of unknown variables grows with the number of data points, since the factors corresponding to each observation are also unknown. This suggests that the simple methods work only when the number of factors is small enough compared to the dimensionality of the data. Even that is not true if one tries to estimate the variance of the noise. This will be demonstrated in Section . Bayesian inference solves the problem of overfitting and is addressed in Chapter .
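The counting argument can be illustrated with a simple tally: for a linear model with d-dimensional data, k factors and T observations, the unknowns comprise the d-by-k mixing matrix and the k-by-T factor values, so their number grows linearly with T while the number of observed values grows only as d times T. The figures below are hypothetical and serve only to illustrate the argument.

```python
# Observed values versus unknown parameters for a linear factor model
# with d-dimensional data, k factors and T observations (illustrative numbers).
d, k = 10, 3
for T in (100, 1_000, 10_000):
    observed = d * T            # data values x(t)
    unknowns = d * k + k * T    # mixing matrix A plus the factors s(t)
    print(T, observed, unknowns, round(observed / unknowns, 2))
```

The ratio of observed values to unknowns approaches d/k as T grows, so collecting more data does not by itself remove the overfitting problem; it only helps when the number of factors is small compared to the dimensionality of the data, as stated above.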