Models are simplifications of reality whose purpose is to represent the relevant aspects of a system under consideration whilst discarding minor details in order to reduce computational load. Models can be used for making the predictions and inferences required for decision making and action selection.
In some cases there is well-established theory which can be used for constructing the model. However, for applications such as speech recognition, computer vision and autonomous robotics, the theoretical knowledge is either incomplete or produces models which are too complex and detailed to be of any practical use. In such applications, machine learning has proven to be a successful approach to model building: the learning system is given a flexible set of possible models from which it selects those that best explain the observations.
An active research topic in machine learning is the development of model structures which are rich enough to represent the relevant aspects of the observations but at the same time allow efficient learning and inference. This is also the topic of the present thesis.
Theoretically the richest model would be the universal Turing machine, which can represent anything that is computable [41]. The problem with this model is that the space of programmes is too unstructured to guide the search for good representations of given observations. The use of this model is therefore restricted to very simple problems, where its richness cannot be utilised.
At the other end of the spectrum are structurally very limited but computationally efficient models; the linear factor analysis model is a typical example. These models can be applied to vast data sets, but most of the interesting structure remains hidden because the model has no means of representing it.
Artificial neural networks are structurally very simple models because they consist of elementary building blocks, neurons. The term originates from models inspired by certain structures observed in the brain, but in its current meaning a neural network model may have nothing to do with the biological brain. The linear factor analysis model, for instance, can be viewed as a very simple neural network. The idea is that from elementary, computationally efficient building blocks it is possible to construct models with rich representational capacity, in the hope that the resulting model will also be computationally efficient. The research is driven by the knowledge that the brain seems to have solved the problem: we are able to find regularities in our environment and learn abstractions which capture the structure of those regularities, allowing us to predict future observations and plan our actions.
The goal of unsupervised learning is to extract an efficient representation of the statistical structure implicit in the observations [43]. Factor analysis is one example of unsupervised learning of latent variable models [25]. The observations are modelled as having been generated from unknown latent variables through an unknown mapping. Typically these models are learned by alternating between two steps: estimating the latent variables while the model is kept fixed, and estimating the model while the latent variables are kept fixed, as in the sketch below.
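To make the alternation concrete, the following minimal sketch applies it to a linear model of the form X ≈ AS. It is an illustrative alternating least-squares scheme only, not the method developed in this thesis; in particular, the function name is hypothetical and the noise variances, which a full expectation-maximisation treatment of factor analysis would also estimate, are ignored here.

import numpy as np

def alternating_estimation(X, n_factors, n_iters=50, seed=0):
    """Alternating estimation for a linear model X ~ A @ S.

    Illustrative sketch only: alternates plain least-squares
    updates and ignores the noise model.
    """
    rng = np.random.default_rng(seed)
    n_dims, n_samples = X.shape
    A = rng.standard_normal((n_dims, n_factors))  # random initial model
    for _ in range(n_iters):
        # Estimate the latent variables, assuming the model A is fixed.
        S = np.linalg.lstsq(A, X, rcond=None)[0]
        # Estimate the model, assuming the latent variables S are fixed.
        A = np.linalg.lstsq(S.T, X.T, rcond=None)[0].T
    return A, S

On synthetic data generated as X = AS + noise, the reconstruction converges quickly, although A and S are recovered only up to the well-known rotational ambiguity of factor models.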
The linear factor analysis model states that the observations are generated by a linear but otherwise unknown mapping from continuous-valued latent variables, the factors. The linearity assumption is restrictive and unrealistic in many cases, and therefore several attempts have been made to relax it. This extension has turned out to be very difficult, and most existing methods can handle only models in which the number of factors is quite low.
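In its standard form the linear model can be written as follows (the notation here is illustrative rather than that used later in this thesis: \mathbf{x}(t) is an observation vector, \mathbf{s}(t) the factor vector, \mathbf{A} the unknown mixing matrix and \mathbf{n}(t) additive Gaussian noise with diagonal covariance):

\[
\mathbf{x}(t) = \mathbf{A}\mathbf{s}(t) + \mathbf{n}(t), \qquad
\mathbf{s}(t) \sim \mathcal{N}(\mathbf{0}, \mathbf{I}), \quad
\mathbf{n}(t) \sim \mathcal{N}\bigl(\mathbf{0}, \operatorname{diag}(\sigma_1^2, \ldots, \sigma_d^2)\bigr).
\]

The nonlinear extension replaces the product \mathbf{A}\mathbf{s}(t) with a general unknown mapping \mathbf{f}(\mathbf{s}(t)), and it is learning this mapping together with the factors that has proven so difficult.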