Publication I lays the foundation of the thesis. A cost function for multi-layer perceptron (MLP) networks is developed in the minimum description length (MDL) framework. The presentation is given from the point of view of supervised learning, as this is the learning method traditionally used with MLP networks. The possibility of unsupervised learning is briefly outlined. It transpired that the cost function has a Bayesian interpretation and that essentially the same approach works for ensemble learning, the Bayesian method used in the remainder of the publications.

Publication II demonstrates that ensemble learning can be applied to the unsupervised learning of linear independent component (or factor) analysis. The emphasis is on using full posterior approximations instead of point estimates, as this allows models to be compared and protects against overfitting. The treatment of the posterior distribution of variables which have mixture-of-Gaussians prior distributions is later replaced by the method borrowed from [3], but the treatment of other distributions in later publications is based on the methods presented here.
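
The link between full posterior approximations and model comparison can be made concrete with the standard ensemble learning cost function. The notation below (data $X$, unknown variables $\theta$, model $H$, approximating ensemble $q$) is assumed here for illustration and is not taken from publication II:

```latex
\begin{align}
C(q) &= \int q(\theta) \ln \frac{q(\theta)}{p(X, \theta \mid H)} \, d\theta \\
     &= D_{\mathrm{KL}}\bigl(q(\theta) \,\|\, p(\theta \mid X, H)\bigr) - \ln p(X \mid H) \\
     &\geq -\ln p(X \mid H)
\end{align}
```

Since the Kullback-Leibler divergence is non-negative, the minimised cost is an upper bound on the negative log evidence of the model, so cost values obtained for different models can be compared. A point estimate of $\theta$ provides no such bound.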

In publication III, multi-layer perceptrons are used for nonlinear factor analysis following the guidelines presented in publications I and II. The results are encouraging although the model used is rather small. The strategy for growing the network is inspired by the biological brain. Mr. Xavier Giannakopoulos assisted in running the simulations.

Publication IV is a tutorial introduction to ensemble learning and appeared in the same volume as publication V. It is co-authored by Mr. James Miskin, who also had another paper using ensemble learning in the same volume. Credit for writing the part concerning free-form approximation in ensemble learning should go to James Miskin. The present author was responsible for the rest of the text: fixed-form approximation, model selection, and the relation to coding and the EM algorithm.

Most of the results concerning nonlinear factor analysis by an MLP network are contained in publication V. The article also includes a detailed account of the learning scheme used in the simulations. It turned out that the simple approximation developed in publication I and applied in publication III does not suffice for unsupervised learning of MLP networks. A more accurate and laborious approximation is used in publication V. The article also outlines nonlinear independent factor analysis, which results from combining a nonlinear mapping with a non-Gaussian model for the factors. Mr. Antti Honkela assisted in running the simulations.

Publication VI presents a more accurate and detailed derivation for nonlinear independent factor analysis than that provided in publication V.

The basic idea on which publication VII is based was first published in the technical report [72], which provides a new interpretation for the FastICA algorithm. This interpretation suggests various extensions, of which a fast Bayesian independent component analysis algorithm utilising ensemble learning is given as an example. Dr. Petteri Pajunen contributed to writing and clarifying the style of presentation.

Publication VIII outlines a dynamic extension of the nonlinear independent factor analysis algorithm. The dynamics of the factors, or states in this case, are modelled by the same principles as the mapping from factors to observations. The extension is simple but has practical significance, since sequences with significant time structure are often encountered in practice.
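
As a rough sketch of what such a dynamic extension looks like, the model can be written as a nonlinear state-space model. The symbols are assumptions made for illustration and are not quoted from publication VIII: $x(t)$ denotes the observations, $s(t)$ the factors (states), $f$ and $g$ nonlinear mappings of the same kind (for example MLP networks), and $n(t)$ and $m(t)$ noise terms:

```latex
\begin{align}
x(t) &= f\bigl(s(t)\bigr) + n(t) \\
s(t) &= g\bigl(s(t-1)\bigr) + m(t)
\end{align}
```

The first equation is the mapping from factors to observations used in the static model. The second applies the same construction to the step from $s(t-1)$ to $s(t)$, which is what it means for the dynamics to be modelled by the same principles as the observation mapping.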