A substantial amount of research has been conducted on estimating the
states **s**(*t*) under the assumption that the mappings **f** and
**g** are known. This is called Kalman filtering if the mappings are
linear and extended Kalman filtering if they are nonlinear. Several
textbooks give introductions to the field, for instance
[38,121,88].
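For the linear case, the filtering recursion can be sketched with a minimal one-dimensional Kalman filter. The coefficients `f` and `g` play the roles of the (here scalar and known) observation and dynamics mappings; the noise variances `q` and `r` and all numerical values are illustrative choices, not taken from the text:

```python
def kalman_filter_1d(observations, f=1.0, g=1.0, q=0.01, r=1.0,
                     m0=0.0, p0=1.0):
    """Scalar Kalman filter for the model
       s(t+1) = g*s(t) + w,  x(t) = f*s(t) + v,
    with process noise variance q and observation noise variance r."""
    m, p = m0, p0          # posterior mean and variance of the state
    means = []
    for x in observations:
        # Prediction step: propagate the estimate through the dynamics g.
        m_pred = g * m
        p_pred = g * g * p + q
        # Update step: correct with the new observation via the Kalman gain.
        k = p_pred * f / (f * f * p_pred + r)
        m = m_pred + k * (x - f * m_pred)
        p = (1.0 - k * f) * p_pred
        means.append(m)
    return means

# With a constant underlying state, the estimate converges toward it.
estimates = kalman_filter_1d([1.0] * 50)
```

The extended Kalman filter follows the same predict–update pattern, but linearizes nonlinear **f** and **g** around the current estimate at every step.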

Many of the algorithms for unsupervised learning of the generative
mapping **f** can be extended to learn the state dynamics **g** as
well. An obvious approach is to first learn a static model with
**f** alone and then use supervised learning to learn **g**.
This procedure has the disadvantage that the dynamic structure is
ignored while **f** is being learned, and therefore the resulting
representation does not necessarily make efficient use of the
dynamics.
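The two-stage procedure can be sketched on a toy linear example: a static model first recovers the state estimates, after which the dynamics coefficient is fitted by ordinary least squares on consecutive estimates. The data, the known projection direction, and the coefficient 0.9 are all illustrative:

```python
# Toy data: scalar state s(t+1) = 0.9 * s(t), observed through a
# linear generative mapping as a 2-D vector x(t) = (s(t), 2*s(t)).
states = [1.0]
for _ in range(29):
    states.append(0.9 * states[-1])
observations = [(s, 2.0 * s) for s in states]

# Stage 1 (static model): estimate the state from each observation
# alone, here simply by projecting onto the direction (1, 2).
s_hat = [(x0 * 1.0 + x1 * 2.0) / 5.0 for (x0, x1) in observations]

# Stage 2 (supervised dynamics): least-squares fit of the dynamics
# coefficient, g_hat = sum(s(t+1) * s(t)) / sum(s(t)^2).
num = sum(s_next * s for s, s_next in zip(s_hat[:-1], s_hat[1:]))
den = sum(s * s for s in s_hat[:-1])
g_hat = num / den
```

In this noise-free toy case stage 2 recovers the dynamics exactly; the point of the criticism in the text is that stage 1 fixed the representation without ever consulting the temporal structure.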

An early application of learning both **f** and **g** together can be
found in [118], although there both mappings are assumed to be
linear. A linear generative mapping **f** and a nonlinear dynamic
mapping **g** have been used in [16]. That method assumes some
training samples in which the states are observed to be available,
and is therefore not wholly unsupervised.

Fully nonlinear learning algorithms have been proposed independently
in [31,12]. In [31], Gaussian radial basis functions (RBFs) [90] are
used for modelling the mappings **f** and **g**, and the EM
algorithm [21] is used for learning the parameters of the linear
mapping in the RBF model; in other words, the nonlinearities
themselves are not adapted. The structure of the Gaussian RBF model
allows analytical computation of the expectations required for
adapting the linear mapping, which makes the approach attractive. The
drawback is that, since the nonlinearities are not adapted, the
required number of hidden neurons grows exponentially with the
dimension of the latent space.
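The split between fixed nonlinearities and learned linear weights can be illustrated with a small sketch: the Gaussian basis functions have fixed centres and widths, and only the output weights are fitted, here by solving the square interpolation system directly (a stand-in for the expectation computations of the EM approach). Centres, width, and target function are illustrative:

```python
import math

def rbf(x, c, width=1.0):
    """Fixed Gaussian basis function centred at c; never adapted."""
    return math.exp(-(x - c) ** 2 / (2.0 * width ** 2))

def solve(a, b):
    """Gaussian elimination with partial pivoting for a small system."""
    n = len(b)
    a = [row[:] for row in a]
    b = b[:]
    for i in range(n):
        piv = max(range(i, n), key=lambda r: abs(a[r][i]))
        a[i], a[piv] = a[piv], a[i]
        b[i], b[piv] = b[piv], b[i]
        for r in range(i + 1, n):
            factor = a[r][i] / a[i][i]
            for col in range(i, n):
                a[r][col] -= factor * a[i][col]
            b[r] -= factor * b[i]
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (b[i] - sum(a[i][j] * w[j]
                           for j in range(i + 1, n))) / a[i][i]
    return w

centres = [-1.0, 0.0, 1.0]            # fixed grid of basis centres
xs = [-1.0, 0.0, 1.0]                 # training inputs
ys = [math.sin(x) for x in xs]        # training targets
phi = [[rbf(x, c) for c in centres] for x in xs]
weights = solve(phi, ys)              # only the linear part is learned

def predict(x):
    return sum(w * rbf(x, c) for w, c in zip(weights, centres))
```

Because the centres sit on a fixed grid, covering a *d*-dimensional latent space with *k* centres per axis needs *k*<sup>*d*</sup> hidden units, which is the exponential growth noted above.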

MLP networks have been used in [12] to model the mappings **f** and
**g**. First the posterior density of the states is approximated;
then samples are drawn from the posterior and used to estimate the
parameters of the MLP network with the ordinary backpropagation
algorithm [111] (see also, e.g., [39,8]). Both [31] and [12] use
point estimates for the parameters of the mappings and Gaussian
models for the innovation process.
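Fitting an MLP to sampled states can be sketched with a minimal one-hidden-layer network trained by plain backpropagation. The sample pairs below stand in for draws from an approximated state posterior; the dynamics function, architecture, and learning rate are illustrative assumptions:

```python
import math
import random

random.seed(0)

# Stand-in state samples: pairs (s(t), s(t+1)) from a smooth
# hypothetical dynamics s(t+1) = tanh(0.8 * s(t)).
samples = [(s / 10.0, math.tanh(0.8 * s / 10.0)) for s in range(-10, 11)]

# One-hidden-layer MLP: s -> tanh(w1*s + b1) -> linear output.
H = 4
w1 = [random.uniform(-0.5, 0.5) for _ in range(H)]
b1 = [0.0] * H
w2 = [random.uniform(-0.5, 0.5) for _ in range(H)]
b2 = 0.0

def forward(s):
    h = [math.tanh(w1[i] * s + b1[i]) for i in range(H)]
    return sum(w2[i] * h[i] for i in range(H)) + b2, h

def mse():
    return sum((forward(s)[0] - t) ** 2 for s, t in samples) / len(samples)

loss_before = mse()
lr = 0.05
for _ in range(2000):
    for s, t in samples:
        y, h = forward(s)
        err = y - t
        for i in range(H):
            # Backpropagate the squared-error gradient through tanh.
            grad_h = err * w2[i] * (1.0 - h[i] ** 2)
            w2[i] -= lr * err * h[i]
            w1[i] -= lr * grad_h * s
            b1[i] -= lr * grad_h
        b2 -= lr * err
loss_after = mse()
```

Training on posterior samples rather than on a single state trajectory lets the network see the uncertainty in the states, while the network parameters themselves remain point estimates, as noted above for [31,12].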

Publication VIII shows how the nonlinear factor analysis algorithm developed in this thesis can be extended to take into account the dynamics of the factors.