A substantial amount of research has been conducted on estimating the states s(t) under the assumption that the mappings f and g are known. This is called Kalman filtering if the mappings are linear and extended Kalman filtering if they are nonlinear. Several textbooks introduce the field, for instance [38,121,88].
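For concreteness, the linear case can be sketched in Python with NumPy. This is a minimal sketch that assumes the state-space form x(t) = A s(t) + n(t) and s(t) = B s(t-1) + m(t), with Gaussian observation noise n(t) of covariance R and Gaussian innovation m(t) of covariance Q. The matrix names A, B, R, Q and the function name `kalman_filter` are illustrative and are not taken from the text. The function computes the filtered posterior mean and covariance of s(t) given x(1), ..., x(t).

```python
import numpy as np


def kalman_filter(x, A, B, R, Q, s0, P0):
    """Filtered means and covariances of s(t) for the (assumed) linear-Gaussian model

        s(t) = B s(t-1) + m(t),  m(t) ~ N(0, Q)   (linear dynamics g)
        x(t) = A s(t)   + n(t),  n(t) ~ N(0, R)   (linear generative mapping f)

    x has shape (T, obs_dim); s0 and P0 are the prior mean and covariance of the
    state before the first observation. A, B, R, Q are illustrative names.
    """
    s, P = s0, P0
    means, covs = [], []
    identity = np.eye(len(s0))
    for xt in x:
        # Predict: push the previous estimate through the dynamics.
        s_pred = B @ s
        P_pred = B @ P @ B.T + Q
        # Update: correct the prediction using the new observation x(t).
        S = A @ P_pred @ A.T + R              # innovation covariance
        K = P_pred @ A.T @ np.linalg.inv(S)   # Kalman gain
        s = s_pred + K @ (xt - A @ s_pred)
        P = (identity - K @ A) @ P_pred
        means.append(s)
        covs.append(P)
    return np.array(means), np.array(covs)
```

For a one-dimensional random walk observed in unit-variance noise, the filtered means track the true states much more closely than the raw observations do. The extended Kalman filter follows the same predict/update pattern but linearises the nonlinear f and g around the current estimate at each step.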
Many of the algorithms for unsupervised learning of the generative mapping f can be extended to also learn the state dynamics g. An obvious way is to first learn a static model with only f and then use supervised learning to learn g. However, this procedure has the disadvantage that the dynamic structure is not taken into account while f is being learned, so the resulting representation does not necessarily make efficient use of the dynamics.
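The two-stage procedure can be sketched as follows. This is an illustrative sketch only: principal component analysis stands in for the static model of f, and ordinary least squares stands in for the supervised learning of g. Both are linear choices made here for brevity rather than taken from the text, and `two_stage_fit` is a hypothetical name. The first stage treats the observations as an unordered set, which illustrates the disadvantage noted above: the temporal order only enters in the second stage, after the representation has already been fixed.

```python
import numpy as np


def two_stage_fit(x, n_states):
    """Hypothetical two-stage learner for x of shape (T, obs_dim).

    Stage 1 (stand-in for the static model of f): PCA, x(t) ~ A s(t) + mean.
    Stage 2 (stand-in for supervised learning of g): least squares, s(t) ~ B s(t-1).
    """
    # Stage 1: ignores the time order of the rows entirely.
    mean = x.mean(axis=0)
    xc = x - mean
    _, _, Vt = np.linalg.svd(xc, full_matrices=False)
    A = Vt[:n_states].T            # (obs_dim, n_states), orthonormal columns
    s = xc @ A                     # inferred states, one row per time step
    # Stage 2: regress each state on its predecessor; lstsq solves s[:-1] @ W ~ s[1:].
    W, *_ = np.linalg.lstsq(s[:-1], s[1:], rcond=None)
    B = W.T                        # so that s(t) ~ B @ s(t-1)
    return A, B, s
```

In the linear case the recovered states are only defined up to an invertible transform, but the eigenvalues of the fitted dynamics matrix are unaffected by such a transform.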
An early approach to learning both f and g together can be found in . However, both mappings are assumed to be linear. A linear generative mapping f and a nonlinear dynamic mapping g have been used in . That method assumes that some training samples with observed states are available and is therefore not wholly unsupervised.
Fully nonlinear learning algorithms have been proposed independently in [31,12]. In , Gaussian radial basis functions (RBF)  are used for modelling the mappings f and g, and the EM algorithm  is used for learning the parameters of the linear mapping in the RBF model; in other words, the nonlinearities are not adapted. The structure of the Gaussian RBF model allows analytical computation of the expectations required for adapting the linear mapping, which makes the approach interesting. The drawback is that, because the nonlinearities are not adapted, the required number of hidden neurons grows exponentially with the dimension of the latent space.
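The split between fixed nonlinearities and a learned linear layer, and the resulting exponential growth, can be illustrated with a small NumPy sketch. It assumes that the Gaussian centres are placed on a regular grid in the latent space and that the latent inputs s are known, so fitting the output weights reduces to ordinary least squares. The EM algorithm of the cited work, which must also infer the hidden states, is not reproduced here, and all function names are hypothetical.

```python
import itertools

import numpy as np


def grid_centres(dim, per_axis, lo=-1.0, hi=1.0):
    """Assumed centre placement: a regular grid with per_axis ** dim centres."""
    axis = np.linspace(lo, hi, per_axis)
    return np.array(list(itertools.product(axis, repeat=dim)))


def rbf_features(s, centres, width):
    """Gaussian activations exp(-||s - c||^2 / (2 width^2)); these are never adapted."""
    sq_dist = ((s[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * width ** 2))


def design_matrix(s, centres, width):
    """RBF activations plus a constant bias column."""
    phi = rbf_features(s, centres, width)
    return np.hstack([phi, np.ones((len(s), 1))])


def fit_output_weights(s, x, centres, width):
    """Only the linear output layer is learned; with s known this is least squares."""
    W, *_ = np.linalg.lstsq(design_matrix(s, centres, width), x, rcond=None)
    return W


def predict(s, W, centres, width):
    return design_matrix(s, centres, width) @ W
```

With 10 centres per axis, the grid needs 10, 100, 1000 and 10000 centres in one to four latent dimensions, that is 10**d centres in d dimensions, which is the exponential growth the paragraph refers to.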
MLP networks have been used in  to model the mappings f and g. First, the posterior density of the states is approximated; then samples drawn from this posterior are used to estimate the parameters of the MLP networks with the ordinary backpropagation algorithm  (see also, e.g., [39,8]). Both [31,12] use point estimates for the parameters of the mappings and Gaussian models for the innovation process.
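The final parameter-estimation step, fitting an MLP to sampled states by backpropagation, can be sketched as below. This is a minimal sketch assuming a single tanh hidden layer, a squared-error cost and plain full-batch gradient descent. The posterior approximation and sampling stages are not shown, so the state samples `s` are simply taken as given. `train_mlp` and its default settings are hypothetical and are not those of the cited method.

```python
import numpy as np


def train_mlp(s, x, n_hidden=20, lr=0.05, epochs=3000, seed=0):
    """Fit x ~ tanh(s W1 + b1) W2 + b2 by backpropagation (hypothetical settings).

    s: (N, d_in) state samples, assumed already drawn from an approximate posterior.
    x: (N, d_out) corresponding targets. Returns the parameters and the loss history.
    """
    rng = np.random.default_rng(seed)
    N, d_in = s.shape
    d_out = x.shape[1]
    W1 = rng.normal(0, 1 / np.sqrt(d_in), (d_in, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 1 / np.sqrt(n_hidden), (n_hidden, d_out))
    b2 = np.zeros(d_out)
    losses = []
    for _ in range(epochs):
        # Forward pass.
        h = np.tanh(s @ W1 + b1)
        y = h @ W2 + b2
        err = y - x
        losses.append(np.mean(err ** 2))
        # Backward pass: gradients of the mean squared error.
        g_y = 2 * err / (N * d_out)
        g_W2 = h.T @ g_y
        g_b2 = g_y.sum(axis=0)
        g_pre = (g_y @ W2.T) * (1 - h ** 2)   # chain rule through tanh
        g_W1 = s.T @ g_pre
        g_b1 = g_pre.sum(axis=0)
        # Gradient-descent step.
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        W2 -= lr * g_W2
        b2 -= lr * g_b2
    return (W1, b1, W2, b2), losses
```

Because each sample is treated as a fixed input-target pair, this step yields point estimates of the network parameters, in line with the point estimates mentioned in the paragraph.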
Publication VIII shows how the nonlinear factor analysis algorithm developed in this thesis can be extended to take into account the dynamics of the factors.