At the beginning of learning for a new data set, the posterior
means of the network weights are initialised to random values and the
variances to small constant values. The original data is augmented
with delay coordinate embedding, which was presented in
Section 2.1.4, so that it consists of multiple
time-shifted copies of the signal. The hidden states are initialised with a
*principal component analysis* (PCA) [27] projection of the
augmented data, and this projection is also used in training during the
early stages of learning.
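The initialisation described above can be sketched as follows. This is a minimal illustration, not the thesis code: the function names, the number of time-shifted copies, and the number of hidden states are all assumptions chosen for the example.

```python
import numpy as np

def delay_embed(x, n_copies, delay=1):
    """Augment a scalar series with delay coordinate embedding:
    stack n_copies time-shifted copies of the signal."""
    T = len(x) - (n_copies - 1) * delay
    return np.stack([x[i * delay : i * delay + T] for i in range(n_copies)],
                    axis=0)

def pca_init(data, n_states):
    """Initialise hidden states by projecting the augmented data
    onto its leading principal components."""
    centered = data - data.mean(axis=1, keepdims=True)
    # SVD of the centred data matrix yields the principal directions.
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    # Leading principal-component scores over time serve as the
    # initial hidden-state trajectories.
    return S[:n_states, None] * Vt[:n_states]

# Example: embed a toy signal and initialise three hidden states.
x = np.sin(0.1 * np.arange(200))
augmented = delay_embed(x, n_copies=5)   # shape (5, 196)
states0 = pca_init(augmented, n_states=3)  # shape (3, 196)
```

The posterior means of the weights themselves would simply be drawn from a small random distribution, e.g. `rng.normal(scale=0.1, size=...)`, with the variances set to a small constant.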

The learning procedure of the NSSM consists of sweeps. During one sweep, all the parameters of the model are updated as outlined above. There are, however, different phases in learning, so that not all of the parameters are updated from the very beginning. These phases are summarised in Table 6.1.
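The sweep structure with phased updates can be sketched as below. The parameter groups and the sweep indices at which each phase begins are purely illustrative placeholders, not the actual schedule of Table 6.1.

```python
from collections import Counter

def train_nssm(n_sweeps, update_fns, phase_start):
    """Run learning in sweeps; each parameter group is updated
    only once its phase has begun (phase_start is illustrative)."""
    for sweep in range(n_sweeps):
        for name, update in update_fns.items():
            if sweep >= phase_start[name]:
                update()  # one update of this parameter group

# Count how often each (hypothetical) parameter group gets updated.
calls = Counter()
groups = ["hidden_states", "network_weights", "hyperparameters"]
updates = {g: (lambda g=g: calls.update([g])) for g in groups}
# Hypothetical phase boundaries, NOT the values from Table 6.1:
# hyperparameters join the updates only after sweep 10.
phase_start = {"hidden_states": 0, "network_weights": 0,
               "hyperparameters": 10}
train_nssm(20, updates, phase_start)
```

Deferring some updates in this way is a common stabilisation strategy: early sweeps let the states and weights settle before the remaining parameters are adapted.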

Antti Honkela 2001-05-30