Continuing the learning process with the old model but new data requires initial estimates for the new hidden states. If the new data is a direct continuation of the old, the predictions of the old states provide reasonable initial estimates for the new ones, and the algorithm can continue the adaptation from there.
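For the continuation case, a minimal sketch of this warm start, assuming the learned dynamics are available as a one-step prediction function; the name `dynamics`, the array shapes, and the toy linear model are illustrative placeholders, not part of the original method:

```python
import numpy as np

def warm_start_states(old_state_means, dynamics, n_new):
    """Initialise hidden states for data that directly continues the old
    sequence by rolling the learned dynamics forward from the last old state."""
    states = [old_state_means[-1]]
    for _ in range(n_new):
        states.append(dynamics(states[-1]))  # one-step prediction
    return np.stack(states[1:])              # one estimate per new time step

# Toy stand-in for the learned dynamics: a stable linear map.
A = 0.9 * np.eye(3)
old_means = np.random.randn(100, 3)          # posterior means of the old states
s_init = warm_start_states(old_means, lambda s: A @ s, n_new=20)
```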
If the new data forms an entirely separate sequence, the problem is more difficult. Knowing the model, we can still do much better than starting at random or using the same initialisation as in the very beginning.
One way to find the estimates is to use an auxiliary MLP network to model the inverse of the observation mapping [59]. This MLP can be trained using standard supervised back-propagation with the estimated means of the hidden states and the corresponding observations as the training set. Their roles are of course inverted, so that the observations are the inputs and the hidden states the outputs. The auxiliary MLP cannot give perfect estimates for the states, but the estimates can usually be adapted very quickly by using the standard learning algorithm to update only the hidden states.
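A minimal sketch of this procedure, substituting scikit-learn's MLPRegressor for the auxiliary network (the text does not specify an implementation) and using random arrays as stand-ins for the estimated means; all names and dimensions are placeholders:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Stand-ins for the posterior means produced by the already-trained model.
x_mean = np.random.randn(1000, 10)   # estimated means of the observations
s_mean = np.random.randn(1000, 3)    # estimated means of the hidden states

# Supervised training with the roles inverted: observations are the inputs
# and hidden states the targets, so the MLP approximates the inverse of the
# observation mapping.
inverse_mlp = MLPRegressor(hidden_layer_sizes=(30,), max_iter=2000,
                           random_state=0)
inverse_mlp.fit(x_mean, s_mean)

# For a new, separate sequence the MLP gives rough initial state estimates.
# These would then be refined by running the standard learning algorithm
# with the model parameters frozen, updating only the hidden states.
x_new = np.random.randn(200, 10)
s_init = inverse_mlp.predict(x_new)
```

Because the auxiliary MLP only approximates the inverse mapping, its outputs serve purely as a starting point; the subsequent states-only adaptation is what brings the estimates into agreement with the model.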