When learning from data, the model represents well only those phenomena that appear in the data. If the data are too uniform, the model will not become robust. In other words, one should balance exploration against exploitation. In this paper, the data sets were generated partly by hand, and all the control schemes aim at exploitation only. A good starting point for taking exploration into account is given in [16].
For direct control, the model was learned from examples of control with a single goal in mind. It is straightforward to generalise this to a setting with several different goals. The dynamics of the system stay the same regardless of the goal, and only the policy mapping (see Figure 1) needs to be changed for each goal.
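As a minimal sketch of this idea (the dimensions, parameter values, and function names below are illustrative assumptions, not taken from the paper), the learned dynamics can be shared across goals while only the policy mapping receives the goal as an extra input:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: state s, control u, goal g.
STATE_DIM, CONTROL_DIM, GOAL_DIM, HIDDEN = 4, 1, 4, 16

# One shared dynamics model f(s, u) -> s'; it does not depend on the goal.
W_dyn = rng.normal(scale=0.1, size=(STATE_DIM, STATE_DIM + CONTROL_DIM))

def dynamics(s, u):
    """Shared (goal-independent) dynamics mapping."""
    return W_dyn @ np.concatenate([s, u])

# The policy mapping is the only goal-dependent part; equivalently one could
# keep a separate policy per goal.
W1 = rng.normal(scale=0.1, size=(HIDDEN, STATE_DIM + GOAL_DIM))
W2 = rng.normal(scale=0.1, size=(CONTROL_DIM, HIDDEN))

def policy(s, g):
    """Goal-conditioned policy mapping (s, g) -> u."""
    h = np.tanh(W1 @ np.concatenate([s, g]))
    return W2 @ h

s = rng.normal(size=STATE_DIM)
g = np.zeros(GOAL_DIM)          # e.g. drive the state to the origin
u = policy(s, g)
s_next = dynamics(s, u)
```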
The direct and indirect control methods can be used together. One can use the data produced by the indirect control methods for learning the direct controller. This can even be done offline, that is, by simulating the estimated model and sampling observations from their predicted distributions. This can be compared to dreaming.
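The following sketch illustrates such offline "dreaming" under assumed toy dynamics (the linear model, noise level, and the stand-in indirect controller are hypothetical): trajectories are simulated from the estimated model, observations are sampled from its predicted distributions, and the resulting state-action pairs are used to fit a direct controller.

```python
import numpy as np

rng = np.random.default_rng(1)
STATE_DIM, CONTROL_DIM = 2, 1

# Hypothetical estimated model: linear mean dynamics with predictive noise.
A = np.array([[1.0, 0.1],
              [0.0, 0.9]])
B = np.array([[0.0],
              [0.1]])
NOISE_STD = 0.01

def sample_next_state(s, u):
    """Simulate the estimated model and sample from its predicted distribution."""
    mean = A @ s + B @ u
    return mean + NOISE_STD * rng.normal(size=STATE_DIM)

def indirect_control(s):
    """Stand-in for an indirect controller, here a simple feedback law."""
    return np.array([-0.5 * s[0] - 1.0 * s[1]])

# "Dreaming": generate (state, action) pairs by rolling out the model offline.
states, actions = [], []
s = rng.normal(size=STATE_DIM)
for _ in range(500):
    u = indirect_control(s)
    states.append(s)
    actions.append(u)
    s = sample_next_state(s, u)

# Fit a direct controller (linear policy) to the dreamed data by least squares.
X = np.array(states)
U = np.array(actions)
K, *_ = np.linalg.lstsq(X, U, rcond=None)
print("learned direct policy u ~= K^T s, K =", K.ravel())
```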
Enhancing the task-oriented identification (policy mapping) in turn helps the indirect methods, too. This idea is comparable to temporal-difference learning [15], where the difference between temporally successive predictions is used to adjust the earlier one. One should be careful, though: if the examples given for learning are always fluent, the robustness of the model might start to decrease.
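For reference, a minimal tabular TD(0) update on a toy random-walk chain (the chain, rewards, and learning rate are illustrative and unrelated to the experiments of this paper) shows how the difference between successive predictions adjusts the earlier one:

```python
import numpy as np

rng = np.random.default_rng(2)

# Tabular TD(0): the difference between temporally successive predictions,
# r + gamma * V(s_next) - V(s), is used to adjust the earlier prediction V(s).
N_STATES, GAMMA, ALPHA = 5, 0.9, 0.1
V = np.zeros(N_STATES)

def step(s):
    """Toy random walk; reward 1.0 when the right end of the chain is reached."""
    s_next = min(s + 1, N_STATES - 1) if rng.random() < 0.5 else max(s - 1, 0)
    r = 1.0 if s_next == N_STATES - 1 else 0.0
    return s_next, r

for _ in range(2000):
    s = N_STATES // 2
    for _ in range(50):
        s_next, r = step(s)
        td_error = r + GAMMA * V[s_next] - V[s]   # successive-prediction difference
        V[s] += ALPHA * td_error                  # adjust the earlier prediction
        s = s_next
        if s == N_STATES - 1:
            break

print("estimated values:", np.round(V, 2))
```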
When faced with an unknown state, it is often best to first decrease the uncertainty, for example by looking around, and then to act based on what has been revealed. This is called probing. Unfortunately, the simple posterior approximation used in this paper does not allow such plans: the future actions (control signals) would need to depend on future states, but here they are assumed to be independent. An interesting continuation is to use another posterior approximation, such as particle filtering, to allow this.
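As an illustration of that direction, the following is a minimal bootstrap particle filter for an assumed one-dimensional nonlinear state-space model (the transition function, noise levels, and control sequence are hypothetical, not the model of this paper); each particle carries its own hypothesis of the state, so future actions could in principle be conditioned on them:

```python
import numpy as np

rng = np.random.default_rng(3)

# Minimal bootstrap particle filter for an assumed 1-D nonlinear model.
N_PARTICLES, T = 200, 30
PROC_STD, OBS_STD = 0.3, 0.5

def f(x, u):
    """Assumed transition mean, with control input u."""
    return 0.9 * x + 0.5 * u + 0.1 * np.sin(x)

def simulate(u_seq):
    """Generate synthetic states and observations for the demo."""
    xs, ys = [], []
    x = 0.0
    for u in u_seq:
        x = f(x, u) + PROC_STD * rng.normal()
        xs.append(x)
        ys.append(x + OBS_STD * rng.normal())
    return np.array(xs), np.array(ys)

u_seq = np.zeros(T)
xs_true, ys = simulate(u_seq)

# Filtering loop: propagate particles, weight by the observation likelihood,
# and resample, so the particle cloud approximates the state posterior.
particles = rng.normal(size=N_PARTICLES)
estimates = []
for t in range(T):
    particles = f(particles, u_seq[t]) + PROC_STD * rng.normal(size=N_PARTICLES)
    log_w = -0.5 * ((ys[t] - particles) / OBS_STD) ** 2
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    idx = rng.choice(N_PARTICLES, size=N_PARTICLES, p=w)
    particles = particles[idx]
    estimates.append(particles.mean())

print("mean abs error:", np.abs(np.array(estimates) - xs_true).mean())
```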