

Control Schemes

So far, only passive observation and learning have been considered. We now turn to the question of how the control signals (or actions) are selected. That is, given the history of observations $\dots,\mathbf{x}(t_0-2),\mathbf{x}(t_0-1)$ and control signals $\dots,\mathbf{u}(t_0-2),\mathbf{u}(t_0-1)$, the task is to select a good control signal $\mathbf{u}(t_0)$ at the current time $t_0$. A new observation $\mathbf{x}(t_0)$ is then made, and $t_0$ is increased by one. Three different control schemes and their cooperation are studied below and summarised in Table I.
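The interaction described above can be sketched as a simple loop. This is only an illustrative sketch: the names `run_control_loop`, `toy_policy` and `toy_plant` are hypothetical, the scalar linear plant and the proportional policy are assumptions chosen for brevity, and none of them corresponds to the control schemes studied in this section. The sketch also assumes that the new observation $\mathbf{x}(t_0)$ may depend on the freshly chosen $\mathbf{u}(t_0)$.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# Both callables receive the observation history and the control history.
Policy = Callable[[Sequence[Vector], Sequence[Vector]], Vector]
Plant = Callable[[Sequence[Vector], Sequence[Vector]], Vector]


def run_control_loop(
    policy: Policy,
    plant: Plant,
    x_history: Sequence[Vector],
    u_history: Sequence[Vector],
    steps: int,
) -> Tuple[List[Vector], List[Vector]]:
    """Alternate between selecting u(t0) and observing x(t0)."""
    xs = list(x_history)  # ..., x(t0-2), x(t0-1)
    us = list(u_history)  # ..., u(t0-2), u(t0-1)
    for _ in range(steps):
        u = policy(xs, us)  # select u(t0) from the history up to t0-1
        us.append(u)
        x = plant(xs, us)   # new observation x(t0); us now includes u(t0)
        xs.append(x)
        # Appending to both histories plays the role of increasing t0 by one.
    return xs, us


def toy_policy(xs: Sequence[Vector], us: Sequence[Vector]) -> Vector:
    """Hypothetical proportional controller: u(t0) = -0.5 * x(t0-1)."""
    return [-0.5 * xs[-1][0]]


def toy_plant(xs: Sequence[Vector], us: Sequence[Vector]) -> Vector:
    """Hypothetical scalar plant: x(t0) = 0.9 * x(t0-1) + u(t0)."""
    return [0.9 * xs[-1][0] + us[-1][0]]


if __name__ == "__main__":
    xs, us = run_control_loop(toy_policy, toy_plant, [[1.0]], [[0.0]], steps=3)
    print(xs)
    print(us)
```

With these toy choices each step multiplies the observation by 0.4, so the state decays towards zero. Any of the control schemes discussed below would take the place of `toy_policy`, while the real system (or a learned model of it) would take the place of `toy_plant`.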


Tapani Raiko 2005-05-23