All simulations were run with both low ( ) and high ( ) levels of
additive Gaussian observation noise. Gaussian
process noise with
was used in all the simulations and
the training data set. For the NMPC and OIC methods, the length of the
control horizon was set to 40 time steps, corresponding to 2 seconds of
the system's real time. The simulations were run for 60 time steps,
corresponding to 3 seconds of real time, to ensure that the controller
was able to stabilise the pole.
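The timing figures above imply a fixed step length, which the following minimal sketch works out. It also shows one way additive Gaussian observation noise could be applied. The function name `add_observation_noise`, the four-element example observation, and the value of `example_sigma` are hypothetical and are not taken from the experiments.

```python
import random

# Values stated in the text.
HORIZON_STEPS = 40      # control horizon for NMPC and OIC
HORIZON_SECONDS = 2.0   # real time covered by the horizon
SIM_STEPS = 60          # length of each simulation

# Derived quantities.
steps_per_second = HORIZON_STEPS / HORIZON_SECONDS  # 20 steps per second
dt = 1.0 / steps_per_second                         # 0.05 s per time step
sim_seconds = SIM_STEPS / steps_per_second          # 3 s of real time


def add_observation_noise(x, sigma, rng):
    """Return a copy of x with i.i.d. zero-mean Gaussian noise of std sigma added."""
    return [xi + rng.gauss(0.0, sigma) for xi in x]


rng = random.Random(0)   # fixed seed so the sketch is repeatable
example_sigma = 0.1      # hypothetical noise level, NOT a value from the text
noisy = add_observation_noise([0.0, 0.0, 0.0, 0.0], example_sigma, rng)
```

Setting `sigma` to zero returns the observation unchanged, which is a quick sanity check for the helper.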
To study the benefits of using a hidden state-space in modelling the
dynamics of an unknown system, a comparison model was built that
used an identity mapping instead of an MLP for the observation
mapping. In practice this means replacing (1) with
(10)
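As a hedged sketch of what an identity observation mapping amounts to, assume that (1) has the usual nonlinear state-space form, with observations x(t), hidden states s(t), an MLP f, and observation noise n(t). These symbols are assumptions and are not taken from the text above. Replacing f with the identity would then give:

```latex
% Assumed form of the observation equation (1), with an MLP f:
%   x(t) = f(s(t)) + n(t)
% Identity observation mapping used by the comparison model:
\begin{equation*}
  \mathbf{x}(t) = \mathbf{s}(t) + \mathbf{n}(t)
\end{equation*}
```

With an identity mapping, the state vector necessarily has the same dimensionality as the observation vector.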
Also, a modified version of the problem was considered, in which only two
observations, the location of the cart and the angle of the pole,
were available.
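The reduced-observation variant can be illustrated by selecting two components from a full observation vector. This is a minimal sketch that assumes the full observation is ordered as (cart position, cart velocity, pole angle, pole angular velocity). That ordering and the helper name `partial_observation` are hypothetical.

```python
def partial_observation(full_obs, keep=(0, 2)):
    """Keep only the selected components of an observation vector.

    By default, indices 0 and 2 are kept, which are the cart position and
    the pole angle under the ordering assumed in this sketch.
    """
    return [full_obs[i] for i in keep]


# Example with made-up numbers: position 0.5, velocity -1.0,
# angle 3.14, angular velocity 2.0.
reduced = partial_observation([0.5, -1.0, 3.14, 2.0])
```

In this variant the velocities are not observed directly, so a model would have to infer them from successive observations.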
Tapani Raiko
2005-05-23