
Simulation

All simulations were run with both a low ( $ \sigma=0.001$) and a high ( $ \sigma=0.1$) level of additive Gaussian observation noise. Gaussian process noise with $ \sigma=0.001$ was used in all the simulations and in the training data set. For the NMPC and OIC methods, the length of the control horizon was set to 40 time steps, corresponding to 2 seconds of the system's real time. The simulations were run for 60 time steps, corresponding to 3 seconds of real time, to ensure that the controller was able to stabilise the pole. To study the benefits of using a hidden state-space in modelling the dynamics of an unknown system, a comparison model was built which used the identity mapping $ \mathbf{I}$ instead of an MLP $ \mathbf{f}$ for the observation mapping. In practice this means replacing (1) with

$\displaystyle \mathbf{x}(t) = \mathbf{s}(t) + \mathbf{n}(t).$ (10)
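The comparison observation model (10) can be illustrated with a minimal sketch. The function name `observe_identity`, the four-component example state and the fixed random seed are assumptions made for illustration only. Only the two noise levels come from the text.

```python
import random

LOW_SIGMA = 0.001   # low observation-noise level, from the text
HIGH_SIGMA = 0.1    # high observation-noise level, from the text


def observe_identity(state, sigma, rng):
    """Return x(t) = s(t) + n(t), with n(t) drawn independently per
    component from a zero-mean Gaussian with standard deviation sigma."""
    return [s + rng.gauss(0.0, sigma) for s in state]


rng = random.Random(0)            # fixed seed, chosen only for repeatability
state = [0.0, 0.0, 3.14, 0.0]     # hypothetical state vector s(t)

x_low = observe_identity(state, LOW_SIGMA, rng)
x_high = observe_identity(state, HIGH_SIGMA, rng)
```

With the identity mapping the state has the same dimension as the observation, so the model has no hidden state beyond what is directly observed.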

Also, a modified version of the problem was considered, where only two observations, the location of the cart $ y$ and the angle of the pole $ \phi$, were available.
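The experimental settings described in this section can be collected into a short sketch. The horizon and run lengths and the noise levels are taken from the text. The dictionary keys, the `select` helper, the names `y_dot` and `phi_dot`, and the crossing of both noise levels with both observation sets are assumptions, not details confirmed by the text.

```python
from itertools import product

HORIZON_STEPS = 40       # control horizon for NMPC and OIC, from the text
HORIZON_SECONDS = 2.0    # real time covered by the horizon, from the text
RUN_STEPS = 60           # length of each simulation run, from the text

dt = HORIZON_SECONDS / HORIZON_STEPS   # time step implied by the text
run_seconds = RUN_STEPS * dt           # real time covered by one run

OBS_NOISE_SIGMAS = (0.001, 0.1)        # low and high observation noise

# Hypothetical labels: only y and phi are named in the text.
OBSERVATION_SETS = {"full": None, "reduced": ("y", "phi")}


def select(observation, keys):
    """Keep only the named components; None keeps every component."""
    if keys is None:
        return dict(observation)
    return {k: observation[k] for k in keys}


# Assumed experimental grid: every noise level with every observation set.
conditions = list(product(OBS_NOISE_SIGMAS, OBSERVATION_SETS))

# Usage: reduce a hypothetical full observation to the two-variable case.
full_obs = {"y": 1.0, "y_dot": 0.0, "phi": 0.5, "phi_dot": 0.0}
reduced_obs = select(full_obs, OBSERVATION_SETS["reduced"])
```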

Tapani Raiko 2005-05-23