The training data comprised 2500 samples.
Most of the training data consisted of a sequence generated with
semi-random control whose only goal was to keep the cart from
crashing into the boundaries. The training data also contained some
hand-generated sections to better cover the whole range of
the observation and dynamics mappings. The model was trained for
10000 iterations, which translates to several hours of computation
time. A six-dimensional state space was used because
it yielded the model with the lowest value of the cost function (Eq. 5).
The state was estimated using the iterated extended
Kalman smoother [Anderson79]. A history of five observations and
control signals
seemed to suffice to give a reliable estimate. The reference signal
was
and
at the end of the horizon
and for five observations beyond that.
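The semi-random excitation described above could be generated along the following lines. This is only a minimal sketch, not the procedure actually used: the function name `semi_random_controls`, the track half-width, the force limit, the time step, and the safety margin are all hypothetical, and the cart is simplified to a unit-mass double integrator with the pendulum ignored. The force is drawn at random unless the cart's projected stopping point under full braking comes too close to a boundary, in which case full force is applied away from that boundary.

```python
import numpy as np


def semi_random_controls(n_steps=2500, half_width=3.0, max_force=10.0,
                         dt=0.01, margin=0.5, seed=0):
    """Generate a random force sequence that keeps a toy cart on its track.

    All numeric defaults are illustrative assumptions, and the cart is a
    unit-mass double integrator (the pendulum dynamics are ignored).
    """
    rng = np.random.default_rng(seed)
    pos, vel = 0.0, 0.0
    forces = np.empty(n_steps)
    positions = np.empty(n_steps)
    for t in range(n_steps):
        # Where the cart would come to rest if it braked at full force now.
        stop_pos = pos + vel * abs(vel) / (2.0 * max_force)
        if abs(stop_pos) > half_width - margin:
            # Too close to a boundary: push away from it at full force.
            force = -np.sign(stop_pos) * max_force
        else:
            # Otherwise excite the system with a uniformly random force.
            force = rng.uniform(-max_force, max_force)
        # Semi-implicit Euler step of the simplified cart dynamics.
        vel += force * dt
        pos += vel * dt
        forces[t] = force
        positions[t] = pos
    return forces, positions


forces, positions = semi_random_controls()
```

In a real experiment the force sequence would be applied to the full cart-pole system, and the hand-generated sections would be appended to cover regions that random excitation rarely reaches.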
To handle the system constraints with NMPC, a slightly modified version of the cost function (7) was used: out-of-bounds values of the cart location and of the force incurred a quadratic penalty. The full cost function is