During the training phase, a data set of 2500 samples was used. Most of the training data consisted of a sequence generated with semi-random control whose only goal was to keep the cart from crashing into the boundaries. The training data also contained some hand-generated sections so that the observation and dynamic mappings would be modeled over their whole range. The model was trained for 10000 iterations, which amounted to several hours of computation time. A six-dimensional state space was used because it yielded the model with the lowest value of the cost function (Eq. 5).
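The semi-random data generation can be illustrated with a minimal sketch. It assumes a frictionless point-mass cart of unit mass (the pendulum is ignored), track limits of ±2.0, a force limit of 10.0 and a time step of 0.05; these values and the names `brake`, `is_safe` and `semi_random_controls` are hypothetical. The safety rule used here (replace a random force with full braking whenever the cart could no longer stop inside the track) is one simple way to meet the stated goal, not necessarily the rule used to generate the original data, and the hand-generated sections are not modeled.

```python
import math
import random


def brake(x, v, a_max, dt):
    """One step of maximal braking for a point-mass cart (hypothetical model).

    Returns (x_next, v_next, accel). If the cart can stop within one step,
    the velocity is set exactly to zero so that braking always terminates.
    """
    if abs(v) <= a_max * dt:
        accel = -math.copysign(min(a_max, abs(v) / dt), v)  # partial braking
        v_next = 0.0
    else:
        accel = -math.copysign(a_max, v)  # full force against the motion
        v_next = v + accel * dt
    return x + v_next * dt, v_next, accel


def is_safe(x, v, limit, a_max, dt):
    """True if braking from (x, v) keeps every visited position in [-limit, limit]."""
    while True:
        if abs(x) > limit:
            return False
        if v == 0.0:
            return True
        x, v, _ = brake(x, v, a_max, dt)


def semi_random_controls(n_samples, limit=2.0, a_max=10.0, dt=0.05, seed=0):
    """Random forces, overridden by braking whenever a crash would become unavoidable."""
    rng = random.Random(seed)
    x, v = 0.0, 0.0  # start at rest in the middle of the track
    positions, forces = [], []
    for _ in range(n_samples):
        accel = rng.uniform(-a_max, a_max)  # exploratory random force
        v_next = v + accel * dt             # semi-implicit Euler step
        x_next = x + v_next * dt
        if not is_safe(x_next, v_next, limit, a_max, dt):
            # The random force would make a crash unavoidable: brake instead.
            x_next, v_next, accel = brake(x, v, a_max, dt)
        x, v = x_next, v_next
        positions.append(x)
        forces.append(accel)  # unit mass, so force equals acceleration
    return positions, forces


positions, forces = semi_random_controls(2500)
```

Because one braking step from a safe state lands on the next state of that same braking trajectory, which is therefore also safe, the cart stays within the limits for the whole sequence, while the forces between overrides remain random.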
The state was estimated using the iterated extended Kalman smoother Anderson79. A history of five observations and control signals appeared sufficient for a reliable estimate. The reference signal was and at the end of the horizon and for five observations beyond that.
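A scalar sketch of an iterated extended Kalman smoother over a short window may make the state-estimation step more concrete. It uses the Gauss-Newton form: each pass relinearizes the dynamics and observation functions around the previous smoothed trajectory, then runs an extended Kalman filter forward and a Rauch-Tung-Striebel pass backward. The toy functions `f` and `g`, their derivatives, the noise variances `q` and `r`, the window prior `m0`, `p0`, and the fixed iteration count are all assumptions for illustration; the model described in the text is a learned six-dimensional state-space model, not this toy.

```python
import math


def iterated_eks(ys, us, f, df, g, dg, q, r, m0, p0, n_iter=5):
    """Scalar iterated extended Kalman smoother (Gauss-Newton form).

    Hypothetical scalar model:
        x[t] = f(x[t-1], u[t-1]) + w,  w ~ N(0, q)
        y[t] = g(x[t]) + e,            e ~ N(0, r)
    with prior x[0] ~ N(m0, p0); df and dg are derivatives w.r.t. x.
    Returns smoothed means and variances for the window.
    """
    T = len(ys)
    nominal = None  # linearization trajectory; None means a plain EKS pass
    ms, ps = [], []
    for _ in range(n_iter):
        mf, pf, mp, pp, F = [], [], [], [], []
        m_pred, p_pred = m0, p0
        for t in range(T):
            if t > 0:
                # Prediction, linearized at the nominal (or filtered) state.
                xl = mf[-1] if nominal is None else nominal[t - 1]
                Ft = df(xl, us[t - 1])
                m_pred = f(xl, us[t - 1]) + Ft * (mf[-1] - xl)
                p_pred = Ft * pf[-1] * Ft + q
                F.append(Ft)  # F[t-1] is the Jacobian of the step t-1 -> t
            mp.append(m_pred)
            pp.append(p_pred)
            # Measurement update, linearized at the nominal (or predicted) state.
            xl = m_pred if nominal is None else nominal[t]
            G = dg(xl)
            y_hat = g(xl) + G * (m_pred - xl)
            s = G * p_pred * G + r
            k = p_pred * G / s
            mf.append(m_pred + k * (ys[t] - y_hat))
            pf.append((1.0 - k * G) * p_pred)
        # Rauch-Tung-Striebel backward pass.
        ms, ps = mf[:], pf[:]
        for t in range(T - 2, -1, -1):
            c = pf[t] * F[t] / pp[t + 1]
            ms[t] = mf[t] + c * (ms[t + 1] - mp[t + 1])
            ps[t] = pf[t] + c * c * (ps[t + 1] - pp[t + 1])
        nominal = ms  # relinearize around the smoothed trajectory next pass
    return ms, ps


# Toy usage (hypothetical model): a window of five observations and the
# four controls between them.
def f(x, u):
    return x + 0.1 * math.sin(x) + 0.1 * u


def df(x, u):
    return 1.0 + 0.1 * math.cos(x)


def g(x):
    return x + 0.2 * x ** 3


def dg(x):
    return 1.0 + 0.6 * x ** 2


us = [1.0, -0.5, 0.3, 0.0]
xs = [0.5]
for u in us:
    xs.append(f(xs[-1], u))
ys = [g(x) for x in xs]  # noise-free observations, for illustration only
ms, ps = iterated_eks(ys, us, f, df, g, dg, q=1e-2, r=1e-4, m0=0.0, p0=1.0)
current_state = ms[-1]
```

With a five-sample window, `ms[-1]` plays the role of the current state estimate passed on to the controller. A full implementation would use vector states with Jacobian matrices and would typically stop iterating once the smoothed trajectory no longer changes.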
To handle the constraints of the system in NMPC, a slightly modified version of the cost function (7) was used. Out-of-bounds values of the cart position and of the force incurred a quadratic penalty, and the full cost function is