A data set of 2500 samples was used for training.
Most of the training data consisted of a sequence generated with
semi-random control where the only goal was to ensure that the cart
does not crash into the boundaries. The training data also contained some
hand-generated sections to better cover the whole range of
the observation and dynamics mappings. The model was trained for
10000 iterations, which translates to several hours of computation
time. A six-dimensional state space was used because
it resulted in the model with the lowest value of the cost function (Eq. 5).
The state was estimated using the iterated extended
Kalman smoother [Anderson79]. A history of five observations and
control signals
appeared sufficient to give a reliable estimate. The reference signal
 was 
 and 
 at the end of the horizon
and for five observations beyond that.
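
The semi-random data collection described at the start of this section can be illustrated with a minimal sketch. It is not the procedure used in the experiments: the unit-mass cart model (which ignores everything except the cart), the track half-length `y_max`, the force limit `f_max`, the time step `dt`, the safety `margin`, and the function name `semi_random_rollout` are all hypothetical choices made for illustration. The control is a clipped random walk, overridden by full braking whenever the predicted stopping point would come too close to a boundary.

```python
import numpy as np


def semi_random_rollout(n_samples=2500, y_max=3.0, f_max=10.0,
                        dt=0.05, margin=0.5, seed=0):
    """Semi-random control of a toy unit-mass cart (all values are assumptions)."""
    rng = np.random.default_rng(seed)
    limit = y_max - margin          # keep the stopping point inside this band
    y, v, force = 0.0, 0.0, 0.0     # cart position, velocity, applied force
    positions = np.empty(n_samples)
    forces = np.empty(n_samples)

    def step(y, v, f):
        # Semi-implicit Euler step of a unit-mass cart (toy dynamics).
        v_next = v + f * dt
        y_next = y + v_next * dt
        return y_next, v_next

    def stop_point(y, v):
        # Where the cart would come to rest under full braking.
        return y + v * abs(v) / (2.0 * f_max)

    for t in range(n_samples):
        # Random-walk force: varied excitation, clipped to the actuator limit.
        candidate = float(np.clip(force + rng.normal(scale=0.2 * f_max),
                                  -f_max, f_max))
        # Look one step ahead; if the cart could no longer stop in time,
        # replace the random force by full braking toward the centre.
        y_c, v_c = step(y, v, candidate)
        s = stop_point(y_c, v_c)
        if abs(s) > limit:
            candidate = float(-f_max * np.sign(s))
        force = candidate
        y, v = step(y, v, force)
        positions[t] = y
        forces[t] = force
    return positions, forces
```

Calling `positions, forces = semi_random_rollout()` returns 2500 samples of cart position and force. In this sketch the only objective is staying inside the track, mirroring the description above; the hand-generated sections would have to be appended separately.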
To handle the system constraints in NMPC, a slightly modified version of the cost function (Eq. 7) was used. Out-of-bounds values of the cart location and the force incurred a quadratic penalty, and the full cost function is