The observed variables of the system are the position of the cart, the
angle of the pole measured from the upward position, and their first
derivatives (the cart velocity and the angular velocity of the pole). The
control input is the force applied to the cart. The detailed dynamics and
constraints for the simulated cart-pole system can be found in Kimura99.
A discrete-time system was simulated with a time step given in seconds.
The force applied to the cart was constrained to a bounded range (in
newtons), and the cart position to a bounded range (in metres).
The system was initialised to a random state, with each of the four state
variables drawn from its own uniform distribution.
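The setup described here (a four-variable state, a bounded force, a bounded cart position, and uniform random initialisation) can be sketched in a few lines of Python. This is only an illustrative sketch: every numeric value below is a placeholder and not the value used in the experiments, and the names `State`, `FORCE_RANGE`, `POSITION_RANGE`, `INIT_RANGES`, `clamp_force`, `position_in_bounds` and `sample_initial_state` are hypothetical. The dynamics themselves are not reproduced; they are given in Kimura99.

```python
import random
from dataclasses import dataclass

# Placeholder values for illustration only. They are assumptions and
# do NOT reproduce the bounds or ranges used in the experiments.
FORCE_RANGE = (-10.0, 10.0)    # assumed force bounds, in N
POSITION_RANGE = (-2.4, 2.4)   # assumed cart position bounds, in m
INIT_RANGES = {                # assumed (low, high) range per state variable
    "x": (-0.5, 0.5),          # cart position, m
    "theta": (-0.1, 0.1),      # pole angle from upright, rad
    "x_dot": (-0.5, 0.5),      # cart velocity, m/s
    "theta_dot": (-0.5, 0.5),  # pole angular velocity, rad/s
}


@dataclass
class State:
    """Observed variables: position, angle, and their first derivatives."""
    x: float
    theta: float
    x_dot: float
    theta_dot: float


def clamp_force(force, force_range=FORCE_RANGE):
    """Clip the control input to the allowed force interval."""
    low, high = force_range
    return max(low, min(high, force))


def position_in_bounds(state, position_range=POSITION_RANGE):
    """Return True while the cart stays inside the allowed track segment."""
    low, high = position_range
    return low <= state.x <= high


def sample_initial_state(rng, ranges=INIT_RANGES):
    """Draw each state variable independently from its uniform range."""
    return State(**{name: rng.uniform(low, high)
                    for name, (low, high) in ranges.items()})


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so that runs are repeatable
    state = sample_initial_state(rng)
    print(state, clamp_force(25.0), position_in_bounds(state))
```

Keeping the bounds as explicit `(low, high)` pairs avoids assuming that the intervals are symmetric about zero, so the actual values can be substituted without changing the functions.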