Nonlinear model predictive control (NMPC) [Mayne00] is based on minimising a cost function defined over a future window of fixed length $T_c$. For example, the quadratic difference between the predicted future observations $\mathbf{x}_t$ and a reference signal $\mathbf{r}_t$ can be used:
$$J(\mathbf{u}) = \sum_{t=T+1}^{T+T_c} \mathrm{E}\left\{ \left\| \mathbf{x}_t - \mathbf{r}_t \right\|^2 \right\}. \qquad (7)$$
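As a concrete illustration, the windowed quadratic cost can be sketched in a few lines of Python. The one-step forward model `step`, the initial state, and the reference trajectory below are hypothetical stand-ins chosen for the example, not quantities from the text:

```python
import numpy as np

def nmpc_cost(u, step, x0, ref):
    """Quadratic NMPC cost over a fixed future window: the summed
    squared deviation of the predicted observations from the
    reference signal (cf. Eq. 7, nominal part only)."""
    x = x0
    cost = 0.0
    for t, ut in enumerate(u):
        x = step(x, ut)                   # predict one step ahead
        cost += np.sum((x - ref[t]) ** 2)  # penalise deviation from reference
    return cost

# Toy example: integrator dynamics x_{t+1} = x_t + u_t, reference 1.0.
step = lambda x, u: x + u
ref = np.ones(3)
print(nmpc_cost(np.array([1.0, 0.0, 0.0]), step, 0.0, ref))  # reaches ref immediately -> 0.0
```

In a full treatment the forward pass would also propagate the predictive variances, which enter the expected cost as shown further below.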
Here, the states and observations (but not the control signals) are modelled probabilistically, so we minimise the expected cost [BarShalom81]. The current guess for the future control signals $\mathbf{u}$ defines a probability distribution over the future states and observations. This inference can be done with a single forward pass when the learned policy, that is, the modelled dependency of the control signals on the state, is ignored. Ignoring the policy is justified here anyway, since the future control signals do not have to follow it.
Minimisation of $J(\mathbf{u})$ is done with a quasi-Newton algorithm [Nocedal99]. For that, the partial derivatives $\partial J / \partial u_t$ must be computed for all $t$. For a single-input system, we can simply apply the chain rule to arrive at the Jacobian
$$\frac{\partial J}{\partial u_t} = \sum_{\tau = t+1}^{T+T_c} \frac{\partial J}{\partial \mathbf{x}_\tau} \frac{\partial \mathbf{x}_\tau}{\partial u_t}. \qquad (8)$$
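The chain-rule gradient is easy to check numerically. Below is a sketch for an assumed scalar linear model $x_{t+1} = a x_t + u_t$ (chosen purely for illustration, not from the text), for which $\partial x_\tau / \partial u_t$ is available in closed form; the analytic gradient is validated against finite differences and then handed to a quasi-Newton routine, here SciPy's BFGS:

```python
import numpy as np
from scipy.optimize import minimize

A, REF = 0.9, 1.0  # assumed scalar dynamics x_{t+1} = A*x_t + u_t, reference 1.0

def rollout(u, x0=0.0):
    """Predicted observations for the planned controls u (single forward pass)."""
    xs, x = [], x0
    for ut in u:
        x = A * x + ut
        xs.append(x)
    return np.array(xs)

def cost(u):
    return np.sum((rollout(u) - REF) ** 2)

def grad(u):
    """Chain rule as in Eq. 8: dJ/du_t = sum_{tau >= t} dJ/dx_tau * dx_tau/du_t.
    With the array indexing used here, dx[tau]/du[t] = A**(tau - t)."""
    dJdx = 2.0 * (rollout(u) - REF)
    g = np.zeros_like(u)
    for t in range(len(u)):
        for tau in range(t, len(u)):
            g[t] += dJdx[tau] * A ** (tau - t)
    return g

# Sanity check against central finite differences, then quasi-Newton minimisation.
u0 = np.array([0.5, 0.2, 0.1, 0.0, 0.0])
eps = 1e-6
fd = np.array([(cost(u0 + eps * e) - cost(u0 - eps * e)) / (2 * eps)
               for e in np.eye(len(u0))])
assert np.allclose(grad(u0), fd, atol=1e-5)

res = minimize(cost, u0, jac=grad, method="BFGS")  # quasi-Newton step
```

With the exact gradient supplied via `jac`, BFGS drives this convex toy cost essentially to zero; in the setting of the text, the gradients would instead be propagated through the learned probabilistic model.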
The use of a cost function makes NMPC very versatile. For instance, costs on the control signals and observations can be set so as to keep their values within desired bounds. Expectations over the quadratic cost (Eq. 7) are easy to evaluate, because
$$\mathrm{E}\left\{ \left\| \mathbf{x}_t - \mathbf{r}_t \right\|^2 \right\} = \left\| \bar{\mathbf{x}}_t - \mathbf{r}_t \right\|^2 + \sum_{i=1}^{n} \widetilde{x}_{it},$$
where $n$ is the dimensionality of the observation space, $\bar{\mathbf{x}}_t$ is the mean and $\widetilde{x}_{it}$ the variance of the $i$th component of the distribution over $\mathbf{x}_t$. The two terms are the nominal and the stochastic part of the cost function. There is a direct analogy to dual control [Astrom95], which balances between good control performance and small estimation errors. The usefulness of this decomposition is discussed in [BarShalom81].
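The decomposition into a nominal and a stochastic term can be verified by Monte Carlo for an assumed Gaussian distribution over the observation (the particular mean, variances, and reference below are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(0)
mean = np.array([0.5, -1.0, 2.0])  # assumed predictive mean of x_t
var = np.array([0.1, 0.4, 0.2])    # assumed per-component variances
ref = np.array([1.0, 0.0, 2.0])    # reference signal r_t

# Closed form: nominal part + stochastic part
closed = np.sum((mean - ref) ** 2) + np.sum(var)

# Monte Carlo estimate of E ||x_t - r_t||^2
samples = mean + np.sqrt(var) * rng.standard_normal((200_000, 3))
mc = np.mean(np.sum((samples - ref) ** 2, axis=1))

print(closed, mc)  # the two agree up to sampling error
```

Note that the stochastic term penalises predictive uncertainty itself, which is exactly the dual-control trade-off mentioned above.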