

Control

Model predictive control (see Morari and Lee, 1999, for a survey) aims at controlling a dynamical system using a predictive model. Control inputs $\mathbf{u}(t)$ are added to the nonlinear state-space model. In Publication IV, this is done by replacing the system dynamics of Equation (4.10) with

$\displaystyle \left[ \begin{array}{c} \mathbf{u}(t) \\ \mathbf{s}(t) \end{array} \right] = \mathbf{g}\left(\left[ \begin{array}{c} \mathbf{u}(t-1) \\ \mathbf{s}(t-1) \end{array} \right], \boldsymbol{\theta}_\mathbf{g}\right) + \mathbf{m}(t). \qquad (4.13)$

Compared to Equation (3.23), the control signals $\mathbf{u}(t)$ do not come from outside the model but are modelled as well. Whereas feedback control in Equation (3.24) models the control inputs as a fixed function of the observations, Equation (4.13) only gives a distribution over the control inputs and leaves their exact selection open.
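To make Equation (4.13) concrete, the following is a minimal sketch in Python of one step of the augmented dynamics, assuming, as with the function approximators discussed later in this section, that $\mathbf{g}$ is a one-hidden-layer MLP; the parameters A, a, B, b and the noise level noise_std are hypothetical stand-ins for $\boldsymbol{\theta}_\mathbf{g}$ and the variance of $\mathbf{m}(t)$.

\begin{verbatim}
import numpy as np

def g(x, theta):
    # Hypothetical one-hidden-layer MLP standing in for the mapping g;
    # theta = (A, a, B, b) plays the role of theta_g.
    A, a, B, b = theta
    return B @ np.tanh(A @ x + a) + b

def step(u_prev, s_prev, theta, noise_std=0.1, rng=np.random.default_rng()):
    # One draw from Equation (4.13): stack u(t-1) and s(t-1), map them
    # through g, and add the process noise m(t).
    x = np.concatenate([u_prev, s_prev])
    mean = g(x, theta)
    sample = mean + noise_std * rng.standard_normal(mean.shape)
    return sample[:len(u_prev)], sample[len(u_prev):]   # u(t), s(t)
\end{verbatim}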

Publication IV studies three different control schemes in this setting. Direct control uses the internal forward model directly, selecting the mean of the probability distribution given by Equation (4.13). Direct control is fast to apply, but learning the mapping $\mathbf{g}$ well is difficult.
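Under the same assumptions, direct control reduces to a one-liner: take the mean of (4.13), reusing the hypothetical g and theta from the sketch above, and read off its control part.

\begin{verbatim}
def direct_control(u_prev, s_prev, theta, n_u):
    # Direct control: the mean of the predictive distribution (4.13),
    # truncated to its first n_u components, i.e. the control part u(t).
    x = np.concatenate([u_prev, s_prev])
    return g(x, theta)[:n_u]
\end{verbatim}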

The second control scheme is nonlinear model-predictive control (see e.g. Camacho and Bordons, 2004), which optimises the control signals by maximising a utility function. First, an initial guess for the control signals $\mathbf{u}(t)$ is made. The posterior distribution of the future states is inferred. The gradient of the total expected utility with respect to the states and control signals is propagated backwards in time, and the control signals are updated in the direction of increasing utility. This process is iterated for as long as there is time before the next control signal must be selected. Nonlinear model-predictive control can be seen as an application of decision theory (see Sections 2.4 and 3.2.4).
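The following sketch illustrates the idea rather than the publication's algorithm: it rolls out the mean dynamics, ignoring the posterior uncertainty, and replaces the backwards gradient propagation with finite differences; the utility function, learning rate lr, and iteration budget n_iter are assumptions.

\begin{verbatim}
def rollout(u_seq, s, theta, n_u):
    # Mean rollout over the horizon: feed each candidate control in
    # and keep the predicted state part of g's output.
    states = []
    for u in u_seq:
        s = g(np.concatenate([u, s]), theta)[n_u:]
        states.append(s)
    return states

def nmpc(u_seq, s, theta, utility, n_u, lr=0.01, n_iter=100, eps=1e-4):
    # Gradient ascent on the total utility of the predicted states.
    u_seq = np.array(u_seq, dtype=float)
    for _ in range(n_iter):   # iterate while time permits
        base = sum(utility(x) for x in rollout(u_seq, s, theta, n_u))
        grad = np.zeros_like(u_seq)
        for idx in np.ndindex(*u_seq.shape):
            pert = u_seq.copy()
            pert[idx] += eps
            total = sum(utility(x) for x in rollout(pert, s, theta, n_u))
            grad[idx] = (total - base) / eps
        u_seq += lr * grad    # move toward increasing utility
    return u_seq              # u_seq[0] is applied to the system
\end{verbatim}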

Optimistic inference control, introduced in Publication IV, is the third studied control scheme. It is based on Bayesian inference answering the question: ``Assuming success in the end, what will happen in the near future?'' The control signal is inferred given the history of observations and assuming the desired observations after a gap of missing values. Inference combines the internal forward model with the evidence propagated backwards from the desired future. Optimistic inference control lacks the flexibility and theoretical foundation of nonlinear model-predictive control, but it provides a link between two problems, inference and control, and it gave the inspiration for the inference algorithm introduced in Publication V. Tornio and Raiko (2006) apply that algorithm back to control. Attias (2003) independently discovered the idea behind optimistic inference control, calling it planning by probabilistic inference. His example, finding a goal in a grid world, is quite different from control, but the underlying idea is the same.
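As a sketch of the setup only, the desired future can be encoded as an observation sequence with missing values; the inference routine and model object below (infer_states, model) are hypothetical placeholders for the variational NSSM inference, and the array shapes are dummies.

\begin{verbatim}
import numpy as np

y_hist = np.zeros((16, 2))            # dummy observed history
y_goal = np.array([[1.0, 0.0]])       # dummy desired observation
gap = 39                              # missing steps before the goal

# Clamp the wanted outcome after a gap of NaNs (missing values); a
# state-inference routine for the NSSM (infer_states is a hypothetical
# placeholder) then fills in states and controls consistent with both
# the observed past and the desired future.
y = np.vstack([y_hist, np.full((gap, y_hist.shape[1]), np.nan), y_goal])
states, controls = infer_states(y, model)  # hypothetical API
u_next = controls[len(y_hist)]             # the control to apply now
\end{verbatim}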

The proposed control methods were tested with the cart-pole swing-up task of Figure 4.2; Figure 4.3 illustrates model-predictive control in action. The experimental results in Publication IV confirm that selecting actions based on a state-space model, rather than on the observations directly, has many benefits: First, it is more resistant to noise because it implicitly involves filtering. Second, the observations (without history) do not always carry enough information about the system state. Third, when the nonlinear dynamics are modelled by a function approximator such as an MLP network, a state-space model can find a representation of the state that is better suited to the approximation and thus more predictable.

Figure 4.2: The cart-pole system. The goal is to swing the pole to an upward position and stabilise it without hitting the walls. The cart can be controlled by applying a force to it.
[Figure: cartpole.eps, showing the applied force $F$, the cart position $x$, and the pole angle $\theta$.]

Figure 4.3: Top: The pole is successfully swung up by moving first to the left and then to the right. Predictions are plotted in grey. Bottom: The hidden states, observations, and the control signal in the same situation. The current time $t=16$ is marked with a vertical dashed line. The prediction horizon is 40 steps.
[Figure: swingup.eps]

Model development is by far the most critical and time-consuming step in implementing a model-predictive controller (Morari and Lee, 1999). The analysis in Publication IV is of course shallow compared to the huge mass of control literature, but there seems to be a need for sophisticated model learners (or system identifiers). For instance, Rosenqvist and Karlström (2005) also learn a nonlinear state-space model for control. Their nonlinearities are modelled using piecewise-linear mappings, and the parameters are estimated with the prediction-error method, which is equivalent to maximum likelihood estimation.

