
First Example

The purpose of this first example is to illustrate what is meant by the propagation of expected values and derivatives. The model structure is shown in Figure [*]. The scalar observed data x(t) is a Gaussian variable with prior mean m and prior variance $\exp(-v)$:

\begin{displaymath}p(x(t)\mid m,v)=\operatorname{N}\left(x(t);m,\exp(-v)\right).
\end{displaymath} (4.10)

The only two parameters in the model are the latent Gaussian variables m and v:
  
$\displaystyle p(m)=\operatorname{N}\left(m;0,\exp(-(-5))\right)$     (4.11)
$\displaystyle p(v)=\operatorname{N}\left(v;0,\exp(-(-5))\right)$     (4.12)

with fixed prior means and variances.
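
As a concrete illustration, the following sketch (in Python, assuming NumPy) generates data x(t) from this model; the values m_true, v_true and T are arbitrary illustrative choices, not drawn from the broad priors above.

\begin{verbatim}
# A minimal sketch of the generative model above, assuming NumPy.
# m_true, v_true and T are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(0)

T = 100
m_true, v_true = 1.5, 2.0                                  # "true" mean and variance parameter
x = rng.normal(m_true, np.sqrt(np.exp(-v_true)), size=T)   # x(t) ~ N(m, exp(-v))
\end{verbatim}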


  
Figure: Left: A very simple example structure: There are two Gaussian variables m and v that generate the distribution of the data x(t). Right: The same structure is visualised using x(t) to represent all observations $t=1,2,\dots,T$.

The model basically states that the data points are scattered around an unknown mean with an unknown variance, both of which can be estimated from the data. The posterior distributions $p(m\mid \boldsymbol{X},\mathcal{H})$ and $p(v\mid \boldsymbol{X},\mathcal{H})$ are approximated by

$\displaystyle q(m)=\operatorname{N}\left(m;\overline{m},\widetilde{m}\right)$     (4.13)
$\displaystyle q(v)=\operatorname{N}\left(v;\overline{v},\widetilde{v}\right),$     (4.14)

where $\overline{m}$, $\widetilde{m}$, $\overline{v}$ and $\widetilde{v}$ are the parameters controlling the posterior approximation.

The expected value and the variance are required of the variable m, which acts as the prior mean of x(t). The expected value and the expected exponential are required of the variable v, which determines the prior variance of x(t). Gaussian variables can provide all of these expectations as their outputs.
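
A minimal sketch of such a variable node is given below (assuming NumPy; the class and method names are illustrative, not taken from any particular implementation). For a Gaussian q(s) with mean $\overline{s}$ and variance $\widetilde{s}$, the expected exponential is $\exp(\overline{s}+\widetilde{s}/2)$.

\begin{verbatim}
import numpy as np

class GaussianNode:
    """Posterior approximation q(s) = N(s; mean, var) and the
    expectations it can provide to its children."""
    def __init__(self, mean, var):
        self.mean = mean   # posterior mean
        self.var = var     # posterior variance

    def expectation(self):       # <s>
        return self.mean

    def variance(self):          # Var{s}
        return self.var

    def exp_expectation(self):   # <exp s> = exp(mean + var/2)
        return np.exp(self.mean + self.var / 2.0)
\end{verbatim}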

The learning would start with some initial values, say $\overline{m}=0$, $\widetilde{m}=1$, $\overline{v}=0$ and $\widetilde{v}=1$. The cost concerning the observations x(t) from ([*]) is

\begin{displaymath}C_x = \sum_{t=1}^T \frac{1}{2}\left\{ \left< \exp v \right>
\left[ \left(x(t)-\overline{m}\right)^{2}+\widetilde{m} \right] -
\overline{v} + \ln 2\pi\right\}
\end{displaymath} (4.15)

and its partial derivatives with respect to the posterior mean and variance of m
$\displaystyle \frac{\partial C_x}{\partial \left< m \right>}$ = $\displaystyle \left< \exp v \right> \sum_{t=1}^T \left[\overline{m}-x(t)\right]$ (4.16)
$\displaystyle \frac{\partial C_x}{\partial \mathrm{Var}\left\{m\right\}}$ = $\displaystyle \frac{T\left< \exp v \right>}{2}$ (4.17)

are propagated upwards to the variable node m. There, $C_x$ is assumed to be of the form $C_x = a \left< m \right> + b [(\left< m \right>-
\left< m \right>_{\text{current}})^2 + \mathrm{Var}\left\{m\right\}] + c\left< \exp m \right> + d$, which indeed holds here (with c = 0, since m appears only as a mean).
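
The following sketch evaluates the cost (4.15) and the propagated derivatives (4.16)-(4.17) numerically (assuming NumPy; the function and argument names are illustrative, and exp_v stands for $\left< \exp v \right>$).

\begin{verbatim}
import numpy as np

def cost_x(x, m_mean, m_var, v_mean, exp_v):
    """C_x of (4.15); exp_v stands for <exp v>."""
    return 0.5 * np.sum(exp_v * ((x - m_mean) ** 2 + m_var)
                        - v_mean + np.log(2.0 * np.pi))

def cost_x_grad_m(x, m_mean, exp_v):
    """The derivatives (4.16) and (4.17) propagated to the node m."""
    d_mean = exp_v * np.sum(m_mean - x)   # dC_x / d<m>
    d_var = len(x) * exp_v / 2.0          # dC_x / dVar{m}
    return d_mean, d_var
\end{verbatim}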

The part of the cost function that concerns the variable m directly consists of two terms, simplified from Equations ([*]) and ([*]):

  
$\displaystyle C_{m,p} = \frac{1}{2}\left[ \left< \exp (-5) \right>
\left(\overline{m}^{2}+\widetilde{m} \right) -
\left< -5 \right> + \ln 2\pi\right]$ (4.18)
$\displaystyle C_{m,q} = -\frac{1}{2} \ln\left(2\pi e \widetilde{m}\right).$ (4.19)
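
Written out as code, under the same assumptions as in the earlier sketches, these two terms could be:

\begin{verbatim}
import numpy as np

def cost_m_prior(m_mean, m_var):
    """C_{m,p} of (4.18), with the fixed prior parameters 0 and -5."""
    return 0.5 * (np.exp(-5) * (m_mean ** 2 + m_var) + 5 + np.log(2.0 * np.pi))

def cost_m_q(m_var):
    """C_{m,q} of (4.19)."""
    return -0.5 * np.log(2.0 * np.pi * np.e * m_var)
\end{verbatim}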

The parameters $\overline{m}$ and $\widetilde{m}$ can now be updated so that the total affected cost $C_x+C_{m,p}+C_{m,q}$ is minimised. In this case the minimum can be obtained analytically:
$\displaystyle \overline{m}$ = $\displaystyle \frac{\sum_{t=1}^T x(t)}{T+\exp(-5)/\left< \exp v \right>}$ (4.20)
$\displaystyle \widetilde{m}$ = $\displaystyle \frac{1}{T\left< \exp v \right>+\exp(-5)}.$ (4.21)
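
As a sketch, the closed-form update (4.20)-(4.21) could be implemented as follows (assuming NumPy; exp_v again stands for $\left< \exp v \right>$).

\begin{verbatim}
import numpy as np

def update_q_m(x, exp_v):
    """Minimiser (4.20)-(4.21) of C_x + C_{m,p} + C_{m,q} for the
    posterior q(m), with the fixed prior of (4.11)."""
    T = len(x)
    m_mean = np.sum(x) / (T + np.exp(-5) / exp_v)
    m_var = 1.0 / (T * exp_v + np.exp(-5))
    return m_mean, m_var
\end{verbatim}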

From the result we can see that the posterior mean of m is close to the mean of the observations. The prior in ([*]) pulls it slightly towards zero. The uncertainty of the prior mean, i.e. the posterior variance $\widetilde{m}$, does not directly depend on the actual data values. When the number of observations T increases, $\widetilde{m}$ decreases towards zero.

The partial derivatives of $C_x$ defined in ([*]) with respect to $\left< v \right>$ and $\left< \exp v \right>$ are

$\displaystyle \frac{\partial C_x}{\partial \left< v \right>}$ = $\displaystyle -\frac{T}{2}$ (4.22)
$\displaystyle \frac{\partial C_x}{\partial \left< \exp v \right>}$ = $\displaystyle \sum_{t=1}^T \frac{1}{2}
\left[\left(x(t)-\overline{m}\right)^{2}+\widetilde{m}\right].$ (4.23)

There are also terms similar to ([*]) and ([*]). Therefore, when optimising $\overline{v}$ and $\widetilde{v}$, one has to minimise a more complicated function of $\overline{v}$ and $\widetilde{v}$: the cost depends on them both through $\left< v \right> = \overline{v}$ and through $\left< \exp v \right> = \exp(\overline{v}+\widetilde{v}/2)$, so there is no closed-form solution and the minimisation must be done iteratively.
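
As an illustration of this iterative step, the sketch below minimises the v-dependent cost numerically, using a generic optimiser (scipy.optimize.minimize) purely as a stand-in for whatever iteration is used in practice; the terms corresponding to (4.18)-(4.19) are written for v, and $\left< \exp v \right> = \exp(\overline{v}+\widetilde{v}/2)$ for a Gaussian q(v).

\begin{verbatim}
import numpy as np
from scipy.optimize import minimize

def update_q_v(x, m_mean, m_var, v_mean0=0.0, v_var0=1.0):
    """Numerically minimise the v-dependent cost C_x + C_{v,p} + C_{v,q}
    over (v_mean, v_var)."""
    s = np.sum((x - m_mean) ** 2 + m_var)
    T = len(x)

    def cost(params):
        v_mean, log_v_var = params
        v_var = np.exp(log_v_var)               # keep the variance positive
        exp_v = np.exp(v_mean + v_var / 2.0)    # <exp v> for a Gaussian q(v)
        # Constant ln(2*pi) terms are omitted; they do not affect the minimum.
        c_x = 0.5 * (exp_v * s - T * v_mean)    # v-dependent part of (4.15)
        c_vp = 0.5 * (np.exp(-5) * (v_mean ** 2 + v_var) + 5)   # cf. (4.18)
        c_vq = -0.5 * np.log(2.0 * np.pi * np.e * v_var)        # cf. (4.19)
        return c_x + c_vp + c_vq

    res = minimize(cost, np.array([v_mean0, np.log(v_var0)]))
    return res.x[0], np.exp(res.x[1])
\end{verbatim}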

The optimal solution for q(m) depends on q(v) and vice versa. This means that to solve the problem fully, one can update q(m) and q(v) alternately until convergence.
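
A compact sketch of such an alternating scheme, reusing x, update_q_m and update_q_v from the sketches above (illustrative only, not the actual implementation), might look as follows:

\begin{verbatim}
import numpy as np

# Initial values as in the text.
m_mean, m_var = 0.0, 1.0
v_mean, v_var = 0.0, 1.0

for sweep in range(20):                      # a fixed number of sweeps for simplicity
    exp_v = np.exp(v_mean + v_var / 2.0)     # <exp v> under the current q(v)
    m_mean, m_var = update_q_m(x, exp_v)     # analytic update (4.20)-(4.21)
    v_mean, v_var = update_q_v(x, m_mean, m_var, v_mean, v_var)

print(m_mean, m_var, v_mean, v_var)
\end{verbatim}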

