Fixed form Q

Let us model a set of observations, $\vec{x} = x(1), \ldots, x(N)$, by a Gaussian distribution parametrised by mean m and log-std $v = \ln \sigma$. We shall approximate the posterior distribution by q(m, v) = q(m) q(v), where both q(m) and q(v) are Gaussian. The parametrisation with log-std is chosen because the posterior of v is closer to Gaussian than the posterior of $\sigma$ or $\sigma^2$ would be. (Notice that a parametrisation yielding close to Gaussian posterior distributions is connected to the uninformative priors discussed in section 5.1.)

Let the priors for m and v be Gaussian with means $\mu_m$ and $\mu_v$ and variances $\sigma_m^2$ and $\sigma_v^2$, respectively. The joint density of the observations and the parameters m and v is

\begin{displaymath}p(\vec{x}, m, v) = \left[ \prod_t p(x(t) \vert m, v) \right] p(m) p(v) \, .
\end{displaymath} (19)

As we can see, the joint density, and hence the unnormalised posterior, is a product of many simple terms.

Let us denote by $\bar{m}$ and $\tilde{m}$ the posterior mean and variance of m.

\begin{displaymath}q(m; \bar{m}, \tilde{m}) = \frac{1}{\sqrt{2\pi \tilde{m}}}
\exp\left( -\frac{(m - \bar{m})^2}{2\tilde{m}} \right) \, .
\end{displaymath} (20)

The distribution $q(v; \bar{v}, \tilde{v})$ is analogous.

The cost function is now

\begin{displaymath}
\begin{aligned}
C_{m, v}(\vec{x}) &= \int q(m, v) \ln \frac{q(m, v)}{p(\vec{x}, m, v)} \, dm \, dv \\
&= \int q(m, v) \ln q(m) \, dm \, dv + \int q(m, v) \ln q(v) \, dm \, dv \\
&\quad - \sum_{t=1}^N \int q(m, v) \ln p(x(t) \vert m, v) \, dm \, dv \\
&\quad - \int q(m, v) \ln p(m) \, dm \, dv - \int q(m, v) \ln p(v) \, dm \, dv \, .
\end{aligned}
\end{displaymath} (21)

We see that the cost function has many terms, all of which are expectations over q(m, v). Since the approximation is assumed to factorise as q(m, v) = q(m) q(v), these expectations are fairly easy to compute. For instance, integrating the term $\int q(m, v) \ln q(m) dm dv$ over v yields $\int q(m) \ln q(m) dm$, since $\ln q(m)$ does not depend on v. As q(m) is assumed to be Gaussian with mean $\bar{m}$ and variance $\tilde{m}$, this integral is, in fact, minus the entropy of a Gaussian distribution, and we have

\begin{displaymath}\int q(m) \ln q(m) dm = -\frac{1}{2}(1 + \ln 2\pi \tilde{m}) \, .
\end{displaymath} (22)
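As a quick numerical check of equation 22, the entropy integral can be evaluated by a simple Riemann sum and compared to the closed form. The values of $\bar{m}$ and $\tilde{m}$ below are arbitrary illustrative choices, not taken from the text.

```python
import math

def q(m, mean, var):
    """Gaussian density with the given mean and variance."""
    return math.exp(-(m - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

m_bar, m_tilde = 0.7, 0.25        # arbitrary example posterior mean and variance
h = 1e-4                          # step of the Riemann sum
width = 8 * math.sqrt(m_tilde)    # integrate over m_bar +/- 8 standard deviations
grid = [m_bar - width + i * h for i in range(int(2 * width / h))]

numeric = sum(q(m, m_bar, m_tilde) * math.log(q(m, m_bar, m_tilde)) for m in grid) * h
analytic = -0.5 * (1 + math.log(2 * math.pi * m_tilde))   # equation 22
assert abs(numeric - analytic) < 1e-6
```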

A similar term, with $\tilde{m}$ replaced by $\tilde{v}$, comes from $\int q(m, v) \ln q(v) dm dv$.

The terms where the expectation is taken over $-\ln p(m)$ and $-\ln p(v)$ are also simple since

\begin{displaymath}-\ln p(m) = \frac{1}{2} \ln 2\pi \sigma_m^2 + \frac{(m -
\mu_m)^2}{2\sigma_m^2} \, ,
\end{displaymath} (23)

which means that we only need to be able to compute the expectation of $(m - \mu_m)^2$ over the Gaussian q(m) having mean $\bar{m}$ and variance $\tilde{m}$. This yields
\begin{displaymath}
\begin{aligned}
E\{(m-\mu_m)^2\} &= E\{m^2\} - 2 \mu_m E\{m\} + \mu_m^2 \\
&= \bar{m}^2 + \tilde{m} - 2 \bar{m} \mu_m + \mu_m^2 = (\bar{m} - \mu_m)^2 + \tilde{m}
\end{aligned}
\end{displaymath} (24)

since the variance is defined by $\tilde{m} = E\{m^2\} - E\{m\}^2 = E\{m^2\} - \bar{m}^2$, which shows that $E\{m^2\} = \bar{m}^2 + \tilde{m}$. Taking the expectation of equation 23 and substituting equation 24 thus yields

\begin{displaymath}-\int q(m) \ln p(m) dm = \frac{1}{2} \ln 2\pi \sigma_m^2 +
\frac{(\bar{m} - \mu_m)^2 + \tilde{m}}{2\sigma_m^2} \, .
\end{displaymath} (25)
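The identity of equation 24 underlying this result can be checked with a short Monte Carlo simulation. The parameter values are again arbitrary, and the seed is fixed for reproducibility.

```python
import math
import random

random.seed(0)
m_bar, m_tilde = 1.0, 0.09    # assumed posterior mean and variance of m
mu_m = 0.5                    # assumed prior mean

# draw samples from q(m) and estimate E{(m - mu_m)^2}
samples = [random.gauss(m_bar, math.sqrt(m_tilde)) for _ in range(1_000_000)]
mc = sum((m - mu_m) ** 2 for m in samples) / len(samples)

closed_form = (m_bar - mu_m) ** 2 + m_tilde   # equation 24
assert abs(mc - closed_form) < 0.005
```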

A similar term, with m replaced by v, will be obtained from $-\int
q(v) \ln p(v) dv$.

The last terms are of the form $-\int q(m, v) \ln p(x(t) \vert m, v) dm dv$. Again, the factorisation q(m, v) = q(m) q(v) simplifies the computation of these terms. Recall that x(t) was assumed to be Gaussian with mean m and variance $e^{2v}$. The term over which the expectation is taken is thus

\begin{displaymath}-\ln p(x(t) \vert m, v) = \frac{1}{2} \ln 2\pi + v + \frac{1}{2} (x(t) - m)^2
e^{-2v} \, .
\end{displaymath} (26)

The expectation over the term $(x(t) - m)^2 e^{-2v}$ is easy to compute since m and v are assumed to be posteriorly independent. This means that the expectation can be taken separately over $(x(t) - m)^2$ and $e^{-2v}$. A derivation similar to equation 24 yields $(x(t) - \bar{m})^2 + \tilde{m}$ for the first factor. The expectation over $e^{-2v}$ is also fairly easy to compute:
\begin{displaymath}
\begin{aligned}
\int q(v) e^{-2v} dv &= \frac{1}{\sqrt{2\pi\tilde{v}}} \int
e^{-\frac{(v - \bar{v})^2}{2\tilde{v}}} e^{-2v} \, dv
= \frac{1}{\sqrt{2\pi\tilde{v}}} \int e^{-\frac{(v - \bar{v})^2 +
4v\tilde{v}}{2\tilde{v}}} \, dv \\
&= \frac{1}{\sqrt{2\pi\tilde{v}}} \int e^{-\frac{v^2 - 2v\bar{v} +
\bar{v}^2 + 4v\tilde{v}}{2\tilde{v}}} \, dv \\
&= \frac{1}{\sqrt{2\pi\tilde{v}}} \int e^{-\frac{[v + (2\tilde{v} -
\bar{v})]^2 + 4\bar{v}\tilde{v} - 4\tilde{v}^2}{2\tilde{v}}} \, dv \\
&= \frac{1}{\sqrt{2\pi\tilde{v}}} \int e^{-\frac{[v + (2\tilde{v} -
\bar{v})]^2}{2\tilde{v}}} e^{2\tilde{v} - 2\bar{v}} \, dv = e^{2\tilde{v} - 2\bar{v}} \, .
\end{aligned}
\end{displaymath} (27)
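Equation 27 is the moment generating function of a Gaussian evaluated at -2, and it can be verified numerically by quadrature. The values of $\bar{v}$ and $\tilde{v}$ below are arbitrary illustrative choices.

```python
import math

v_bar, v_tilde = 0.3, 0.04   # arbitrary example posterior mean and variance of v

def q(v):
    """Gaussian density q(v) with mean v_bar and variance v_tilde."""
    return math.exp(-(v - v_bar) ** 2 / (2 * v_tilde)) / math.sqrt(2 * math.pi * v_tilde)

h = 1e-5
width = 10 * math.sqrt(v_tilde)   # integrate over v_bar +/- 10 standard deviations
grid = [v_bar - width + i * h for i in range(int(2 * width / h))]

numeric = sum(q(v) * math.exp(-2 * v) for v in grid) * h
analytic = math.exp(2 * v_tilde - 2 * v_bar)   # equation 27
assert abs(numeric - analytic) < 1e-6
```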

This shows that taking the expectation of equation 26 yields the term

\begin{displaymath}-\int q(m, v) \ln p(x(t) \vert m, v) dm dv = \frac{1}{2} \ln 2\pi +
\bar{v} + \frac{1}{2} [(x(t) - \bar{m})^2 + \tilde{m}] e^{2\tilde{v} - 2\bar{v}} \, .
\end{displaymath} (28)

Collecting together all the terms, we obtain the following cost function
\begin{displaymath}
\begin{aligned}
C_{m,v}(\vec{x}; \bar{m}, \tilde{m}, \bar{v}, \tilde{v}) &= \sum_{t=1}^N
\frac{1}{2}[(x(t) - \bar{m})^2 + \tilde{m}] e^{2\tilde{v}-2\bar{v}} + N \bar{v} \\
&\quad + \frac{(\bar{m} - \mu_m)^2 + \tilde{m}}{2\sigma_m^2} + \frac{(\bar{v}
- \mu_v)^2 + \tilde{v}}{2\sigma_v^2} + \ln \sigma_m \sigma_v \\
&\quad + \frac{N}{2} \ln 2\pi - \frac{1}{2} \ln \tilde{m}\tilde{v} - 1 \, .
\end{aligned}
\end{displaymath} (29)

Assuming $\sigma_m^2$ and $\sigma_v^2$ to be very large, the minimum of the cost function can be found by setting the gradient of C to zero. This yields the following:

$\bar{m} = \frac{1}{N} \sum_t x(t)$ (30)
$\tilde{m} = \frac{\sum_t (x(t) - \bar{m})^2}{N(N-1)}$ (31)
$\bar{v} = \frac{1}{2N} + \frac{1}{2} \ln \frac{1}{N-1}\sum_t (x(t) - \bar{m})^2$ (32)
$\tilde{v} = \frac{1}{2N}$ (33)
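The solution can be checked against the cost function of equation 29: with very broad priors, the values given by equations 30-33 should be a stationary point of C. The sketch below uses illustrative data drawn as in figure 3; the prior parameters are arbitrary placeholders.

```python
import math
import random

random.seed(1)
N = 100
data = [random.gauss(1.0, 0.1) for _ in range(N)]   # m = 1, sigma = 0.1

# very broad priors so that equations 30-33 apply
mu_m = mu_v = 0.0
var_m = var_v = 1e8

def cost(m_bar, m_tilde, v_bar, v_tilde):
    """Cost function of equation 29."""
    c = sum(0.5 * ((x - m_bar) ** 2 + m_tilde) * math.exp(2 * v_tilde - 2 * v_bar)
            for x in data)
    c += N * v_bar
    c += ((m_bar - mu_m) ** 2 + m_tilde) / (2 * var_m)
    c += ((v_bar - mu_v) ** 2 + v_tilde) / (2 * var_v)
    c += 0.5 * math.log(var_m * var_v)          # ln(sigma_m * sigma_v)
    c += N / 2 * math.log(2 * math.pi) - 0.5 * math.log(m_tilde * v_tilde) - 1
    return c

# closed-form solution, equations 30-33
m_bar = sum(data) / N
ss = sum((x - m_bar) ** 2 for x in data)
m_tilde = ss / (N * (N - 1))
v_bar = 1 / (2 * N) + 0.5 * math.log(ss / (N - 1))
v_tilde = 1 / (2 * N)

c0 = cost(m_bar, m_tilde, v_bar, v_tilde)
# perturbing any parameter should not decrease the cost
for eps in (1e-3, -1e-3):
    assert cost(m_bar + eps, m_tilde, v_bar, v_tilde) >= c0
    assert cost(m_bar, m_tilde * (1 + eps), v_bar, v_tilde) >= c0
    assert cost(m_bar, m_tilde, v_bar + eps, v_tilde) >= c0
    assert cost(m_bar, m_tilde, v_bar, v_tilde * (1 + eps)) >= c0
```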

In case $\sigma_m^2$ and $\sigma_v^2$ cannot be assumed very large, the equations for $\bar{v}$ and $\tilde{v}$ are not as simple, but the solution can still be obtained by finding the zero of the gradient.

Figure 3: Comparison of the true and approximate posterior distributions for a test set containing 100 data points drawn from a model with m=1 and $\sigma = 0.1$. The plot on the left shows the true posterior distribution over m and v. The plot on the right shows the approximate posterior distribution, a diagonal Gaussian distribution.

Figure 3 shows a comparison of the true posterior distribution and the approximate posterior. The data set consisted of 100 points drawn from a model with m=1 and $\sigma = 0.1$. The contours of both distributions are centred in the same region, around a value that slightly underestimates m. The contours of the two distributions are qualitatively similar, although the true distribution is not symmetrical about the mean value of v.

Harri Lappalainen