
## Fixed form Q

Let us model a set of observations, $x(t)$, $t = 1, \ldots, N$, by a Gaussian distribution parametrised by mean $m$ and log-std $v = \ln \sigma$. We shall approximate the posterior distribution by $q(m, v) = q(m) q(v)$, where both $q(m)$ and $q(v)$ are Gaussian. The parametrisation with log-std is chosen because the posterior of $v$ is closer to Gaussian than the posterior of $\sigma$ or $\sigma^2$ would be. (Notice that a parametrisation yielding close-to-Gaussian posterior distributions is connected to the uninformative priors discussed in section 5.1.)

Let the priors for $m$ and $v$ be Gaussian with means $\mu_m$ and $\mu_v$ and variances $\sigma_m^2$ and $\sigma_v^2$, respectively. The joint density of the observations and the parameters $m$ and $v$ is

$$
p(x, m, v) = \left[ \prod_{t=1}^{N} \frac{e^{-v}}{\sqrt{2\pi}} \exp\!\left( -\frac{(x(t) - m)^2}{2 e^{2v}} \right) \right] \frac{1}{\sqrt{2\pi\sigma_m^2}} \exp\!\left( -\frac{(m - \mu_m)^2}{2\sigma_m^2} \right) \frac{1}{\sqrt{2\pi\sigma_v^2}} \exp\!\left( -\frac{(v - \mu_v)^2}{2\sigma_v^2} \right) \tag{19}
$$

As we can see, the posterior is a product of many simple terms.
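To make this product structure concrete, here is a minimal sketch (the function names are illustrative, not from the original text) that evaluates $\ln p(x, m, v)$ as a sum of simple Gaussian log-terms:

```python
import numpy as np

def log_gauss(x, mean, var):
    """Log-density of a Gaussian with the given mean and variance, evaluated at x."""
    return -0.5 * np.log(2 * np.pi * var) - (x - mean) ** 2 / (2 * var)

def log_joint(x, m, v, mu_m, s2_m, mu_v, s2_v):
    """ln p(x, m, v): prior terms for m and v plus one likelihood
    term per observation, where x(t) is Gaussian with mean m and variance e^{2v}."""
    return (log_gauss(m, mu_m, s2_m)
            + log_gauss(v, mu_v, s2_v)
            + np.sum(log_gauss(x, m, np.exp(2 * v))))
```

Because the density is a product of simple terms, its logarithm decomposes into the sum above, which is what makes the expectations in the cost function tractable term by term.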

Let us denote by $\bar m$ and $\tilde m$ the posterior mean and variance of $m$:

$$
q(m) = \frac{1}{\sqrt{2\pi\tilde m}} \exp\!\left( -\frac{(m - \bar m)^2}{2\tilde m} \right) \tag{20}
$$

The distribution $q(v)$ is analogous, with posterior mean $\bar v$ and variance $\tilde v$.

The cost function is now

$$
C(\bar m, \tilde m, \bar v, \tilde v) = \int q(m, v) \ln \frac{q(m, v)}{p(x, m, v)} \, dm \, dv = E\{\ln q(m)\} + E\{\ln q(v)\} - E\{\ln p(m)\} - E\{\ln p(v)\} - \sum_{t=1}^{N} E\{\ln p(x(t) \mid m, v)\} \tag{21}
$$

We see that the cost function has many terms, all of which are expectations over $q(m, v)$. Since the approximation $q(m, v)$ is assumed to be factorised into $q(m, v) = q(m) q(v)$, it is fairly easy to compute these expectations. For instance, integrating the term $q(m) q(v) \ln q(m)$ over $v$ yields $q(m) \ln q(m)$, since $\ln q(m)$ does not depend on $v$ and $\int q(v) \, dv = 1$. Since $q(m)$ is assumed to be Gaussian with mean $\bar m$ and variance $\tilde m$, this integral is, in fact, minus the entropy of a Gaussian distribution and we have

$$
\int q(m) \ln q(m) \, dm = -\tfrac{1}{2} \ln (2\pi e \tilde m) \tag{22}
$$

A similar term, with $\tilde m$ replaced by $\tilde v$, comes from $E\{\ln q(v)\}$.
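Equation 22 is easy to verify numerically: estimate $E\{\ln q(m)\}$ by Monte Carlo and compare it with minus the Gaussian entropy. The values of $\bar m$ and $\tilde m$ below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
m_bar, m_tilde = 0.7, 0.25            # illustrative posterior mean and variance of m

# Monte Carlo estimate of E{ln q(m)} with samples drawn from q(m)
m = rng.normal(m_bar, np.sqrt(m_tilde), size=1_000_000)
log_q = -0.5 * np.log(2 * np.pi * m_tilde) - (m - m_bar) ** 2 / (2 * m_tilde)
mc_estimate = log_q.mean()

exact = -0.5 * np.log(2 * np.pi * np.e * m_tilde)   # minus the entropy of the Gaussian
```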

The terms where the expectation is taken over $-\ln p(m)$ and $-\ln p(v)$ are also simple since

$$
-\ln p(m) = \frac{(m - \mu_m)^2}{2\sigma_m^2} + \tfrac{1}{2} \ln (2\pi\sigma_m^2) \tag{23}
$$

which means that we only need to be able to compute the expectation of $(m - \mu_m)^2$ over the Gaussian $q(m)$ having mean $\bar m$ and variance $\tilde m$. This yields

$$
E\{(m - \mu_m)^2\} = E\{m^2\} - 2\mu_m E\{m\} + \mu_m^2 = \tilde m + \bar m^2 - 2\mu_m \bar m + \mu_m^2 = (\bar m - \mu_m)^2 + \tilde m \tag{24}
$$

since the variance can be defined by $\tilde m = E\{m^2\} - \bar m^2$, which shows that $E\{m^2\} = \bar m^2 + \tilde m$. Taking the expectation of equation 23 over $q(m)$ and substituting equation 24 thus yields

$$
-E\{\ln p(m)\} = \frac{(\bar m - \mu_m)^2 + \tilde m}{2\sigma_m^2} + \tfrac{1}{2} \ln (2\pi\sigma_m^2) \tag{25}
$$

A similar term, with $m$ replaced by $v$, will be obtained from $-E\{\ln p(v)\}$.

The last terms are of the form $-E\{\ln p(x(t) \mid m, v)\}$. Again we will find that the factorisation $q(m, v) = q(m) q(v)$ simplifies the computation of these terms. Recall that $x(t)$ was assumed to be Gaussian with mean $m$ and variance $e^{2v}$. The term over which the expectation is taken is thus

$$
-\ln p(x(t) \mid m, v) = \tfrac{1}{2}(x(t) - m)^2 e^{-2v} + v + \tfrac{1}{2} \ln 2\pi \tag{26}
$$

The expectation over the term $(x(t) - m)^2 e^{-2v}$ is easy to compute since $m$ and $v$ are assumed to be posteriorly independent. This means that the expectation can be taken separately over $(x(t) - m)^2$ and $e^{-2v}$. A derivation similar to that of equation 24 yields $(x(t) - \bar m)^2 + \tilde m$ for the first term. The expectation over $e^{-2v}$ is also fairly easy to compute:
$$
E\{e^{-2v}\} = \int q(v) e^{-2v} \, dv = e^{2\tilde v - 2\bar v} \tag{27}
$$

This shows that taking the expectation over equation 26 yields the term

$$
-E\{\ln p(x(t) \mid m, v)\} = \tfrac{1}{2} \left[ (x(t) - \bar m)^2 + \tilde m \right] e^{2\tilde v - 2\bar v} + \bar v + \tfrac{1}{2} \ln 2\pi \tag{28}
$$
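The Gaussian expectation of $e^{-2v}$ in equation 27 is the moment-generating function of $q(v)$ evaluated at $-2$; a quick numerical check with illustrative values:

```python
import numpy as np

rng = np.random.default_rng(2)
v_bar, v_tilde = 0.3, 0.04            # illustrative posterior mean and variance of v

# E{e^{-2v}} under q(v), estimated by sampling
v = rng.normal(v_bar, np.sqrt(v_tilde), size=1_000_000)
mc_estimate = np.mean(np.exp(-2 * v))
exact = np.exp(2 * v_tilde - 2 * v_bar)   # equation 27
```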

Collecting together all the terms, we obtain the following cost function
$$
C = \frac{(\bar m - \mu_m)^2 + \tilde m}{2\sigma_m^2} + \frac{(\bar v - \mu_v)^2 + \tilde v}{2\sigma_v^2} + \sum_{t=1}^{N} \left\{ \frac{(x(t) - \bar m)^2 + \tilde m}{2} \, e^{2\tilde v - 2\bar v} + \bar v \right\} + \frac{1}{2} \ln \frac{\sigma_m^2 \sigma_v^2}{\tilde m \tilde v} + \frac{N}{2} \ln 2\pi - 1 \tag{29}
$$

Assuming $\sigma_m^2$ and $\sigma_v^2$ are very large, the minimum of the cost function can be solved by setting the gradient of the cost function $C$ to zero. This yields the following:

$$
\bar m = \frac{1}{N} \sum_{t=1}^{N} x(t) \tag{30}
$$

$$
\tilde m = \frac{1}{N(N-1)} \sum_{t=1}^{N} (x(t) - \bar m)^2 \tag{31}
$$

$$
\bar v = \frac{1}{2} \ln \left[ \frac{1}{N-1} \sum_{t=1}^{N} (x(t) - \bar m)^2 \right] + \tilde v \tag{32}
$$

$$
\tilde v = \frac{1}{2N} \tag{33}
$$

In case $\sigma_m^2$ and $\sigma_v^2$ cannot be assumed very large, the equations for $\bar v$ and $\tilde v$ are not that simple, but the solution can still be obtained by solving for the zero of the gradient.
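Under the flat-prior assumption, the solution in equations 30-33 can be computed directly from data. The sketch below uses synthetic data with illustrative true parameters, and also defines the corresponding (constant-free, flat-prior) cost so that stationarity can be checked numerically:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(1.0, 1.0, size=100)    # synthetic data; illustrative true m = 1, v = 0
N = len(x)

# Closed-form minimiser, assuming sigma_m^2 and sigma_v^2 are very large
m_bar = x.mean()                                 # posterior mean of m: the sample mean
s2 = np.sum((x - m_bar) ** 2) / (N - 1)          # unbiased sample variance
m_tilde = s2 / N                                 # posterior variance of m
v_tilde = 1.0 / (2 * N)                          # posterior variance of v
v_bar = 0.5 * np.log(s2) + v_tilde               # posterior mean of v

def cost(mb, mt, vb, vt):
    """The cost of equation 29 with flat-prior terms and additive constants dropped."""
    return (-0.5 * np.log(mt) - 0.5 * np.log(vt)
            + np.sum(0.5 * ((x - mb) ** 2 + mt) * np.exp(2 * vt - 2 * vb) + vb))
```

Perturbing any of the four parameters and observing that the cost does not decrease (i.e. the finite-difference gradient vanishes) confirms that these values are a stationary point of the cost.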

Figure 3 shows a comparison of the true posterior distribution and the approximate posterior. The data set consisted of 100 points drawn from a model with $m = 1$. The contours of both distributions are centred in the same region, corresponding to a model that slightly underestimates $m$. The contours of the two distributions are qualitatively similar, although the true distribution is not symmetrical about the mean value of $v$.

Harri Lappalainen
2000-03-03