next up previous contents
Next: The likelihood term Up: Evaluating the cost function Previous: Evaluating the cost function   Contents

The terms originating from the parameters $ \boldsymbol {\theta }$

Assuming the parameters are $ \boldsymbol{\theta}= \{ \theta_1, \ldots, \theta_{N}
\}$ and the approximation is of the form

$\displaystyle q(\boldsymbol{\theta}) = \prod_{i=1}^{N} q(\theta_i),$ (6.2)

the terms of Equation (6.1) originating from the parameters $ \boldsymbol {\theta }$ can be written as

$\displaystyle \operatorname{E}\left[ \log q(\boldsymbol{\theta}) - \log p(\boldsymbol{\theta}) \right] = \sum_{i=1}^{N} \left( \operatorname{E}\left[ \log q(\theta_i) \right] - \operatorname{E}\left[ \log p(\theta_i) \right] \right).$ (6.3)

In the case of Dirichlet distributions, one $ \theta_i$ in the previous equation naturally denotes the whole parameter vector of a single Dirichlet distribution.

There are two different kinds of parameters in $ \boldsymbol {\theta }$: those with a Gaussian distribution and those with a Dirichlet distribution. In the Gaussian case the expectation $ \operatorname{E}[ \log q(\theta_i)]$ over $ q(\theta_i) = N(\theta_i;\; \overline{\theta_i}, \widetilde{\theta_i})$ gives the negative entropy of a Gaussian, $ -1/2 ( 1 + \log (2 \pi \widetilde{\theta_i}) )$, as derived in Equation (A.5) of Appendix A.
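As a concrete numerical check of this entropy term, the following sketch (in Python with SciPy; the helper name is illustrative and not part of the original derivation) evaluates $ -1/2(1 + \log(2\pi\widetilde{\theta_i}))$ and compares it against the differential entropy computed by SciPy:

```python
import numpy as np
from scipy import stats

# Negative entropy of q(theta_i) = N(theta_i; mean, var):
# E[log q(theta_i)] = -1/2 * (1 + log(2*pi*var)), cf. Equation (A.5).
def gaussian_neg_entropy(var):
    return -0.5 * (1.0 + np.log(2.0 * np.pi * var))

var = 0.25
# scipy's entropy() returns the (positive) differential entropy,
# so it should equal the negative of the value above.
assert np.isclose(gaussian_neg_entropy(var),
                  -stats.norm(loc=0.0, scale=np.sqrt(var)).entropy())
```

Note that only the variance $ \widetilde{\theta_i}$ enters; the mean $ \overline{\theta_i}$ does not affect the entropy of a Gaussian.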

The expectation of $ - \log p(\theta_i)$ can also be evaluated using the formulas of Appendix A. Assuming

$\displaystyle p(\theta_i) = N(\theta_i;\; m, \exp(2 v))$ (6.4)

where $ q(\theta_i) = N(\theta_i;\; \overline{\theta_i}, \widetilde{\theta_i})$, $ q(m) = N(m;\; \overline{m}, \widetilde{m})$ and $ q(v) = N(v;\; \overline{v}, \widetilde{v})$, the expectation becomes

\begin{displaymath}\begin{split}C_p(\theta_i) &= \operatorname{E}\left[ - \log p(\theta_i) \right] \\ &= \frac{1}{2} \log 2\pi + \overline{v} + \frac{1}{2} \left[ \left( \overline{\theta_i} - \overline{m} \right)^2 + \widetilde{\theta_i} + \widetilde{m} \right] \exp(2\widetilde{v} - 2 \overline{v}) \end{split}\end{displaymath} (6.5)

where we have used the results of Equations (A.4) and (A.6).
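The closed form of Equation (6.5) can be verified numerically. The sketch below (illustrative helper names, not from the original text) evaluates the expectation both from the formula and by direct Monte Carlo sampling of $ \theta_i$, $ m$ and $ v$ from their posteriors:

```python
import numpy as np

# E[-log p(theta_i)] for p(theta_i) = N(theta_i; m, exp(2v)) when
# theta_i, m and v all have Gaussian posteriors (Equation 6.5).
def cost_p_gaussian(th_mean, th_var, m_mean, m_var, v_mean, v_var):
    return (0.5 * np.log(2.0 * np.pi) + v_mean
            + 0.5 * ((th_mean - m_mean) ** 2 + th_var + m_var)
              * np.exp(2.0 * v_var - 2.0 * v_mean))

# Monte Carlo cross-check: sample the posteriors and average
# -log p(theta_i) = 1/2 log 2*pi + v + 1/2 (theta_i - m)^2 exp(-2v).
rng = np.random.default_rng(0)
n = 500_000
theta = rng.normal(0.3, np.sqrt(0.04), n)
m = rng.normal(0.1, np.sqrt(0.02), n)
v = rng.normal(-0.5, np.sqrt(0.01), n)
mc = np.mean(0.5 * np.log(2.0 * np.pi) + v
             + 0.5 * (theta - m) ** 2 * np.exp(-2.0 * v))
closed = cost_p_gaussian(0.3, 0.04, 0.1, 0.02, -0.5, 0.01)
assert abs(mc - closed) < 1e-2
```

The factor $ \exp(2\widetilde{v} - 2\overline{v})$ arises because $ \operatorname{E}[\exp(-2v)] = \exp(2\widetilde{v} - 2\overline{v})$ for $ v \sim N(\overline{v}, \widetilde{v})$, as in Equation (A.6).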

For Dirichlet distributed parameters, the procedure is similar. Let us assume that the parameter $ \mathbf{c} \in \boldsymbol{\theta}$, $ p(\mathbf{c}) = \text{Dirichlet}(\mathbf{c};\; \mathbf{u}^{(\mathbf{c})})$ and $ q(\mathbf{c}) = \text{Dirichlet}(\mathbf{c};\; \hat{\mathbf{c}})$. Using the notation of Appendix A, the negative entropy of the Dirichlet distribution $ q(\mathbf{c})$, $ \operatorname{E}\left[ \log q(\mathbf{c}) \right]$, can be evaluated as in Equation (A.14) to yield

$\displaystyle C_q(\mathbf{c}) = \operatorname{E}\left[ \log q(\mathbf{c}) \right] = -\log Z(\hat{\mathbf{c}}) + \sum_{i=1}^n (\hat{c}_i - 1) [\Psi(\hat{c}_i) - \Psi(\hat{c}_0)].$ (6.6)

The special function required in these terms is $ \Psi(x) = \frac{d}{dx} \ln(\Gamma(x))$, where $ \Gamma(x)$ is the gamma function. The psi function $ \Psi(x)$, also known as the digamma function, can be evaluated efficiently numerically, for example using the techniques described in [4]. The term $ Z(\hat{\mathbf{c}})$ is the normalising constant of the Dirichlet distribution as defined in Appendix A, and $ \hat{c}_0 = \sum_{i=1}^n \hat{c}_i$.
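Both $ \Psi$ and $ \log Z$ are readily available in numerical libraries. The sketch below (an illustrative Python implementation, not part of the original text) evaluates the negative entropy of Equation (6.6) using SciPy's digamma and log-gamma functions, and cross-checks it against SciPy's own Dirichlet entropy:

```python
import numpy as np
from scipy.special import digamma, gammaln
from scipy.stats import dirichlet

# Negative entropy of q(c) = Dirichlet(c; c_hat), Equation (6.6).
# log Z(c_hat) is the log normalising constant of the Dirichlet:
# log Z = sum_i log Gamma(c_hat_i) - log Gamma(c_hat_0).
def dirichlet_neg_entropy(c_hat):
    c_hat = np.asarray(c_hat, dtype=float)
    c0 = c_hat.sum()
    log_Z = gammaln(c_hat).sum() - gammaln(c0)
    return -log_Z + np.sum((c_hat - 1.0) * (digamma(c_hat) - digamma(c0)))

c_hat = np.array([2.0, 3.5, 1.2])
# scipy's entropy() is the (positive) differential entropy.
assert np.isclose(dirichlet_neg_entropy(c_hat), -dirichlet.entropy(c_hat))
```

For the uniform case $ \hat{\mathbf{c}} = (1, \ldots, 1)$ the sum vanishes and the negative entropy reduces to $ -\log Z(\hat{\mathbf{c}})$.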

The expectation of $ - \log p(\mathbf{c})$ can be evaluated similarly, yielding

\begin{displaymath}\begin{split}C_p(\mathbf{c}) &= - \operatorname{E}\left[ \log p(\mathbf{c}) \right] \\ &= \log Z(\mathbf{u}^{(\mathbf{c})}) - \sum_{i=1}^n (u^{(\mathbf{c})}_i - 1) [\Psi(\hat{c}_i) - \Psi(\hat{c}_0)]. \end{split}\end{displaymath} (6.7)
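A quick sanity check of Equation (6.7), again as an illustrative Python sketch (helper names are assumptions, not from the original text): when the prior parameters $ \mathbf{u}^{(\mathbf{c})}$ equal the posterior parameters $ \hat{\mathbf{c}}$, the Kullback-Leibler divergence $ C_q(\mathbf{c}) + C_p(\mathbf{c})$ must vanish, so $ C_p(\mathbf{c})$ must then equal the (positive) entropy of $ q(\mathbf{c})$:

```python
import numpy as np
from scipy.special import digamma, gammaln
from scipy.stats import dirichlet

# E[-log p(c)] for p(c) = Dirichlet(c; u) under q(c) = Dirichlet(c; c_hat),
# Equation (6.7); E[log c_i] = digamma(c_hat_i) - digamma(c_hat_0).
def cost_p_dirichlet(u, c_hat):
    u = np.asarray(u, dtype=float)
    c_hat = np.asarray(c_hat, dtype=float)
    c0 = c_hat.sum()
    log_Z_u = gammaln(u).sum() - gammaln(u.sum())
    return log_Z_u - np.sum((u - 1.0) * (digamma(c_hat) - digamma(c0)))

u = np.array([1.5, 2.0, 4.0])
# With matching prior and posterior, KL(q || p) = 0, so
# E[-log p(c)] reduces to the entropy of q(c).
assert np.isclose(cost_p_dirichlet(u, u), dirichlet.entropy(u))
```

For a uniform prior, $ \mathbf{u}^{(\mathbf{c})} = (1, \ldots, 1)$, the sum vanishes and $ C_p(\mathbf{c}) = \log Z(\mathbf{u}^{(\mathbf{c})})$ regardless of $ \hat{\mathbf{c}}$.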


Antti Honkela 2001-05-30