next up previous contents
Next: The likelihood term Up: Evaluating the cost function Previous: Evaluating the cost function   Contents

The terms originating from the parameters $ \boldsymbol {\theta }$

Assuming the parameters are $ \boldsymbol{\theta}= \{ \theta_1, \ldots, \theta_{N}
\}$ and the approximation is of the form

$\displaystyle q(\boldsymbol{\theta}) = \prod_{i=1}^{N} q(\theta_i),$ (6.2)

the terms of Equation (6.1) originating from the parameters $ \boldsymbol {\theta }$ can be written as

$\displaystyle \operatorname{E}\left[ \log q(\boldsymbol{\theta}) - \log p(\boldsymbol{\theta}) \right] = \sum_{i=1}^{N} \left( \operatorname{E}\left[ \log q(\theta_i) \right] - \operatorname{E}\left[ \log p(\theta_i) \right] \right).$ (6.3)

In the case of Dirichlet distributions, one $ \theta_i$ in the previous equation naturally denotes the whole parameter vector of a single Dirichlet distribution.

There are two different kinds of parameters in $ \boldsymbol {\theta }$: those with a Gaussian distribution and those with a Dirichlet distribution. In the Gaussian case the expectation $ \operatorname{E}[ \log q(\theta_i)]$ over $ q(\theta_i) = N(\theta_i;\; \overline{\theta_i}, \widetilde{\theta_i})$ gives the negative entropy of a Gaussian, $ -1/2 ( 1 + \log (2 \pi \widetilde{\theta_i}) )$, as derived in Equation (A.5) of Appendix A.
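As a concrete numerical check of this entropy term, the following sketch (in Python with SciPy; the helper name is illustrative and not part of the original derivation) evaluates $ -1/2(1 + \log(2\pi\widetilde{\theta_i}))$ and compares it against the differential entropy computed by SciPy:

```python
import numpy as np
from scipy import stats

# Negative entropy of q(theta_i) = N(theta_i; mean, var):
# E[log q(theta_i)] = -1/2 * (1 + log(2*pi*var)), cf. Equation (A.5).
def gaussian_neg_entropy(var):
    return -0.5 * (1.0 + np.log(2.0 * np.pi * var))

var = 0.25
# scipy's entropy() returns the (positive) differential entropy,
# so it should equal the negative of the value above.
assert np.isclose(gaussian_neg_entropy(var),
                  -stats.norm(loc=0.0, scale=np.sqrt(var)).entropy())
```

Note that only the variance $ \widetilde{\theta_i}$ enters; the mean $ \overline{\theta_i}$ does not affect the entropy of a Gaussian.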

The expectation of $ - \log p(\theta_i)$ can also be evaluated using the formulas of Appendix A. Assuming

$\displaystyle p(\theta_i) = N(\theta_i;\; m, \exp(2 v))$ (6.4)

where $ q(\theta_i) = N(\theta_i;\; \overline{\theta_i}, \widetilde{\theta_i})$, $ q(m) = N(m;\; \overline{m}, \widetilde{m})$ and $ q(v) = N(v;\; \overline{v}, \widetilde{v})$, the expectation becomes

\begin{displaymath}\begin{split}C_p(\theta_i) &= \operatorname{E}\left[ - \log p(\theta_i) \right] \\ &= \frac{1}{2} \log 2\pi + \overline{v} + \frac{1}{2} \left[ \left( \overline{\theta_i} - \overline{m} \right)^2 + \widetilde{\theta_i} + \widetilde{m} \right] \exp(2\widetilde{v} - 2 \overline{v}) \end{split}\end{displaymath} (6.5)

where we have used the results of Equations (A.4) and (A.6).
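The closed form of Equation (6.5) can be verified numerically. The sketch below (illustrative helper names, not from the original text) evaluates the expectation both from the formula and by direct Monte Carlo sampling of $ \theta_i$, $ m$ and $ v$ from their posteriors:

```python
import numpy as np

# E[-log p(theta_i)] for p(theta_i) = N(theta_i; m, exp(2v)) when
# theta_i, m and v all have Gaussian posteriors (Equation 6.5).
def cost_p_gaussian(th_mean, th_var, m_mean, m_var, v_mean, v_var):
    return (0.5 * np.log(2.0 * np.pi) + v_mean
            + 0.5 * ((th_mean - m_mean) ** 2 + th_var + m_var)
              * np.exp(2.0 * v_var - 2.0 * v_mean))

# Monte Carlo cross-check: sample the posteriors and average
# -log p(theta_i) = 1/2 log 2*pi + v + 1/2 (theta_i - m)^2 exp(-2v).
rng = np.random.default_rng(0)
n = 500_000
theta = rng.normal(0.3, np.sqrt(0.04), n)
m = rng.normal(0.1, np.sqrt(0.02), n)
v = rng.normal(-0.5, np.sqrt(0.01), n)
mc = np.mean(0.5 * np.log(2.0 * np.pi) + v
             + 0.5 * (theta - m) ** 2 * np.exp(-2.0 * v))
closed = cost_p_gaussian(0.3, 0.04, 0.1, 0.02, -0.5, 0.01)
assert abs(mc - closed) < 1e-2
```

The factor $ \exp(2\widetilde{v} - 2\overline{v})$ arises because $ \operatorname{E}[\exp(-2v)] = \exp(2\widetilde{v} - 2\overline{v})$ for $ v \sim N(\overline{v}, \widetilde{v})$, as in Equation (A.6).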

For Dirichlet distributed parameters, the procedure is similar. Let us assume that the parameter $ \mathbf{c} \in \boldsymbol{\theta}$, $ p(\mathbf{c}) = \text{Dirichlet}(\mathbf{c};\; \mathbf{u}^{(\mathbf{c})})$ and $ q(\mathbf{c}) = \text{Dirichlet}(\mathbf{c};\; \hat{\mathbf{c}})$. Using the notation of Appendix A, the negative entropy of the Dirichlet distribution $ q(\mathbf{c})$, $ \operatorname{E}\left[ \log q(\mathbf{c}) \right]$, can be evaluated as in Equation (A.14) to yield

$\displaystyle C_q(\mathbf{c}) = \operatorname{E}\left[ \log q(\mathbf{c}) \right] = -\log Z(\hat{\mathbf{c}}) + \sum_{i=1}^n (\hat{c}_i - 1) [\Psi(\hat{c}_i) - \Psi(\hat{c}_0)].$ (6.6)

The special function required in these terms is $ \Psi(x) = \frac{d}{dx} \ln(\Gamma(x))$, where $ \Gamma(x)$ is the gamma function. The psi function $ \Psi(x)$, also known as the digamma function, can be evaluated efficiently numerically, for example using the techniques described in [4]. The term $ Z(\hat{\mathbf{c}})$ is the normalising constant of the Dirichlet distribution as defined in Appendix A, and $ \hat{c}_0 = \sum_{i=1}^n \hat{c}_i$.
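Both $ \Psi$ and $ \log Z$ are readily available in numerical libraries. The sketch below (an illustrative Python implementation, not part of the original text) evaluates the negative entropy of Equation (6.6) using SciPy's digamma and log-gamma functions, and cross-checks it against SciPy's own Dirichlet entropy:

```python
import numpy as np
from scipy.special import digamma, gammaln
from scipy.stats import dirichlet

# Negative entropy of q(c) = Dirichlet(c; c_hat), Equation (6.6).
# log Z(c_hat) is the log normalising constant of the Dirichlet:
# log Z = sum_i log Gamma(c_hat_i) - log Gamma(c_hat_0).
def dirichlet_neg_entropy(c_hat):
    c_hat = np.asarray(c_hat, dtype=float)
    c0 = c_hat.sum()
    log_Z = gammaln(c_hat).sum() - gammaln(c0)
    return -log_Z + np.sum((c_hat - 1.0) * (digamma(c_hat) - digamma(c0)))

c_hat = np.array([2.0, 3.5, 1.2])
# scipy's entropy() is the (positive) differential entropy.
assert np.isclose(dirichlet_neg_entropy(c_hat), -dirichlet.entropy(c_hat))
```

For the uniform case $ \hat{\mathbf{c}} = (1, \ldots, 1)$ the sum vanishes and the negative entropy reduces to $ -\log Z(\hat{\mathbf{c}})$.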

The expectation of $ - \log p(\mathbf{c})$ can be evaluated similarly, yielding

\begin{displaymath}\begin{split}C_p(\mathbf{c}) &= - \operatorname{E}\left[ \log p(\mathbf{c}) \right] \\ &= \log Z(\mathbf{u}^{(\mathbf{c})}) - \sum_{i=1}^n (u^{(\mathbf{c})}_i - 1) [\Psi(\hat{c}_i) - \Psi(\hat{c}_0)]. \end{split}\end{displaymath} (6.7)
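A quick sanity check of Equation (6.7), again as an illustrative Python sketch (helper names are assumptions, not from the original text): when the prior parameters $ \mathbf{u}^{(\mathbf{c})}$ equal the posterior parameters $ \hat{\mathbf{c}}$, the Kullback-Leibler divergence $ C_q(\mathbf{c}) + C_p(\mathbf{c})$ must vanish, so $ C_p(\mathbf{c})$ must then equal the (positive) entropy of $ q(\mathbf{c})$:

```python
import numpy as np
from scipy.special import digamma, gammaln
from scipy.stats import dirichlet

# E[-log p(c)] for p(c) = Dirichlet(c; u) under q(c) = Dirichlet(c; c_hat),
# Equation (6.7); E[log c_i] = digamma(c_hat_i) - digamma(c_hat_0).
def cost_p_dirichlet(u, c_hat):
    u = np.asarray(u, dtype=float)
    c_hat = np.asarray(c_hat, dtype=float)
    c0 = c_hat.sum()
    log_Z_u = gammaln(u).sum() - gammaln(u.sum())
    return log_Z_u - np.sum((u - 1.0) * (digamma(c_hat) - digamma(c0)))

u = np.array([1.5, 2.0, 4.0])
# With matching prior and posterior, KL(q || p) = 0, so
# E[-log p(c)] reduces to the entropy of q(c).
assert np.isclose(cost_p_dirichlet(u, u), dirichlet.entropy(u))
```

For a uniform prior, $ \mathbf{u}^{(\mathbf{c})} = (1, \ldots, 1)$, the sum vanishes and $ C_p(\mathbf{c}) = \log Z(\mathbf{u}^{(\mathbf{c})})$ regardless of $ \hat{\mathbf{c}}$.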


Antti Honkela 2001-05-30