
   
1. The VQ model and the cost function

To demonstrate the approach, we describe ${\rm IVGA_{VQ}}$, in which each variable group is modeled using a separate VQ. From now on we refer to the cost function of a single variable group, leaving out the group index g.

The vector quantization model used for a single variable group consists of codebook vectors ${\boldsymbol \mu}(i), \; i = 1, \ldots, C$ (described by their means and a common variance) and indices of the winners w(t) for each data vector ${\boldsymbol x}(t), \; t = 1,
\ldots, N$.
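
To make the notation concrete, the following short Python sketch (our own illustration, not taken from the paper; all array names are hypothetical) sets up the quantities of a single group's VQ and evaluates the Gaussian log-likelihood implied by a fixed winner assignment and the common diagonal variance.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)

C, d, N = 8, 3, 100          # codebook size, group dimension, number of samples
x = rng.normal(size=(N, d))  # data vectors x(t) restricted to this variable group

mu = rng.normal(size=(C, d))     # codebook means mu(i)
sigma2_x = np.ones(d)            # common diagonal noise variances sigma^2_x
w = rng.integers(0, C, size=N)   # winner index w(t) for each data vector

# Under the model, x(t) is Gaussian with mean mu(w(t)) and variance sigma^2_x:
log_lik = -0.5 * np.sum((x - mu[w]) ** 2 / sigma2_x
                        + np.log(2 * np.pi * sigma2_x))
\end{verbatim}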

For finding $-\ln p({\boldsymbol x}\vert H)$ we use the variational EM algorithm with $\theta=\{{\boldsymbol \mu},w\}$ as the missing observations. In the E-phase, an upper bound of the cost is minimized [2,5]. The rest of the parameters are included in H, i.e. we use ML estimates for the following: c, the hyperparameter governing the prior probability for a codebook vector to be a winner; ${\boldsymbol \sigma}^2_{\boldsymbol x}$, the diagonal elements of the common covariance matrix; and ${\boldsymbol \mu_\mu}$ and ${\boldsymbol \sigma}^2_{\boldsymbol \mu}$, the hyperparameters governing the mean and variance of the prior distribution of the codebook vectors.
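
The overall control flow of this alternation can be sketched as follows. This is only a schematic: update_q and update_hyperparameters are hypothetical placeholders standing in for the actual E- and M-phase update formulas, which are not spelled out here.

\begin{verbatim}
def update_q(q, H, x):
    # E-phase (placeholder): refine the factorized posterior
    # q(mu, w) = q(mu) q(w) so that the upper bound decreases.
    return q

def update_hyperparameters(q, H, x):
    # M-phase (placeholder): ML re-estimation of the hyperparameters
    # H = {c, sigma^2_x, mu_mu, sigma^2_mu} given the current q.
    return H

def fit_vq(x, q, H, n_iter=50):
    # Alternate the two phases; each step lowers (or keeps unchanged)
    # the upper bound on -ln p(x | H).
    for _ in range(n_iter):
        q = update_q(q, H, x)
        H = update_hyperparameters(q, H, x)
    return q, H
\end{verbatim}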

The upper bound for $-\ln p({\boldsymbol x}\vert H) = -\ln \int p({\boldsymbol x}, \theta \vert H) d\theta$ is obtained from

\begin{displaymath}\textstyle
-\ln \int p({\boldsymbol x}, \theta \vert H) d\theta \leq
-\int q(\theta) \ln \frac{p({\boldsymbol x}, \theta\vert H)}{q(\theta)} d\theta \, ,
\end{displaymath}

where the inequality follows from the convexity of $-\ln x$ by Jensen's inequality. H and $q(\theta)$ are alternately optimized, but instead of finding the intractable global optimum $q(\theta) = p(\theta \vert {\boldsymbol x}, H)$, we restrict the search to the tractable family of distributions of the form $q({\boldsymbol \mu},w) = q({\boldsymbol \mu})q(w)$. Substituting $p({\boldsymbol x}, \theta \vert H) = p({\boldsymbol x}\vert w, {\boldsymbol \mu}, {\boldsymbol \sigma}^2_{\boldsymbol x}) \, p(w \vert c) \, p({\boldsymbol \mu}\vert {\boldsymbol \mu_\mu}, {\boldsymbol \sigma}^2_{\boldsymbol \mu})$, denoting the integral over $q(\theta)$ by $E\{\}$, and arranging the terms then yields

\begin{eqnarray*}
E\left\{ -\ln \frac{p({\boldsymbol x}, \theta\vert H)}{q(\theta)} \right\}
& = & \overbrace{E\left\{ -\ln p({\boldsymbol x}\vert w, {\boldsymbol \mu}, {\boldsymbol \sigma}^2_{\boldsymbol x}) \right\}}^{\rm errors}
+ \overbrace{E\left\{ \ln \frac{q(w)}{p(w \vert c)} \right\}}^{\rm winners} \\
& & + \overbrace{E\left\{ \ln \frac{q({\boldsymbol \mu})}{p({\boldsymbol \mu}\vert {\boldsymbol \mu_\mu}, {\boldsymbol \sigma}^2_{\boldsymbol \mu})} \right\}}^{\rm codebook} \, .
\end{eqnarray*}
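
For completeness, the quality of the bound can be read off from the standard identity (restated here for clarity, not taken from the text above) relating it to the Kullback-Leibler divergence between $q(\theta)$ and the true posterior:

\begin{displaymath}\textstyle
-\int q(\theta) \ln \frac{p({\boldsymbol x}, \theta\vert H)}{q(\theta)} d\theta
= -\ln p({\boldsymbol x}\vert H) + \int q(\theta) \ln \frac{q(\theta)}{p(\theta \vert {\boldsymbol x}, H)} d\theta \, ,
\end{displaymath}

so the bound is tight exactly when $q(\theta) = p(\theta \vert {\boldsymbol x}, H)$, and minimizing it over the factorized family drives $q(\theta)$ as close to the posterior as the factorization allows.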


Substituting the Gaussian model for the codebook vectors into the decomposition above, the cost function becomes


  \begin{eqnarray*}
L & = & \sum_{j=1}^d \sum_{t=1}^N \left[ \frac{\tilde{\mu}_j(w(t)) + (x_j(t) - \bar{\mu}_j(w(t)))^2}{2 {\sigma^2_x}_j} + \frac{1}{2} \ln 2 \pi {\sigma^2_x}_j \right]
- \sum_{t=1}^N \ln c_{w(t)} \\
& & + \sum_{j=1}^d \sum_{i=1}^C \left[ \frac{1}{2} \ln \frac{{\sigma^2_\mu}_j(i)}{\tilde{\mu}_j(i)} - \frac{1}{2}
+ \frac{\tilde{\mu}_j(i) + (\bar{\mu}_j(i) - {\mu_\mu}_j(i))^2}{2 {\sigma^2_\mu}_j(i)} \right] ,
\end{eqnarray*}


where $\bar{\boldsymbol \mu}$ and $\tilde{\boldsymbol \mu}$ are the mean and variance of the posterior distribution of the codebook vectors, and d is the dimensionality of the subspace, i.e. the size of the variable group.
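
The cost L is straightforward to evaluate numerically. The sketch below (our own code; it assumes the form of L written above, and every function and array name is hypothetical) computes the three terms for one variable group.

\begin{verbatim}
import numpy as np

def vq_cost(x, w, mu_bar, mu_tilde, sigma2_x, c, mu_mu, sigma2_mu):
    """Evaluate the cost L for one variable group.

    x        : (N, d) data vectors of the group
    w        : (N,)   winner index w(t) for each data vector
    mu_bar   : (C, d) posterior means of the codebook vectors
    mu_tilde : (C, d) posterior variances of the codebook vectors
    sigma2_x : (d,)   common noise variances
    c        : (C,)   prior winner probabilities
    mu_mu    : (C, d) prior means of the codebook vectors
    sigma2_mu: (C, d) prior variances of the codebook vectors
    """
    # Reconstruction errors: expected negative log-likelihood of the data.
    errors = np.sum((mu_tilde[w] + (x - mu_bar[w]) ** 2) / (2 * sigma2_x)
                    + 0.5 * np.log(2 * np.pi * sigma2_x))
    # Cost of the winner indices under the prior probabilities c.
    winners = -np.sum(np.log(c[w]))
    # KL divergence of the codebook posterior from its Gaussian prior.
    codebook = np.sum(0.5 * np.log(sigma2_mu / mu_tilde) - 0.5
                      + (mu_tilde + (mu_bar - mu_mu) ** 2) / (2 * sigma2_mu))
    return errors + winners + codebook
\end{verbatim}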



 
Krista Lagus
2001-08-28