The prior probabilities of a discrete variable $s$ with $n$ possible values can be assigned from continuous-valued signals $c_1, \dots, c_n$ using the soft-max prior:
$$P(s = i \mid c_1, \dots, c_n) = \frac{\exp(c_i)}{\sum_{j=1}^{n} \exp(c_j)}.$$
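As a concrete illustration, the soft-max prior can be sketched numerically as follows; the function name `softmax_prior` and the example signal values are illustrative, and the max-subtraction is only a standard numerical stabilisation that leaves the probabilities unchanged.

```python
import numpy as np

def softmax_prior(c):
    """Soft-max prior: map continuous signals c_i to probabilities P(s = i | c)."""
    # Subtracting max(c) stabilises the exponentials; it cancels in the ratio.
    e = np.exp(c - np.max(c))
    return e / e.sum()

# Illustrative continuous signals for a variable with three possible values.
p = softmax_prior(np.array([0.0, 1.0, -1.0]))
# p is a valid probability vector: positive entries summing to one,
# with larger signals mapped to larger probabilities.
```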
For latent discrete variables there are no restrictions on the posterior approximation $q(s)$, which can be parametrised directly by the probabilities $\bar{s}_i = q(s = i)$. The term in the cost function arising from $q(s)$ is simply the negative entropy of $q(s)$, that is, $\sum_{i=1}^{n} \bar{s}_i \ln \bar{s}_i$.
The update of $q(s)$ is analogous to that of Gaussian variables: the gradient of the cost term $\mathcal{C}_p$ w.r.t. the vector $\bar{s}$ with components $\bar{s}_i$ is assumed to arise from a linear term $\sum_{i=1}^{n} \bar{s}_i \, \mathcal{C}_p(s = i)$, where $\mathcal{C}_p(s = i)$ denotes the value of $\mathcal{C}_p$ assuming that $s = i$. The linearity assumption holds exactly if the value of the discrete node propagates only to Gaussian variables (through switches), and it corresponds to an upper bound of the cost function if the values are used by other discrete variables with a soft-max prior. It can be shown that at the minimum of the cost function it holds that $\bar{s}_i \propto \exp\!\left(-\mathcal{C}_p(s = i)\right)$, with the proportionality constant fixed by the normalisation $\sum_{i} \bar{s}_i = 1$.
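The minimum condition can be checked numerically: minimising the linear cost term plus the negative entropy over the probability simplex yields probabilities proportional to the exponential of minus the cost evaluated at each value. The sketch below assumes an illustrative vector of per-value costs; all names are hypothetical.

```python
import numpy as np

def negative_entropy(s_bar):
    # Cost term arising from q(s): sum_i s_bar_i * ln(s_bar_i)
    return float(np.sum(s_bar * np.log(s_bar)))

def update_q(cost_per_value):
    # Minimiser of the linear cost term plus the negative entropy over the
    # simplex: probabilities proportional to exp(-cost).  The min-shift is
    # a numerical stabilisation only; it cancels in the normalisation.
    w = np.exp(-(cost_per_value - np.min(cost_per_value)))
    return w / w.sum()

def total_cost(s_bar, cost_per_value):
    # Linear cost term plus the entropy term of the approximation
    return float(s_bar @ cost_per_value) + negative_entropy(s_bar)

C_per_value = np.array([2.0, 0.5, 1.0])   # illustrative per-value costs
s_bar = update_q(C_per_value)
# s_bar sums to one; lower-cost values receive higher probability, and
# total_cost(s_bar, ...) is no larger than for any other distribution,
# e.g. the uniform one.
```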