The prior probabilities of a discrete variable $s$ with possible values $i = 1, \ldots, n$ can be assigned from continuous-valued signals $c_1, \ldots, c_n$ using the soft-max prior:

$$P(s = i \mid c_1, \ldots, c_n) = \frac{\exp(c_i)}{\sum_{j=1}^{n} \exp(c_j)}. \qquad (10)$$
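As a small numeric sketch of the soft-max prior above (the function name is my own; `c` holds the continuous parent signals $c_1, \ldots, c_n$):

```python
import math

def softmax_prior(c):
    """Prior probabilities P(s = i | c) = exp(c_i) / sum_j exp(c_j).

    Subtracting max(c) before exponentiating prevents overflow and
    does not change the result, since the common factor cancels in
    the ratio.
    """
    m = max(c)
    e = [math.exp(ci - m) for ci in c]
    z = sum(e)
    return [ei / z for ei in e]

# Example: three continuous parent signals.
p = softmax_prior([1.0, 2.0, 0.5])
# The probabilities are positive and sum to one; larger signals
# receive larger prior probability.
```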

The term $\left\langle \ln \sum_j \exp(c_j) \right\rangle$ of the cost function cannot be computed exactly, but it can be approximated from above by using

$$\left\langle \ln \sum_j \exp(c_j) \right\rangle \le \ln \sum_j \left\langle \exp(c_j) \right\rangle \qquad (11)$$

$$\left\langle \exp(c_j) \right\rangle = \exp\!\left( \overline{c}_j + \widetilde{c}_j / 2 \right). \qquad (12)$$

Here $\overline{c}_j$ and $\widetilde{c}_j$ denote the posterior mean and variance of the Gaussian input $c_j$, and the bound (11) follows from Jensen's inequality, assuming all the inputs independent. Note that the terms $\langle \exp(c_j) \rangle$ appear inside the concave logarithmic function. A linear approximation based on the derivative w.r.t. $\langle \exp(c_j) \rangle$ therefore yields an upper bound for the cost function.
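The upper bound can be checked numerically. Using the notation above (posterior means `cbar` and variances `ctil` are illustrative values), a Monte Carlo estimate of the left-hand side of (11) should not exceed the closed-form right-hand side:

```python
import math
import random

random.seed(0)

cbar = [0.5, -1.0, 2.0]   # posterior means of the inputs c_j
ctil = [0.4, 1.0, 0.2]    # posterior variances of the inputs c_j

# Right-hand side: ln sum_j <exp(c_j)> = ln sum_j exp(cbar_j + ctil_j / 2).
bound = math.log(sum(math.exp(m + v / 2) for m, v in zip(cbar, ctil)))

# Monte Carlo estimate of the left-hand side <ln sum_j exp(c_j)>
# over independent Gaussian inputs c_j.
n = 20000
acc = 0.0
for _ in range(n):
    c = [random.gauss(m, math.sqrt(v)) for m, v in zip(cbar, ctil)]
    acc += math.log(sum(math.exp(ci) for ci in c))
estimate = acc / n

# By Jensen's inequality the estimate stays below the bound
# (up to Monte Carlo noise, which is small here).
```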

For latent discrete variables there are no restrictions on the posterior approximation $q(s)$; it is parameterised directly by the probabilities $q_i = q(s = i)$. The term in the cost function arising from $\langle \ln q(s) \rangle$ is simply the negative entropy of $q(s)$, that is, $\sum_i q_i \ln q_i$.
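This cost term is straightforward to evaluate; a minimal sketch (the function name is my own, `q` holds the posterior probabilities $q_i$):

```python
import math

def neg_entropy(q):
    """Cost term <ln q(s)> = sum_i q_i ln q_i for a discrete posterior q.

    Entries with q_i = 0 contribute zero, matching the limit
    x ln x -> 0 as x -> 0.
    """
    return sum(qi * math.log(qi) for qi in q if qi > 0.0)

# The term is smallest (most negative) for the uniform distribution
# and reaches zero when all mass sits on a single value.
```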

The update of $q(s)$ is analogous to that of Gaussian variables: the gradient of the cost function $C$ w.r.t. the vector $\mathbf{q} = (q_1, \ldots, q_n)$, with components $\partial C / \partial q_i$, is assumed to arise from a linear term $\sum_i q_i C_i$, where $C_i$ denotes the value of the cost assuming that $s = i$. The linearity assumption holds exactly if the value of the discrete node propagates only to Gaussian variables (through switches), and it corresponds to an upper bound of the cost function if the values are used by other discrete variables with a soft-max prior. It can be shown that at the minimum of the cost function it holds that $q_i \propto \exp(-C_i)$.
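The resulting update can be sketched as follows, assuming the per-value costs $C_i$ (the cost evaluated under $s = i$) have already been collected from the variable's children; the function name is my own:

```python
import math

def update_posterior(costs):
    """Minimising the cost gives q_i proportional to exp(-C_i),
    where costs[i] is the cost C_i assuming s = i.

    Shifting by min(costs) avoids underflow in exp(); the common
    factor cancels on normalisation.
    """
    m = min(costs)
    w = [math.exp(-(ci - m)) for ci in costs]
    z = sum(w)
    return [wi / z for wi in w]

# Values of s with lower cost receive higher posterior probability.
q = update_posterior([1.5, 0.3, 2.0])
```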