The prior probabilities of a discrete variable $s$ with $n$ possible values can be assigned from continuous-valued signals $c_1, \dots, c_n$ using the soft-max prior:
$$P(s = i \mid c_1, \dots, c_n) = \frac{\exp(c_i)}{\sum_{j=1}^{n} \exp(c_j)}.$$
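As a concrete illustration, the soft-max prior can be sketched numerically as follows; the function name `softmax_prior` and the example signal values are illustrative, and the max-subtraction is only a standard numerical stabilisation that leaves the probabilities unchanged.

```python
import numpy as np

def softmax_prior(c):
    """Soft-max prior: map continuous signals c_i to probabilities P(s = i | c)."""
    # Subtracting max(c) stabilises the exponentials; it cancels in the ratio.
    e = np.exp(c - np.max(c))
    return e / e.sum()

# Illustrative continuous signals for a variable with three possible values.
p = softmax_prior(np.array([0.0, 1.0, -1.0]))
# p is a valid probability vector: positive entries summing to one,
# with larger signals mapped to larger probabilities.
```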
For latent discrete variables there are no restrictions on the posterior approximation $q(s)$, which can be parametrised directly by the probabilities $\bar{s}_i = q(s = i)$. The term in the cost function arising from $q(s)$ is simply the negative entropy of $q(s)$, that is, $\sum_{i=1}^{n} \bar{s}_i \ln \bar{s}_i$.
The update of $q(s)$ is analogous to that of Gaussian variables: the gradient of the cost term $\mathcal{C}_p$ w.r.t. the vector $\bar{s}$ with components $\bar{s}_i$ is assumed to arise from a linear term $\sum_{i=1}^{n} \bar{s}_i \, \mathcal{C}_p(s = i)$, where $\mathcal{C}_p(s = i)$ denotes the value of $\mathcal{C}_p$ assuming that $s = i$. The linearity assumption holds exactly if the value of the discrete node propagates only to Gaussian variables (through switches), and it corresponds to an upper bound of the cost function if the values are used by other discrete variables with a soft-max prior. It can be shown that at the minimum of the cost function it holds that $\bar{s}_i \propto \exp\!\left(-\mathcal{C}_p(s = i)\right)$, with the proportionality constant fixed by the normalisation $\sum_{i} \bar{s}_i = 1$.
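The minimum condition can be checked numerically: minimising the linear cost term plus the negative entropy over the probability simplex yields probabilities proportional to the exponential of minus the cost evaluated at each value. The sketch below assumes an illustrative vector of per-value costs; all names are hypothetical.

```python
import numpy as np

def negative_entropy(s_bar):
    # Cost term arising from q(s): sum_i s_bar_i * ln(s_bar_i)
    return float(np.sum(s_bar * np.log(s_bar)))

def update_q(cost_per_value):
    # Minimiser of the linear cost term plus the negative entropy over the
    # simplex: probabilities proportional to exp(-cost).  The min-shift is
    # a numerical stabilisation only; it cancels in the normalisation.
    w = np.exp(-(cost_per_value - np.min(cost_per_value)))
    return w / w.sum()

def total_cost(s_bar, cost_per_value):
    # Linear cost term plus the entropy term of the approximation
    return float(s_bar @ cost_per_value) + negative_entropy(s_bar)

C_per_value = np.array([2.0, 0.5, 1.0])   # illustrative per-value costs
s_bar = update_q(C_per_value)
# s_bar sums to one; lower-cost values receive higher probability, and
# total_cost(s_bar, ...) is no larger than for any other distribution,
# e.g. the uniform one.
```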