A Gaussian variable $s$ has two inputs $m$ and $v$ and prior probability $p(s \mid m, v) = \mathcal{N}(s;\, m, \exp(-v))$. The variance is parametrised this way because then the mean $\overline{v}$ and expected exponential $\overline{\exp v}$ of $v$ suffice for computing the cost function. It can be shown that when $s$, $m$ and $v$ are mutually independent, i.e. $q(s, m, v) = q(s)q(m)q(v)$, the cost $C_p = E_q\{-\ln p(s \mid m, v)\}$ yields
$$C_p = \frac{1}{2}\left\{\overline{\exp v}\left[\left(\overline{s} - \overline{m}\right)^2 + \widetilde{s} + \widetilde{m}\right] - \overline{v} + \ln 2\pi\right\}.$$
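As a minimal sketch, the cost term above can be evaluated directly from the posterior statistics of the three variables; the function and argument names below are illustrative, not from the original framework.

```python
import math

def gaussian_node_cost(s_mean, s_var, m_mean, m_var, v_mean, v_expexp):
    """E_q{-ln p(s | m, v)} for a Gaussian node whose variance is exp(-v).

    s_mean, s_var    -- posterior mean and variance of s
    m_mean, m_var    -- posterior mean and variance of the mean input m
    v_mean, v_expexp -- posterior mean of v and expectation of exp(v)
    """
    return 0.5 * (v_expexp * ((s_mean - m_mean) ** 2 + s_var + m_var)
                  - v_mean + math.log(2.0 * math.pi))
```

When all inputs are point values (zero posterior variance) this reduces to the ordinary negative log density of $\mathcal{N}(s;\, m, \exp(-v))$, which is a quick way to check the parametrisation.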
The output of a latent Gaussian node with posterior $q(s) = \mathcal{N}(s;\, \overline{s}, \widetilde{s})$ trivially provides the expectation and variance: $\langle s \rangle = \overline{s}$ and $\operatorname{Var}\{s\} = \widetilde{s}$. The expected exponential can be shown to be $\langle \exp s \rangle = \exp(\overline{s} + \widetilde{s}/2)$. The outputs of observed nodes are scalar values instead of distributions, and thus $\langle s \rangle = s$, $\operatorname{Var}\{s\} = 0$ and $\langle \exp s \rangle = \exp s$.
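The two kinds of node outputs can be sketched as follows; the function names are illustrative, and the expected exponential of the latent node is simply the mean of the corresponding log-normal distribution.

```python
import math

def latent_output(s_mean, s_var):
    # A latent Gaussian node with q(s) = N(s; s_mean, s_var) outputs its
    # mean, its variance, and the expected exponential exp(mean + var/2).
    return s_mean, s_var, math.exp(s_mean + 0.5 * s_var)

def observed_output(s):
    # An observed node outputs the scalar value itself: zero variance,
    # and the expected exponential is just exp(s).
    return s, 0.0, math.exp(s)
```

Note that `observed_output(s)` coincides with `latent_output(s, 0.0)`, i.e. an observation behaves like a latent node whose posterior has collapsed to a point.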
The posterior distribution $q(s) = \mathcal{N}(s;\, \overline{s}, \widetilde{s})$ of a latent Gaussian node can be updated as follows. First, the gradients of $C_p$ w.r.t. $\overline{s}$, $\widetilde{s}$ and $\overline{\exp s}$ are computed. Second, the terms in $C_p$ which depend on $\overline{s}$ and $\widetilde{s}$ are assumed to be of the form $a\,\overline{\exp s} + b\left[(\overline{s})^2 + \widetilde{s}\right] + c\,\overline{s}$, where $a = \partial C_p / \partial \overline{\exp s}$, $b = \partial C_p / \partial \widetilde{s}$ and $c = \partial C_p / \partial \overline{s} - 2b\overline{s}$. This assumption holds exactly if the output of the node is propagated to Gaussian nodes only and not to discrete nodes. If the output is used by a discrete node with a soft-max prior, this term gives an upper bound of $C_p$, as will be explained later. Third, the minimum of the total cost $C = C_p + C_q$ is solved. This can be done analytically if $a = 0$; otherwise the minimum is obtained iteratively.
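The three steps above can be sketched as follows. This is a simplified illustration, not the framework's implementation: the coefficient names `a`, `b`, `c` stand for the assumed quadratic-plus-exponential dependence of the cost on the posterior statistics ($a$ multiplying the expected exponential, $b$ the second moment, $c$ the mean), with $b > 0$ and $a \ge 0$ assumed, and the iterative branch uses a plain fixed-point scheme rather than any particular solver.

```python
import math

def update_gaussian_posterior(a, b, c, s_mean=0.0, s_var=1.0, iters=50):
    """Minimise  C(s_mean, s_var) = a * exp(s_mean + s_var / 2)
                                    + b * (s_mean**2 + s_var)
                                    + c * s_mean
                                    - 0.5 * log(s_var)
    over the posterior q(s) = N(s; s_mean, s_var); the -0.5*log(s_var)
    term is the entropy-derived part C_q (up to an additive constant)."""
    if a == 0.0:
        # Analytic minimum: d/ds_mean gives c + 2*b*s_mean = 0 and
        # d/ds_var gives b - 1/(2*s_var) = 0.
        return -c / (2.0 * b), 1.0 / (2.0 * b)
    for _ in range(iters):
        # Gradient contribution of the expected-exponential term,
        # evaluated at the current iterate and held fixed for one sweep.
        g = a * math.exp(s_mean + 0.5 * s_var)
        s_mean = -(c + g) / (2.0 * b)
        s_var = 1.0 / (2.0 * b + g)
    return s_mean, s_var
```

At a fixed point both stationarity conditions hold exactly, and when `a == 0` the update reproduces the familiar Gaussian result: mean $-c/(2b)$ and variance $1/(2b)$.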