A Gaussian variable $s$ has two inputs $m$ and $v$ and prior probability $p(s \mid m, v) = N(s;\, m, \exp(-v))$. The variance is parametrised this way because then the mean and expected exponential of $v$ suffice for computing the cost function. It can be shown that when $s$, $m$ and $v$ are mutually independent, i.e. $q(s, m, v) = q(s)\,q(m)\,q(v)$, the cost $C_p = \mathrm{E}_q\{-\ln p(s \mid m, v)\}$ yields

$$
C_p = \frac{1}{2}\left\{\langle \exp v \rangle\left[(\bar{s} - \bar{m})^2 + \mathrm{Var}(s) + \mathrm{Var}(m)\right] - \bar{v} + \ln 2\pi\right\}. \tag{2}
$$
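A short derivation sketch of (2), using only the definitions above: writing out the negative log-density of the Gaussian prior gives

```latex
-\ln p(s \mid m, v)
  = \frac{1}{2}\left[\exp(v)\,(s - m)^2 - v + \ln 2\pi\right].
```

Taking the expectation over the factorised posterior $q(s)\,q(m)\,q(v)$, the independence of $v$ from $s$ and $m$ lets $\langle \exp v \rangle$ factor out, and $\mathrm{E}\{(s - m)^2\} = (\bar{s} - \bar{m})^2 + \mathrm{Var}(s) + \mathrm{Var}(m)$, which yields (2).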
The output of a latent Gaussian node trivially provides the expectation and variance: $\langle s \rangle = \bar{s}$ and $\mathrm{Var}(s)$. The expected exponential can be shown to be $\langle \exp s \rangle = \exp(\bar{s} + \mathrm{Var}(s)/2)$. The outputs of observed nodes are scalar values instead of distributions, and thus $\langle s \rangle = s$, $\mathrm{Var}(s) = 0$ and $\langle \exp s \rangle = \exp s$.
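The identity $\langle \exp s \rangle = \exp(\bar{s} + \mathrm{Var}(s)/2)$ is the Gaussian moment generating function evaluated at unity. A minimal numerical check (the grid width and point count are arbitrary choices for this sketch, not from the text):

```python
import numpy as np

def expected_exp(mean, var, n=200001, width=12.0):
    """E[exp(s)] under N(mean, var), by trapezoidal integration."""
    sd = np.sqrt(var)
    s = np.linspace(mean - width * sd, mean + width * sd, n)
    pdf = np.exp(-(s - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    f = np.exp(s) * pdf
    # trapezoidal rule over the grid
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(s)))

mean, var = 0.3, 0.5
numeric = expected_exp(mean, var)
closed_form = float(np.exp(mean + var / 2))
print(numeric, closed_form)
```

The agreement is to numerical-integration accuracy, and the check works for any mean and positive variance.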
The posterior distribution $q(s) = N(s;\, \bar{s}, \mathrm{Var}(s))$ of a latent Gaussian node can be updated as follows. 1) First, the gradients of $C_p$ w.r.t. $\langle s \rangle$, $\mathrm{Var}(s)$ and $\langle \exp s \rangle$ are computed. 2) Second, the terms in $C_p$ which depend on $\bar{s}$ and $\mathrm{Var}(s)$ are assumed to be of the form $C_p = M\bar{s} + V\left[(\bar{s} - \bar{s}_0)^2 + \mathrm{Var}(s)\right] + E\,\langle \exp s \rangle$, where $M = \partial C_p / \partial \bar{s}$, $V = \partial C_p / \partial\, \mathrm{Var}(s)$ and $E = \partial C_p / \partial \langle \exp s \rangle$, evaluated at the current value $\bar{s}_0$. This assumption holds exactly if the output of the node is propagated to Gaussian nodes only and not to discrete nodes. If the output is used by a discrete node with a soft-max prior, this form gives an upper bound of $C_p$, as will be explained later. 3) Third, the minimum of the cost $C$ is solved. This can be done analytically if $E = 0$; otherwise the minimum is obtained iteratively.
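The analytic case of step 3 can be made concrete when $E = 0$, i.e. all children of the node are Gaussian. The sketch below is a hypothetical illustration rather than the paper's implementation: it assumes the cost of $q(s)$ is the negative entropy term $C_q = -\frac{1}{2}\ln(2\pi e\,\mathrm{Var}(s))$, minimises $C_p + C_q$ in closed form, and verifies the minimiser against a brute-force grid:

```python
import numpy as np

def update_gaussian_node(M, V, s0):
    """Closed-form minimiser of
        C(s, var) = M*s + V*((s - s0)**2 + var) - 0.5*log(2*pi*e*var),
    valid when E = dC_p/d<exp s> = 0.
    dC/ds = 0 gives s = s0 - M/(2V); dC/dvar = 0 gives var = 1/(2V)."""
    return s0 - M / (2.0 * V), 1.0 / (2.0 * V)

def total_cost(s, var, M, V, s0):
    cost_p = M * s + V * ((s - s0) ** 2 + var)
    cost_q = -0.5 * np.log(2 * np.pi * np.e * var)  # negative entropy of q(s)
    return cost_p + cost_q

M, V, s0 = 0.4, 1.5, 0.2  # example gradients; V > 0 so the cost is bounded
s_new, var_new = update_gaussian_node(M, V, s0)
best = total_cost(s_new, var_new, M, V, s0)
# brute-force check: no nearby grid point has lower total cost
grid_s = s_new + np.linspace(-0.5, 0.5, 101)
grid_v = var_new + np.linspace(-0.25, 0.5, 101)
assert all(total_cost(s, v, M, V, s0) >= best - 1e-12
           for s in grid_s for v in grid_v)
print(s_new, var_new)
```

When $E \neq 0$, the extra $E\,\langle \exp s \rangle = E\exp(\bar{s} + \mathrm{Var}(s)/2)$ term couples the mean and variance nonlinearly, which is why the minimum must then be obtained iteratively.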