Example structures

The building blocks can be connected together rather freely, subject to the following restrictions: 1) the resulting network has to be a directed acyclic graph; 2) a nonlinearity must always immediately follow a Gaussian latent variable; 3) the outputs of a multiplication or a nonlinearity cannot propagate to a soft-max or a variance prior, because the expected exponential cannot be evaluated; 4) the output of a discrete latent variable can be used only for a switch; and 5) there should be only one computational path from a latent variable to a variable. If there are multiple paths, ensemble learning becomes more complicated [5], and that case is beyond the scope of this paper.
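As an illustration, some of these restrictions can be checked mechanically before learning starts. The following is a minimal sketch in Python, not part of the method described here. The network encoding (a dictionary mapping each node name to a kind and a list of inputs), the kind names, and the function `check_structure` are all hypothetical. It covers restrictions 1, 2, 4 and 5 only, and it assumes that a computational path runs through computational nodes and ends at the next variable it reaches.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical encoding: node name -> (kind, list of input node names).
# The kind names below are assumptions made for this sketch.
VARIABLE_KINDS = {"gaussian", "discrete"}


def check_structure(nodes):
    """Return a list of violated restrictions (empty if none were found)."""
    problems = []
    inputs = {name: list(parents) for name, (_, parents) in nodes.items()}

    # Restriction 1: the network has to be a directed acyclic graph.
    try:
        order = list(TopologicalSorter(inputs).static_order())
    except CycleError:
        return ["1: the network contains a cycle"]

    for name, (kind, parents) in nodes.items():
        for parent in parents:
            parent_kind = nodes[parent][0]
            # Restriction 2: a nonlinearity directly follows a Gaussian variable.
            if kind == "nonlinearity" and parent_kind != "gaussian":
                problems.append(f"2: {name} does not follow a Gaussian variable")
            # Restriction 4: a discrete variable may feed only a switch.
            if parent_kind == "discrete" and kind != "switch":
                problems.append(f"4: discrete {parent} feeds non-switch {name}")

    # Restriction 5: at most one computational path from a variable to a variable.
    for source, (kind, _) in nodes.items():
        if kind not in VARIABLE_KINDS:
            continue
        paths = {source: 1}  # number of paths from source to each node
        for name in order:
            if name == source:
                continue
            count = sum(paths.get(parent, 0) for parent in inputs[name])
            if nodes[name][0] in VARIABLE_KINDS:
                # Assumption: a path ends at the next variable it reaches.
                if count > 1:
                    problems.append(f"5: {count} paths from {source} to {name}")
            elif count:
                paths[name] = count
    return problems


# Usage: s reaches x both directly and through the nonlinearity f.
network = {
    "s": ("gaussian", []),
    "f": ("nonlinearity", ["s"]),
    "a": ("addition", ["s", "f"]),
    "x": ("gaussian", ["a"]),
}
print(check_structure(network))
```

Restriction 3 is left out because checking it would require the encoding to distinguish which input of a node acts as the variance prior or the soft-max input, which this simple dictionary does not record.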

Harri Valpola 2001-10-01