In a hierarchical model, the data X depends only on some of the parameters, which are called the parameters of the first layer. The first-layer parameters in turn depend only on the second-layer parameters, and so on.
The term in equation ( ) can be split into a product of simpler terms, since the dependencies between layers can be truncated:
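\[
\begin{aligned}
p(X, \theta) &= p(X \mid \theta)\, p(\theta) \\
&= p(X \mid \theta_1)\, p(\theta_1 \mid \theta_2) \cdots p(\theta_{n-1} \mid \theta_n)\, p(\theta_n)
\end{aligned}
\tag{3.12}
\]
Here $\theta_i$ denotes the parameters of layer $i$, with $\theta = (\theta_1, \ldots, \theta_n)$.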
This means that the expectation in ( ), and thus the whole cost function C in ( ), also becomes a sum of simple terms.
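
The step from a product of layer-wise terms to a sum of simple terms can be illustrated with a minimal sketch. It assumes a toy chain with one scalar per layer and unit-variance Gaussian conditionals, and it assumes that the quantity of interest involves the logarithm of the joint density. The model and the names `log_normal`, `layer_log_terms` and `log_joint` are hypothetical and chosen only for illustration; the cost function C itself is not computed here.

```python
import math


def log_normal(x, mean, std):
    """Log-density of a univariate Gaussian N(x; mean, std^2)."""
    return -0.5 * math.log(2.0 * math.pi * std ** 2) - (x - mean) ** 2 / (2.0 * std ** 2)


def layer_log_terms(x, theta1, theta2):
    """Per-layer log terms of a toy chain x <- theta1 <- theta2 (assumed model)."""
    return [
        log_normal(x, theta1, 1.0),       # ln p(x | theta1): the data sees only layer 1
        log_normal(theta1, theta2, 1.0),  # ln p(theta1 | theta2): layer 1 sees only layer 2
        log_normal(theta2, 0.0, 1.0),     # ln p(theta2): prior of the top layer
    ]


def log_joint(x, theta1, theta2):
    """Log of the joint density, obtained as a sum of the simple per-layer terms."""
    return sum(layer_log_terms(x, theta1, theta2))


if __name__ == "__main__":
    terms = layer_log_terms(0.5, 0.2, -0.1)
    print("per-layer log terms:", terms)
    print("log joint:", log_joint(0.5, 0.2, -0.1))
```

Each entry of the list touches only two adjacent layers, so any expectation of the log joint can be taken term by term, which is the property the text relies on.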