In a hierarchical model, the data $X$ depends directly only on some of the parameters, which are called the first-layer parameters $\theta_1$. The first-layer parameters in turn depend only on the second-layer parameters $\theta_2$, and so on.
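The term $p(X, \theta \mid \mathcal{H})$ in equation () can be split into a product of simpler terms, since the dependencies across layers can be truncated to the layer directly above:
\begin{align}
p(X, \theta \mid \mathcal{H}) &= p(X, \theta_1, \ldots, \theta_n \mid \mathcal{H}) \tag{3.12} \\
&= p(X \mid \theta_1, \mathcal{H}) \, p(\theta_1 \mid \theta_2, \mathcal{H}) \cdots p(\theta_n \mid \mathcal{H}) \notag
\end{align}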
Because the logarithm of a product is a sum of logarithms, the expectation in () and thus the whole cost function $C$ in () also become sums of simple terms.
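As a rough illustration of why the layered structure helps, the following sketch uses a hypothetical three-level Gaussian chain (unit variances and a zero-mean top-level prior are assumptions chosen for the example, not part of the model in the text) together with an assumed fully factorized Gaussian approximation over the two parameter layers. Under these assumptions each layer contributes one closed-form term to the expected log joint density, and the total is simply their sum. All function names are invented for this example.

```python
import math
import random


def log_normal(x, mean, var):
    """Log density of a univariate Gaussian N(x; mean, var)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def log_joint_by_layer(x, theta1, theta2):
    """Per-layer log terms of the assumed toy chain:
    theta2 ~ N(0, 1), theta1 | theta2 ~ N(theta2, 1), x | theta1 ~ N(theta1, 1).
    """
    return [
        log_normal(x, theta1, 1.0),       # log p(x | theta1)
        log_normal(theta1, theta2, 1.0),  # log p(theta1 | theta2)
        log_normal(theta2, 0.0, 1.0),     # log p(theta2)
    ]


def expected_log_normal(m_a, v_a, m_b, v_b, var):
    """E[log N(a; b, var)] for independent a ~ N(m_a, v_a), b ~ N(m_b, v_b).
    Uses E[(a - b)^2] = (m_a - m_b)^2 + v_a + v_b.
    """
    return -0.5 * (math.log(2 * math.pi * var)
                   + ((m_a - m_b) ** 2 + v_a + v_b) / var)


def expected_terms(x, q1, q2):
    """Closed-form expectation of each layer term under a factorized
    Gaussian approximation; q1 and q2 are (mean, variance) pairs."""
    (m1, v1), (m2, v2) = q1, q2
    return [
        expected_log_normal(x, 0.0, m1, v1, 1.0),   # data is observed: variance 0
        expected_log_normal(m1, v1, m2, v2, 1.0),
        expected_log_normal(m2, v2, 0.0, 0.0, 1.0),
    ]


def monte_carlo_expected_log_joint(x, q1, q2, n_samples=200_000, seed=0):
    """Sampling estimate of the same expectation, used only as a sanity check."""
    rng = random.Random(seed)
    (m1, v1), (m2, v2) = q1, q2
    total = 0.0
    for _ in range(n_samples):
        t1 = rng.gauss(m1, math.sqrt(v1))
        t2 = rng.gauss(m2, math.sqrt(v2))
        total += sum(log_joint_by_layer(x, t1, t2))
    return total / n_samples
```

Summing the list returned by `expected_terms(1.0, (0.8, 0.1), (0.3, 0.2))` should agree with `monte_carlo_expected_log_joint` for the same arguments up to sampling noise, while needing only one simple evaluation per layer.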