
Hierarchical Models

In a hierarchical model, the data $\boldsymbol{X}$ depend only on a subset of the parameters, called the parameters of the first layer, $\boldsymbol{\theta}_1$. The parameters of the first layer depend only on the parameters of the second layer, $\boldsymbol{\theta}_2$, and so on.

The term $p(\boldsymbol{\theta},\boldsymbol{X}\mid \mathcal{H})$ in equation ([*]) can be split into a product of simpler terms, since the dependencies across layers simplify to $p(\boldsymbol{\theta}_{i}\mid\boldsymbol{\theta}_{i+1},\dots,\boldsymbol{\theta}_{n}) = p(\boldsymbol{\theta}_{i}\mid\boldsymbol{\theta}_{i+1})$:

$\displaystyle p(\boldsymbol{\theta},\boldsymbol{X}\mid \mathcal{H}) = p(\boldsymbol{X}\mid\boldsymbol{\theta}, \mathcal{H})\,p(\boldsymbol{\theta}\mid \mathcal{H}) = p(\boldsymbol{X}\mid\boldsymbol{\theta}_{1}, \mathcal{H})\,p(\boldsymbol{\theta}_{1}\mid\boldsymbol{\theta}_{2}, \mathcal{H}) \cdots p(\boldsymbol{\theta}_{n-1}\mid\boldsymbol{\theta}_{n}, \mathcal{H})\,p(\boldsymbol{\theta}_{n} \mid \mathcal{H}),$ (3.12)

where n is the number of layers.
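The practical consequence of the factorisation (3.12) is that the log-joint is a sum of one term per layer. As a minimal sketch, consider a hypothetical two-layer Gaussian hierarchy (all names and the unit variances are illustrative assumptions, not part of the text):

```python
import math

def log_gauss(x, mean, var):
    # Log-density of a univariate Gaussian N(mean, var).
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

# Hypothetical two-layer model (n = 2):
#   theta2 ~ N(0, 1)
#   theta1 | theta2 ~ N(theta2, 1)
#   x_k | theta1 ~ N(theta1, 1) for each observation x_k
def log_joint(X, theta1, theta2):
    # log p(X, theta | H) splits into one simple term per layer,
    # mirroring the factorisation in (3.12).
    term_x = sum(log_gauss(x, theta1, 1.0) for x in X)  # log p(X | theta_1)
    term_1 = log_gauss(theta1, theta2, 1.0)             # log p(theta_1 | theta_2)
    term_2 = log_gauss(theta2, 0.0, 1.0)                # log p(theta_2)
    return term_x + term_1 + term_2
```

Each layer contributes its own additive term, so each can be evaluated and updated without inspecting the layers above its immediate parent.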

This means that the expectation in ([*]), and thus the whole cost function C in ([*]), also become sums of simple terms.
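To spell out the step, taking the logarithm of the factorisation (3.12) turns the product into a sum, so the expectation distributes over the terms. A sketch, assuming C is the expected log-ratio cost of the preceding sections:

```latex
\begin{align*}
C &= \operatorname{E}_q\!\left\{ \log\frac{q(\boldsymbol{\theta})}
     {p(\boldsymbol{\theta},\boldsymbol{X}\mid\mathcal{H})} \right\} \\
  &= \operatorname{E}_q\{\log q(\boldsymbol{\theta})\}
     - \operatorname{E}_q\{\log p(\boldsymbol{X}\mid\boldsymbol{\theta}_{1},\mathcal{H})\}
     - \sum_{i=1}^{n-1}\operatorname{E}_q\{\log p(\boldsymbol{\theta}_{i}\mid\boldsymbol{\theta}_{i+1},\mathcal{H})\}
     - \operatorname{E}_q\{\log p(\boldsymbol{\theta}_{n}\mid\mathcal{H})\} .
\end{align*}
```

With a factorial approximation q, each expectation above involves only a parameter and its immediate parent layer, which is what keeps the cost function tractable.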



Tapani Raiko
2001-12-10