
Hierarchical Models

In a hierarchical model, the data $\boldsymbol{X}$ depend only on a subset of the parameters, called the parameters of the first layer, $\boldsymbol{\theta}_1$. The parameters of the first layer depend only on the parameters of the second layer, $\boldsymbol{\theta}_2$, and so on.

The term $p(\boldsymbol{\theta},\boldsymbol{X}\mid \mathcal{H})$ in equation ([*]) can be split into a product of simpler terms, since the dependencies across layers simplify to $p(\boldsymbol{\theta}_{i}\mid\boldsymbol{\theta}_{i+1},\dots,\boldsymbol{\theta}_{n}) = p(\boldsymbol{\theta}_{i}\mid\boldsymbol{\theta}_{i+1})$:

$\displaystyle p(\boldsymbol{\theta},\boldsymbol{X}\mid \mathcal{H}) = p(\boldsymbol{X}\mid\boldsymbol{\theta}, \mathcal{H})\,p(\boldsymbol{\theta}\mid \mathcal{H}) = p(\boldsymbol{X}\mid\boldsymbol{\theta}_{1}, \mathcal{H})\,p(\boldsymbol{\theta}_{1}\mid\boldsymbol{\theta}_{2}, \mathcal{H}) \cdots p(\boldsymbol{\theta}_{n-1}\mid\boldsymbol{\theta}_{n}, \mathcal{H})\,p(\boldsymbol{\theta}_{n} \mid \mathcal{H}),$ (3.12)

where n is the number of layers.
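The practical consequence of the factorisation (3.12) is that the log-joint is a sum of one term per layer. As a minimal sketch, consider a hypothetical two-layer Gaussian hierarchy (all names and the unit variances are illustrative assumptions, not part of the text):

```python
import math

def log_gauss(x, mean, var):
    # Log-density of a univariate Gaussian N(mean, var).
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

# Hypothetical two-layer model (n = 2):
#   theta2 ~ N(0, 1)
#   theta1 | theta2 ~ N(theta2, 1)
#   x_k | theta1 ~ N(theta1, 1) for each observation x_k
def log_joint(X, theta1, theta2):
    # log p(X, theta | H) splits into one simple term per layer,
    # mirroring the factorisation in (3.12).
    term_x = sum(log_gauss(x, theta1, 1.0) for x in X)  # log p(X | theta_1)
    term_1 = log_gauss(theta1, theta2, 1.0)             # log p(theta_1 | theta_2)
    term_2 = log_gauss(theta2, 0.0, 1.0)                # log p(theta_2)
    return term_x + term_1 + term_2
```

Each layer contributes its own additive term, so each can be evaluated and updated without inspecting the layers above its immediate parent.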

This means that the expectation in ([*]), and thus the whole cost function C in ([*]), also become sums of simple terms.
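To spell out the step, taking the logarithm of the factorisation (3.12) turns the product into a sum, so the expectation distributes over the terms. A sketch, assuming C is the expected log-ratio cost of the preceding sections:

```latex
\begin{align*}
C &= \operatorname{E}_q\!\left\{ \log\frac{q(\boldsymbol{\theta})}
     {p(\boldsymbol{\theta},\boldsymbol{X}\mid\mathcal{H})} \right\} \\
  &= \operatorname{E}_q\{\log q(\boldsymbol{\theta})\}
     - \operatorname{E}_q\{\log p(\boldsymbol{X}\mid\boldsymbol{\theta}_{1},\mathcal{H})\}
     - \sum_{i=1}^{n-1}\operatorname{E}_q\{\log p(\boldsymbol{\theta}_{i}\mid\boldsymbol{\theta}_{i+1},\mathcal{H})\}
     - \operatorname{E}_q\{\log p(\boldsymbol{\theta}_{n}\mid\mathcal{H})\} .
\end{align*}
```

With a factorial approximation q, each expectation above involves only a parameter and its immediate parent layer, which is what keeps the cost function tractable.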



Tapani Raiko
2001-12-10