Many models include groups of parameters that are related to each
other. This relationship should be reflected in the prior chosen
for them. *Hierarchical models* provide a useful tool for
building priors for such groups. This is done by giving the
parameters in a group a common prior distribution, which is itself
parameterised with new, higher-level hyperparameters [16].

Such a group typically consists of parameters that play a similar role in the model. Hierarchical models are well suited to neural network problems because such groups emerge naturally, for example as the elements of a weight matrix.
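
As a minimal sketch of this idea, the following Python code gives all elements of a weight matrix a common Gaussian prior whose mean and log standard deviation are themselves random hyperparameters. The Gaussian form, the standard normal hyperpriors, and the names `sample_hierarchical_weights` and `log_joint_prior` are assumptions made for illustration only; they are not taken from the model defined in Chapter 5.

```python
import math
import random


def log_normal(x, mean, std):
    """Log density of a Gaussian N(mean, std^2) evaluated at x."""
    return (-0.5 * math.log(2.0 * math.pi) - math.log(std)
            - 0.5 * ((x - mean) / std) ** 2)


def sample_hierarchical_weights(rows, cols, rng):
    """Draw hyperparameters first, then a weight matrix that shares them."""
    # Assumed hyperpriors (illustrative): m ~ N(0, 1) and v ~ N(0, 1),
    # where m is the common mean and v the common log standard deviation.
    m = rng.gauss(0.0, 1.0)
    v = rng.gauss(0.0, 1.0)
    std = math.exp(v)
    # Every element of the group has the same prior N(m, exp(v)^2).
    weights = [[rng.gauss(m, std) for _ in range(cols)] for _ in range(rows)]
    return m, v, weights


def log_joint_prior(m, v, weights):
    """Log prior density of the hyperparameters and all weights jointly."""
    total = log_normal(m, 0.0, 1.0) + log_normal(v, 0.0, 1.0)
    std = math.exp(v)
    for row in weights:
        for w in row:
            total += log_normal(w, m, std)
    return total


rng = random.Random(0)
m, v, W = sample_hierarchical_weights(3, 4, rng)
print(log_joint_prior(m, v, W))
```

Because the weights depend on each other only through `m` and `v`, information about one element of the matrix changes what is believed about the hyperparameters and, through them, about the other elements. This is the sense in which the common prior reflects the connection within the group.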

The definitions of the components of the Bayesian nonlinear switching state-space model in Chapter 5 contain several examples of hierarchical priors.

Antti Honkela 2001-05-30