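The approximating posterior distribution needed in ensemble learning
is over all the possible hidden state sequences $\boldsymbol{M}$ and the
parameter values $\boldsymbol{\theta}$. The approximation is chosen to be of a
factorial form
$$q(\boldsymbol{M}, \boldsymbol{\theta}) = q(\boldsymbol{M})\, q(\boldsymbol{\theta}).$$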
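The approximation $q(\boldsymbol{M})$ is a discrete distribution and it factorises
over the $T$ time steps as
$$q(\boldsymbol{M}) = q(M_1) \prod_{t=1}^{T-1} q(M_{t+1} \mid M_t).$$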
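To make the chain-structured factorisation concrete, the following minimal sketch evaluates the log-probability of one hidden state sequence as an initial-state term plus a sum of transition terms. The function name `sequence_log_prob`, the two-state tables, and the use of a single transition table for every time step are assumptions made for illustration only; in general the conditional distributions of the approximation may differ from one time step to the next.

```python
import math

def sequence_log_prob(states, initial, transition):
    """Log-probability of one state sequence under a chain-factorised q.

    states:     list of state indices, e.g. [0, 0, 1]
    initial:    initial[i] plays the role of q(M_1 = i)
    transition: transition[i][j] plays the role of q(M_{t+1} = j | M_t = i)
                (assumed identical for all t, purely to keep the sketch short)
    """
    log_q = math.log(initial[states[0]])
    # Each consecutive pair of states contributes one conditional factor.
    for prev, nxt in zip(states, states[1:]):
        log_q += math.log(transition[prev][nxt])
    return log_q

# Hypothetical two-state tables; the numbers are invented for illustration.
initial = [0.6, 0.4]
transition = [[0.7, 0.3],
              [0.2, 0.8]]

log_q = sequence_log_prob([0, 0, 1], initial, transition)
```

Because every row of the tables sums to one, the probabilities of all sequences of a given length also sum to one, so the product form defines a proper discrete distribution over sequences.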
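The distribution $q(\boldsymbol{\theta})$ is also formed as a product of
independent distributions for different parameters. The parameters
with Dirichlet priors have posterior approximations of a single
Dirichlet distribution, like for $\boldsymbol{\pi}$:
$$q(\boldsymbol{\pi}) = \text{Dirichlet}(\boldsymbol{\pi};\, \hat{\boldsymbol{\pi}}).$$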
These will actually be the optimal choices among all possible
distributions, assuming the factorisation
$q(\boldsymbol{M}, \boldsymbol{\theta}) = q(\boldsymbol{M})\, q(\boldsymbol{\theta})$.
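As a rough illustration of what a Dirichlet posterior approximation provides, the sketch below computes the mean and the log-density of a Dirichlet distribution from its parameter vector using only the standard library. The names `dirichlet_mean`, `dirichlet_log_pdf`, and `alpha_hat`, as well as the numerical values, are hypothetical and not taken from the text.

```python
import math

def dirichlet_mean(alpha):
    """Mean of a Dirichlet distribution with parameter vector alpha."""
    total = sum(alpha)
    return [a / total for a in alpha]

def dirichlet_log_pdf(x, alpha):
    """Log-density of Dirichlet(alpha) at x (x positive, summing to one)."""
    # Log of the normalising constant Gamma(sum alpha) / prod Gamma(alpha_i).
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(xi) for a, xi in zip(alpha, x))

# Hypothetical posterior parameters for a three-state probability vector.
alpha_hat = [3.0, 1.0, 1.0]
mean = dirichlet_mean(alpha_hat)
log_density = dirichlet_log_pdf([0.5, 0.25, 0.25], alpha_hat)
```

Larger entries in the parameter vector concentrate the distribution more tightly around its mean, which is how such an approximation expresses increasing certainty about the probabilities.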
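The parameters with Gaussian priors have Gaussian posterior approximations of the form
$$q(\theta_i) = N(\theta_i;\, \bar{\theta}_i, \tilde{\theta}_i),$$
where $\bar{\theta}_i$ is the posterior mean and $\tilde{\theta}_i$ the posterior variance.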
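A Gaussian approximation of this kind is fully described by a mean and a variance, which keeps the quantities needed in ensemble learning in closed form. The sketch below shows the scalar Gaussian log-density and the Kullback-Leibler divergence from a Gaussian approximation to a Gaussian prior. It assumes the prior mean and variance are fixed numbers, which is a simplification made here, and the function names are hypothetical.

```python
import math

def gaussian_log_pdf(x, mean, var):
    """Log-density of a scalar Gaussian N(x; mean, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def gaussian_kl(q_mean, q_var, p_mean, p_var):
    """KL( N(q_mean, q_var) || N(p_mean, p_var) ) for scalar Gaussians."""
    return 0.5 * (math.log(p_var / q_var)
                  + (q_var + (q_mean - p_mean) ** 2) / p_var
                  - 1.0)

# Hypothetical values: a narrow posterior approximation against a broad prior.
kl = gaussian_kl(q_mean=0.5, q_var=0.1, p_mean=0.0, p_var=1.0)
```

The divergence is zero when the approximation equals the prior and grows as the posterior mean moves away from the prior mean or as the two variances diverge.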