The multinomial distribution is a discrete distribution that gives the probability of choosing a given collection of $m$ items from a set of $n$ items, with repetitions, when the probabilities of the individual choices are $p_1, \ldots, p_n$. These probabilities are the parameters of the multinomial distribution.
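As a small illustration, the probability of one particular collection can be computed from the standard multinomial probability mass function, $\frac{m!}{c_1! \cdots c_n!} \prod_i p_i^{c_i}$, where $c_i$ is the number of times item $i$ was chosen. The following is a minimal sketch using only the Python standard library; the function name `multinomial_pmf` and the example numbers are illustrative assumptions, not part of the original text.

```python
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """Probability of observing `counts` in sum(counts) draws with repetition.

    counts[i] is how many times item i was chosen, probs[i] its probability.
    (Hypothetical helper written for illustration.)
    """
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    # Multinomial coefficient m! / (c_1! * ... * c_n!), kept as an exact integer.
    coeff = factorial(sum(counts))
    for c in counts:
        coeff //= factorial(c)
    return coeff * prod(p ** c for p, c in zip(probs, counts))


# Three draws from three items: item 0 twice, item 1 once, item 2 never.
example = multinomial_pmf([2, 1, 0], [0.5, 0.3, 0.2])
print(example)
```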
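The Dirichlet distribution is the conjugate prior of the parameters of the multinomial distribution. The probability density of the Dirichlet distribution for variables $\mathbf{p} = (p_1, \ldots, p_n)$ with parameters $\mathbf{u} = (u_1, \ldots, u_n)$ is defined by
$$p(\mathbf{p}) = \mathrm{Dirichlet}(\mathbf{p}; \mathbf{u}) = \frac{1}{Z(\mathbf{u})} \prod_{i=1}^{n} p_i^{u_i - 1}$$
when $p_1, \ldots, p_n \geq 0$, $\sum_{i=1}^{n} p_i = 1$ and $u_1, \ldots, u_n > 0$. The normalisation constant is
$$Z(\mathbf{u}) = \frac{\prod_{i=1}^{n} \Gamma(u_i)}{\Gamma\left(\sum_{i=1}^{n} u_i\right)}.$$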
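To make the density and the conjugacy concrete, the sketch below evaluates the log-density with the standard Gamma-function normaliser and applies the usual conjugate update, in which observed multinomial counts are added to the Dirichlet parameters. The function names are assumptions made for this example, and the density is only evaluated in the interior of the simplex (all $p_i > 0$) to avoid taking the logarithm of zero.

```python
from math import lgamma, log


def log_normaliser(u):
    """log Z(u) = sum_i log Gamma(u_i) - log Gamma(sum_i u_i)."""
    return sum(lgamma(ui) for ui in u) - lgamma(sum(u))


def dirichlet_logpdf(p, u):
    """Log-density of Dirichlet(p; u) for p strictly inside the simplex."""
    if len(p) != len(u):
        raise ValueError("p and u must have the same length")
    if min(p) <= 0.0 or abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("p must be positive and sum to one")
    if min(u) <= 0.0:
        raise ValueError("all parameters u_i must be positive")
    return sum((ui - 1.0) * log(pi) for pi, ui in zip(p, u)) - log_normaliser(u)


def posterior_params(u, counts):
    """Conjugate update: Dirichlet prior plus multinomial counts."""
    return [ui + ci for ui, ci in zip(u, counts)]


prior = [1.0, 1.0, 1.0]
posterior = posterior_params(prior, [2, 1, 0])
print(posterior, dirichlet_logpdf([0.5, 0.3, 0.2], posterior))
```

Working with `lgamma` rather than the Gamma function itself keeps the normaliser numerically stable when the parameters are large.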
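Let $u_0 = \sum_{i=1}^{n} u_i$ denote the sum of the parameters. The mean and variance of the distribution are
$$E[p_i] = \frac{u_i}{u_0}$$
and
$$\mathrm{Var}[p_i] = \frac{u_i (u_0 - u_i)}{u_0^2 (u_0 + 1)}.$$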
When all $u_i \to 0$, the distribution becomes noninformative. The means of all the $p_i$ stay the same if all the $u_i$ are scaled by the same multiplicative constant. The variances, however, get smaller as the parameters $u_i$ grow. The pdfs of the Dirichlet distribution for certain parameter values are shown in Figure A.2.
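The scaling behaviour can be checked numerically. The sketch below computes the standard Dirichlet mean $u_i / u_0$ and variance $u_i (u_0 - u_i) / (u_0^2 (u_0 + 1))$, with $u_0$ the sum of the parameters, for a parameter vector and for scaled copies of it. The helper name `dirichlet_mean_var` and the example parameters are assumptions chosen for illustration.

```python
def dirichlet_mean_var(u):
    """Return (means, variances) of a Dirichlet distribution with parameters u."""
    u0 = sum(u)
    means = [ui / u0 for ui in u]
    variances = [ui * (u0 - ui) / (u0 ** 2 * (u0 + 1.0)) for ui in u]
    return means, variances


base = [1.0, 2.0, 3.0]
for scale in (1.0, 10.0, 100.0):
    means, variances = dirichlet_mean_var([scale * ui for ui in base])
    # The means are unchanged by the scaling; the variances shrink.
    print(scale, means, variances)
```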
In addition to the standard statistics given above, using ensemble learning for parameters with a Dirichlet distribution requires the evaluation of the expectation $E[\log p_i]$ and the negative differential entropy $E[\log p(\mathbf{p})]$.
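The first expectation can be reduced to evaluating the expectation over a two-dimensional Dirichlet distribution for $(p_i, 1 - p_i) \sim \mathrm{Dirichlet}(u_i, u_0 - u_i)$, which is given by the integral
$$E[\log p_i] = \int_0^1 \frac{\Gamma(u_0)}{\Gamma(u_i)\,\Gamma(u_0 - u_i)}\, p^{u_i - 1} (1 - p)^{u_0 - u_i - 1} \log p \; dp.$$
This can be evaluated analytically to yield
$$E[\log p_i] = \Psi(u_i) - \Psi(u_0),$$
where $\Psi$ is the digamma function.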
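The reduction to a two-dimensional Dirichlet (that is, a Beta) distribution can be checked numerically against the standard closed form $E[\log p_i] = \Psi(u_i) - \Psi(u_0)$, where $\Psi$ is the digamma function and $u_0$ is the sum of the parameters. The sketch below is an illustration under stated assumptions: the Python standard library has no digamma function, so `digamma` is a hand-written approximation (recurrence plus a truncated asymptotic series), and the midpoint-rule integrator is only meant for $u_i \geq 1$, where the integrand is well behaved near zero.

```python
from math import exp, lgamma, log


def digamma(x):
    """Approximate digamma for x > 0 (hand-rolled helper, not a library call).

    Uses psi(x) = psi(x + 1) - 1/x to shift x to at least 6, then a
    truncated asymptotic series.
    """
    if x <= 0.0:
        raise ValueError("x must be positive")
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return acc + log(x) - 0.5 / x - series


def expected_log_p(u, i):
    """Closed form: E[log p_i] = psi(u_i) - psi(u_0)."""
    return digamma(u[i]) - digamma(sum(u))


def expected_log_p_numeric(u, i, steps=20000):
    """Midpoint-rule integral of log(p) under Beta(u_i, u_0 - u_i)."""
    a, b = u[i], sum(u) - u[i]
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        p = (k + 0.5) * h
        log_density = log_norm + (a - 1.0) * log(p) + (b - 1.0) * log(1.0 - p)
        total += exp(log_density) * log(p)
    return total * h


u = [2.0, 1.0, 3.0]
print(expected_log_p(u, 0), expected_log_p_numeric(u, 0))
```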
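By using this result, the negative differential entropy can be evaluated as
$$E[\log p(\mathbf{p})] = E\left[\log \left( \frac{1}{Z(\mathbf{u})} \prod_{i=1}^{n} p_i^{u_i - 1} \right)\right] = -\log Z(\mathbf{u}) + \sum_{i=1}^{n} (u_i - 1) \left[ \Psi(u_i) - \Psi(u_0) \right].$$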
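A minimal sketch of this computation follows, assuming the standard expression $-\log Z(\mathbf{u}) + \sum_i (u_i - 1)[\Psi(u_i) - \Psi(u_0)]$ with $\log Z(\mathbf{u}) = \sum_i \log \Gamma(u_i) - \log \Gamma(u_0)$ and $u_0 = \sum_i u_i$. As the Python standard library lacks a digamma function, `digamma` is again a hand-written approximation, and `negative_entropy` is a name chosen for this example only.

```python
from math import lgamma, log


def digamma(x):
    """Approximate digamma for x > 0 (hand-rolled helper, not a library call)."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    acc = 0.0
    # Recurrence psi(x) = psi(x + 1) - 1/x until the series is accurate.
    while x < 6.0:
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return acc + log(x) - 0.5 / x - series


def negative_entropy(u):
    """E[log p(p)] for a Dirichlet distribution with parameters u."""
    u0 = sum(u)
    log_z = sum(lgamma(ui) for ui in u) - lgamma(u0)
    psi0 = digamma(u0)
    return -log_z + sum((ui - 1.0) * (digamma(ui) - psi0) for ui in u)


# Uniform density on the simplex: the density is constant, so the
# result equals the log of that constant.
print(negative_entropy([1.0, 1.0, 1.0]))
print(negative_entropy([2.0, 1.0]))
```

A quick sanity check is the two-parameter case $\mathbf{u} = (2, 1)$, whose density is $2p$ on $[0, 1]$, so that $E[\log p(\mathbf{p})] = \log 2 + \int_0^1 2p \log p \, dp = \log 2 - \tfrac{1}{2}$.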