
Cost function

Publication IV discusses ensemble learning at length; this section briefly describes the cost function used in ensemble learning. Let us denote the vector of all the unknown variables of the model by $\boldsymbol{\theta}$ and the vector of observations by $\mathbf{x}$, and suppose that the probabilities $p(\mathbf{x} \vert \boldsymbol{\theta})$ and $p(\boldsymbol{\theta})$ are defined. According to Bayes' rule, the posterior probability $p(\boldsymbol{\theta} \vert \mathbf{x})$ of the unknown variables is

\begin{displaymath}
p(\boldsymbol{\theta} \vert \mathbf{x}) = \frac{p(\mathbf{x} \vert \boldsymbol{\theta}) p(\boldsymbol{\theta})}{p(\mathbf{x})}
\end{displaymath} (15)

and the Kullback-Leibler information between the true posterior $p(\boldsymbol{\theta} \vert \mathbf{x})$ and its approximation $q(\boldsymbol{\theta} \vert \mathbf{x})$ is thus

\begin{multline}
I_{KL}(q(\boldsymbol{\theta} \vert \mathbf{x}) \vert\vert p(\boldsymbol{\theta} \vert \mathbf{x})) = \int q(\boldsymbol{\theta} \vert \mathbf{x}) \ln \frac{q(\boldsymbol{\theta} \vert \mathbf{x})}{p(\boldsymbol{\theta} \vert \mathbf{x})} \, d\boldsymbol{\theta} \\
= \int q(\boldsymbol{\theta} \vert \mathbf{x}) \ln \frac{q(\boldsymbol{\theta} \vert \mathbf{x})}{p(\mathbf{x} \vert \boldsymbol{\theta}) p(\boldsymbol{\theta})} \, d\boldsymbol{\theta} + \ln p(\mathbf{x}) \, ,
\end{multline} (16)

where the second form follows by substituting Bayes' rule (15) for $p(\boldsymbol{\theta} \vert \mathbf{x})$.
The term $\ln p(\mathbf{x})$ is usually difficult to compute because the normalising constant $p(\mathbf{x}) = \int p(\mathbf{x} \vert \boldsymbol{\theta}) p(\boldsymbol{\theta}) \, d\boldsymbol{\theta}$ requires marginalising the joint density $p(\mathbf{x}, \boldsymbol{\theta})$ over $\boldsymbol{\theta}$. The cost function which is actually used is

\begin{displaymath}
C(\mathbf{x}; q) = I_{KL}(q(\boldsymbol{\theta} \vert \mathbf{x}) \vert\vert p(\boldsymbol{\theta} \vert \mathbf{x})) - \ln p(\mathbf{x}) = \int q(\boldsymbol{\theta} \vert \mathbf{x}) \ln \frac{q(\boldsymbol{\theta} \vert \mathbf{x})}{p(\mathbf{x} \vert \boldsymbol{\theta}) p(\boldsymbol{\theta})} \, d\boldsymbol{\theta} \, .
\end{displaymath} (17)

The approximation $q(\boldsymbol{\theta} \vert \mathbf{x})$ which minimises (17) also minimises (16), because the term $\ln p(\mathbf{x})$ is constant with respect to the approximation $q(\boldsymbol{\theta} \vert \mathbf{x})$.
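As a concrete illustration (this sketch is not part of the original text), the cost function (17) can be written as $\mathrm{E}_q\{\ln q(\boldsymbol{\theta} \vert \mathbf{x}) - \ln p(\mathbf{x} \vert \boldsymbol{\theta}) - \ln p(\boldsymbol{\theta})\}$ and estimated by Monte Carlo. The following Python sketch does this for a hypothetical one-dimensional conjugate Gaussian model, where the exact posterior and $-\ln p(\mathbf{x})$ are available in closed form; the model, the observation value and all names are assumptions made here for illustration only.

\begin{verbatim}
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical toy model: p(x | theta) = N(x; theta, 1),
# prior p(theta) = N(theta; 0, 1), one observation x.
x = 1.5

def cost(m, s, n_samples=200_000):
    # Monte Carlo estimate of C(x; q) = E_q[ln q(theta|x) - ln p(x, theta)]
    # for a Gaussian approximation q(theta | x) = N(theta; m, s^2).
    theta = rng.normal(m, s, n_samples)                # theta ~ q
    ln_q = stats.norm.logpdf(theta, m, s)              # ln q(theta | x)
    ln_joint = (stats.norm.logpdf(x, theta, 1.0)       # ln p(x | theta)
                + stats.norm.logpdf(theta, 0.0, 1.0))  # + ln p(theta)
    return np.mean(ln_q - ln_joint)

# For this conjugate model the exact posterior is N(theta; x/2, 1/2)
# and the evidence is p(x) = N(x; 0, 2).
print(cost(x / 2, np.sqrt(0.5)))                 # approx. -ln p(x)
print(-stats.norm.logpdf(x, 0.0, np.sqrt(2.0)))  # exact   -ln p(x)
print(cost(0.0, 1.0))                            # mismatched q: larger
\end{verbatim}

At the exact posterior the Kullback-Leibler term in (17) vanishes and the estimate approaches $-\ln p(\mathbf{x})$; any other choice of $q$ gives a strictly larger value, since the Kullback-Leibler term is non-negative.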

In order for ensemble learning to be computationally efficient, the approximation $q(\boldsymbol{\theta} \vert \mathbf{x})$ should have a simple factorial form. Then the cost function splits into a sum of simple terms which can be computed efficiently.
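For instance (an illustration added here, with assumed notation), if the approximation factorises as $q(\boldsymbol{\theta} \vert \mathbf{x}) = \prod_i q_i(\theta_i \vert \mathbf{x})$, the cost function (17) decomposes as

\begin{displaymath}
C(\mathbf{x}; q) = \sum_i \int q_i(\theta_i \vert \mathbf{x}) \ln q_i(\theta_i \vert \mathbf{x}) \, d\theta_i - \int q(\boldsymbol{\theta} \vert \mathbf{x}) \ln p(\mathbf{x} \vert \boldsymbol{\theta}) p(\boldsymbol{\theta}) \, d\boldsymbol{\theta} \, .
\end{displaymath}

Each integral in the sum is one-dimensional; for a Gaussian factor $q_i(\theta_i \vert \mathbf{x}) = N(\theta_i; \bar{\theta}_i, \tilde{\theta}_i)$ it equals $-\frac{1}{2} \ln (2 \pi e \tilde{\theta}_i)$. When $p(\mathbf{x} \vert \boldsymbol{\theta}) p(\boldsymbol{\theta})$ is built from similarly simple terms, the remaining expectation likewise reduces to a function of the means $\bar{\theta}_i$ and variances $\tilde{\theta}_i$ of the factors.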

