Expectation of a function

Equations 6 and 7 show that we could compute the expected description length if we knew how to evaluate the expectations of the functions f_i.

Let $f_i(\xi_j \vert j \in \mathcal{J}_i)$ be a function whose expectation we would like to compute. As before, we approximate f_i by its second-order Taylor series expansion. This time, however, we expand f_i not with respect to the discretised parameters $\boldsymbol{\hat{\theta}}$ of the network but with respect to its direct arguments $\xi_j$. The expansion is therefore computed about the expectation values $\mu_j = E\{\xi_j\}$ .
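$\begin{multline} \xi_i = f_i(\xi_j \vert j \in \mathcal{J}_i) \approx f_i(\mu_j \vert j \in \mathcal{J}_i) + \sum_{j \in \mathcal{J}_i} \frac{\partial f_i}{\partial \xi_j} (\xi_j - \mu_j) \\ + \frac{1}{2} \sum_{j, k \in \mathcal{J}_i} \frac{\partial^2 f_i}{\partial \xi_j \partial \xi_k} (\xi_j - \mu_j) (\xi_k - \mu_k)\end{multline}$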
Taking the expectation of both sides of equation 8 yields
$\begin{multline} \mu_i = E\{f_i(\xi_j \vert j \in \mathcal{J}_i)\} \approx \\ f_i(\mu_j \vert j \in \mathcal{J}_i) + \sum_{j \in \mathcal{J}_i} \frac{v_j}{2} \frac{\partial^2 f_i}{\partial {\xi_j}^2}.\end{multline}$
Here we have denoted the variance of $\xi_j$ by v_j: $v_j \stackrel{\mathit{def}}{=} E\{(\xi_j - \mu_j)^2\}$ , and all partial derivatives are evaluated at the expansion point $\mu_j$ . The first-order terms vanish because $E\{\xi_j - \mu_j\} = 0$ . We have also assumed that for all $j \neq k$ , either $\xi_j$ and $\xi_k$ are uncorrelated or $\frac{\partial^2 f_i}{\partial \xi_j \partial \xi_k} = 0$ , which removes the second-order cross terms.
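
The approximation for $\mu_i$ above can be checked numerically. The following Python sketch is only an illustration under stated assumptions and is not part of the original method. The name `second_order_expectation`, the finite-difference step `h` and the two test functions are ours, and the second derivatives are estimated by central differences rather than derived analytically.

```python
import math


def second_order_expectation(f, means, variances, h=1e-3):
    """Approximate E{f(xi)} by f(mu) + sum_j (v_j / 2) * d^2 f / d xi_j^2.

    f         -- function taking a list of arguments xi_j (illustrative)
    means     -- expectation values mu_j of the arguments
    variances -- variances v_j of the arguments
    h         -- step of the central finite difference (an assumption)

    Cross terms are ignored, which matches the assumption that the
    arguments are uncorrelated or the mixed derivatives are zero.
    """
    means = list(means)
    f_at_mean = f(means)
    total = f_at_mean
    for j, v_j in enumerate(variances):
        plus = means.copy()
        minus = means.copy()
        plus[j] += h
        minus[j] -= h
        # Central-difference estimate of the second derivative at mu.
        second_derivative = (f(plus) - 2.0 * f_at_mean + f(minus)) / h ** 2
        total += 0.5 * v_j * second_derivative
    return total


# A quadratic function: the second-order expansion has no truncation error.
quadratic = lambda xi: xi[0] * xi[1] + xi[0] ** 2
print(second_order_expectation(quadratic, [1.0, 2.0], [0.5, 0.3]))

# A non-quadratic function: exp(xi) with a small variance.
exponential = lambda xi: math.exp(xi[0])
print(second_order_expectation(exponential, [0.0], [0.01]))
```

For the quadratic example with uncorrelated arguments the true expectation is $\mu_1 \mu_2 + \mu_1^2 + v_1 = 3.5$ , which the approximation reproduces up to rounding error. For the exponential with a Gaussian argument the true value is $\exp(\mu + v/2)$ , so the approximation $\exp(\mu)(1 + v/2)$ is accurate only while the variance is small.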

Harri Lappalainen
5/19/1998