Nonlinear Independent Factor Analysis

The nonlinear factor analysis model introduced in the previous section has Gaussian distributions for the sources. In this section we show how that model can be extended to have mixture-of-Gaussians models for the sources. In doing so we largely follow the method introduced in [1] for Bayesian linear independent factor analysis. The resulting model is a nonlinear counterpart of ICA or, more accurately, of independent factor analysis, because the model includes finite noise. The difference between nonlinear factor analysis and nonlinear independent factor analysis is similar to that between linear PCA and ICA, because the first-layer weight matrix A in the network has the same indeterminacies in nonlinear PCA as in linear PCA. The indeterminacy is discussed in the introductory chapter.

According to the model for the distribution of the sources, there are several Gaussian distributions and at each time instant, the source originates from one of them. Let us denote the index of the Gaussian from which the source s_i(t) originates by M_i(t). The model for the distribution for the ith source at time t is

$\begin{displaymath}p(s_i(t) \vert \theta) = \sum_{M_i(t)} P(M_i(t) \vert \theta) p(s_{i}(t) \vert \theta, M_i(t)) \, , \end{displaymath}$

(33)

where $p(s_{i}(t) \vert \theta, M_i(t) = j)$ is a time-independent Gaussian distribution with its own mean m_ij and log-std v_ij. The probabilities $P(M_i(t) \vert \theta)$ of different Gaussians are modelled with time-independent soft-max distributions.

$\begin{displaymath}P(M_i(t) = j \vert \theta) = \frac{e^{c_{ij}}}{\sum_{j'} e^{c_{ij'}}} \end{displaymath}$

(34)
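
As a concrete illustration of Eqs. (33) and (34), the following NumPy sketch evaluates the mixture-of-Gaussians prior of a single source. The function names `softmax` and `mog_density` and the array layout are illustrative assumptions rather than part of the original implementation; the standard deviation of each Gaussian is taken to be exp(v_ij), following the log-std parameterisation mentioned above.

```python
import numpy as np

def softmax(c):
    """Mixture weights P(M_i = j | theta) from the parameters c_ij (Eq. 34)."""
    c = np.asarray(c, dtype=float)
    # Subtracting the maximum does not change the result but avoids overflow.
    e = np.exp(c - c.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def mog_density(s, c, m, v):
    """Prior density p(s_i | theta) of one source (Eq. 33).

    s : scalar or array of source values
    c : soft-max parameters c_ij, shape (J,)
    m : means m_ij, shape (J,)
    v : log-std parameters v_ij, shape (J,); std = exp(v) (assumed convention)
    """
    s = np.asarray(s, dtype=float)[..., None]      # add a trailing axis for j
    m = np.asarray(m, dtype=float)
    std = np.exp(np.asarray(v, dtype=float))
    weights = softmax(c)
    gauss = np.exp(-0.5 * ((s - m) / std) ** 2) / (np.sqrt(2.0 * np.pi) * std)
    return (weights * gauss).sum(axis=-1)          # sum over the Gaussians j
```

With a single Gaussian the mixture reduces to the Gaussian source prior of nonlinear factor analysis, which is a convenient sanity check for such a routine.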

Each combination of different Gaussians producing the sources can be considered a different model. The number of these models is enormous, of course, but their posterior distribution can still be approximated by the same kind of factorial approximation that is used for the other variables:

$\begin{displaymath}Q(\mathrm{M} \vert \mathrm{X}) = \prod_{M_i(t)} Q(M_i(t) \vert \mathrm{X}) \end{displaymath}$

(35)

Without losing any further generality, we can now write

$\begin{displaymath}q(s_i(t), M_i(t) \vert \theta) = Q(M_i(t) \vert \theta) q(s_i(t) \vert \theta, M_i(t)) \, , \end{displaymath}$

(36)

which yields

$\begin{displaymath}q(s_i(t) \vert \theta) = \sum_j Q(M_i(t) = j \vert \theta) q(s_i(t) \vert \theta, M_i(t) = j) \, . \end{displaymath}$

(37)

This means that the approximating ensemble for the sources has a form similar to the prior, i.e., an independent mixture of Gaussians, although the posterior mixture is different at different times.

Due to the assumption of a factorial posterior distribution of the models, the cost function can be computed as easily as before. Let us denote $Q(M_i(t) = j \vert \theta) = \dot{s}_{ij}(t)$ and the posterior mean and variance of $q(s_i(t) \vert \theta, M_i(t) = j)$ by $\bar{s}_{ij}(t)$ and $\tilde{s}_{ij}(t)$ . It is easy to see that the posterior mean and variance of s_i(t) are

$\begin{displaymath}\bar{s}_i(t) = \sum_j \dot{s}_{ij}(t) \bar{s}_{ij}(t) \end{displaymath}$

(38)

$\begin{displaymath}\tilde{s}_i(t) = \sum_j \dot{s}_{ij}(t) [\tilde{s}_{ij}(t) + (\bar{s}_{ij}(t) - \bar{s}_i(t))^2] \, . \end{displaymath}$

(39)
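
The two moment formulas (38) and (39) can be sketched in NumPy as follows. The function name `mixture_moments` and the convention that the last array axis indexes the Gaussians j are assumptions made for this illustration only.

```python
import numpy as np

def mixture_moments(resp, mean_j, var_j):
    """Posterior mean and variance of s_i(t) under a mixture posterior.

    resp   : posterior probabilities Q(M_i(t) = j), summing to 1 over the last axis
    mean_j : posterior means of the individual Gaussians
    var_j  : posterior variances of the individual Gaussians
    All three arrays share the same shape (..., J).
    """
    resp = np.asarray(resp, dtype=float)
    mean_j = np.asarray(mean_j, dtype=float)
    var_j = np.asarray(var_j, dtype=float)
    # Eq. (38): probability-weighted average of the component means.
    mean = (resp * mean_j).sum(axis=-1, keepdims=True)
    # Eq. (39): within-component variance plus the spread of the component means.
    var = (resp * (var_j + (mean_j - mean) ** 2)).sum(axis=-1)
    return mean[..., 0], var

# Two equally probable Gaussians at -1 and +1, each with variance 0.25:
mean, var = mixture_moments([0.5, 0.5], [-1.0, 1.0], [0.25, 0.25])
# mean is 0.0 and var is 0.25 + 1.0 = 1.25
```

The example shows that the total variance exceeds the component variances whenever the component means differ, which is the role of the squared term in Eq. (39).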

After having computed the posterior mean $\bar{s}_i$ and variance $\tilde{s}_i$ of the sources, the computation of the C_p part of the cost function proceeds as with nonlinear factor analysis in the previous section. The C_q part yields terms of the form

$\begin{multline}q(s_i(t) \vert \mathrm{X}) \ln q(s_i(t) \vert \mathrm{X}) = \\ \sum_j Q(M_i(t) = j \vert \mathrm{X}) [\ln Q(M_i(t) = j \vert \mathrm{X}) + q(s_i(t) \vert M_i(t) = j, \mathrm{X}) \ln q(s_i(t) \vert M_i(t) = j, \mathrm{X})] \end{multline}$

(40)

and we have thus reduced the problem to a previously solved one. The terms $q(s_i(t) \vert M_i(t), \mathrm{X}) \ln q(s_i(t) \vert M_i(t), \mathrm{X})$ are the same as for nonlinear factor analysis, and otherwise the equation has the same form as in model selection in Chap. 6. This is not surprising, since the terms Q(M_i(t) | X) are the probabilities of different models and we are therefore, in effect, doing factorial model selection.
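
A minimal numerical sketch of such a C_q term is given below. It assumes that the term has the form sum_j Q_j [ln Q_j + E(ln q_j)], where Q_j is the posterior probability of the jth Gaussian and E(ln q_j) is the expectation of ln q(s_i(t) | M_i(t) = j, X) over that Gaussian, which for a Gaussian with variance var equals -0.5 ln(2 pi e var). The function name `cq_source_term` is hypothetical.

```python
import numpy as np

def cq_source_term(resp, var_j):
    """Contribution of one source s_i(t) to C_q under a mixture posterior.

    resp  : posterior probabilities Q(M_i(t) = j), shape (..., J)
    var_j : posterior variances of the individual Gaussians, shape (..., J)
    Assumed form: sum_j Q_j * (ln Q_j + E[ln q_j]).
    """
    resp = np.asarray(resp, dtype=float)
    var_j = np.asarray(var_j, dtype=float)
    # Expected log-density (negative entropy) of a Gaussian with variance var_j.
    neg_entropy = -0.5 * np.log(2.0 * np.pi * np.e * var_j)
    # Use the convention 0 * ln 0 = 0 for Gaussians with zero probability.
    log_resp = np.log(np.where(resp > 0.0, resp, 1.0))
    return (resp * (log_resp + neg_entropy)).sum(axis=-1)
```

When one Gaussian has probability one, the ln Q_j part vanishes and only the Gaussian term familiar from nonlinear factor analysis remains.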

Most update rules are the same as for nonlinear factor analysis. Equations (39) and (40) introduce the terms $\dot{s}_{ij}(t)$ into the update rules for the means m_ij and log-std parameters v_ij of the sources. It turns out that both updates are weighted with $\dot{s}_{ij}$ , i.e., the observation is used for adapting the parameters in proportion to the posterior probability of that observation originating from that particular Gaussian distribution.
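
The effect of this weighting can be illustrated with a simplified point-estimate analogue. The sketch below is not the update rule of the model, which also accounts for the priors and the posterior uncertainty of m_ij and v_ij; it only shows how each time instant contributes in proportion to its posterior probability. The function name `weighted_source_params` and the array layout (time along the first axis, Gaussians along the second) are assumptions.

```python
import numpy as np

def weighted_source_params(resp, mean_j, var_j):
    """Probability-weighted point estimates of m_ij and v_ij for one source.

    resp   : posterior probabilities Q(M_i(t) = j), shape (T, J)
    mean_j : posterior means of s_i(t) given M_i(t) = j, shape (T, J)
    var_j  : posterior variances of s_i(t) given M_i(t) = j, shape (T, J)
    Assumes every Gaussian has non-zero total probability over time.
    """
    resp = np.asarray(resp, dtype=float)
    mean_j = np.asarray(mean_j, dtype=float)
    var_j = np.asarray(var_j, dtype=float)
    total = resp.sum(axis=0)                       # effective sample count per Gaussian
    m = (resp * mean_j).sum(axis=0) / total        # weighted mean
    spread = (resp * (var_j + (mean_j - m) ** 2)).sum(axis=0) / total
    v = 0.5 * np.log(spread)                       # log of the standard deviation
    return m, v
```

A time instant whose posterior probability for a given Gaussian is zero has no influence on that Gaussian's parameters in this sketch.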

Harri Lappalainen
2000-03-03