In this section, we report an experiment in which a dynamical model for variances is applied to image sequence analysis. The motivation for modelling variances is that many natural signals exhibit higher-order dependencies which are well characterised by correlated variances of the signals (Parra00NIPS). We therefore postulate that the dynamics of a video sequence can be captured better by modelling the variances of the features rather than the features themselves. As will be shown, this is indeed the case.
The model considered can be summarised by the following set of equations:
\begin{align*}
\mathbf{x}(t) &= \mathbf{A}\,\mathbf{s}(t) + \mathbf{a} + \mathbf{n}_x(t) \\
s_i(t) &\sim \mathcal{N}\!\bigl(s_i(t-1),\ \exp(-u_i(t))\bigr) \\
\mathbf{u}(t) &= \mathbf{B}\,\mathbf{u}(t-1) + \mathbf{b} + \mathbf{n}_u(t)
\end{align*}
where $\mathbf{x}(t)$ is the observed frame at time $t$, $\mathbf{s}(t)$ are the sources generating the observations through the sparse linear mapping $\mathbf{A}$, $\mathbf{u}(t)$ are variance sources modelling the variances of the source innovations, and $\mathbf{n}_x(t)$ and $\mathbf{n}_u(t)$ are Gaussian noise terms.
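To make the generative structure concrete, the following sketch samples data from a model of this form. The dimensions, the random sparsity pattern of $\mathbf{A}$, and all noise levels are arbitrary choices made for the illustration only; they do not correspond to the experiment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy dimensions (not those of the experiment).
n_pixels, n_sources, n_frames = 64, 16, 200

# Sparse basis A: each source drives only a few pixels (random mask here).
A = rng.normal(size=(n_pixels, n_sources)) * (rng.random((n_pixels, n_sources)) < 0.1)
a = rng.normal(scale=0.1, size=n_pixels)   # observation bias
B = 0.9 * np.eye(n_sources)                # dynamics of the variance sources
b = np.zeros(n_sources)                    # bias of the variance dynamics

x = np.zeros((n_frames, n_pixels))
s = np.zeros(n_sources)                    # sources, modelled as a random walk
u = np.zeros(n_sources)                    # variance sources

for t in range(n_frames):
    u = B @ u + b + rng.normal(scale=0.1, size=n_sources)    # u(t) = B u(t-1) + b + n_u(t)
    s = s + rng.normal(scale=np.exp(-u / 2))                  # s_i(t) ~ N(s_i(t-1), exp(-u_i(t)))
    x[t] = A @ s + a + rng.normal(scale=0.05, size=n_pixels)  # x(t) = A s(t) + a + n_x(t)
```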
The sparsity of $\mathbf{A}$ is crucial, as the computational complexity of the learning algorithm depends on the number of connections from $\mathbf{s}(t)$ to $\mathbf{x}(t)$. The same goal could also have been reached with a different approach: instead of constraining the mapping to be sparse from the very beginning of learning, it could have been allowed to be full for a number of iterations and only then pruned based on the cost function, as explained in Section 6.2. However, since the basis for image sequences tends to become sparse anyway, it would be a waste of computational resources to wait while most of the weights in the linear mapping tend to zero.
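To illustrate the computational argument, the sketch below compares the number of connections in a full mapping with a patch-wise sparse one. The patch-based connectivity is only an assumed example of a sparse structure, not necessarily the pattern used in the experiment.

```python
import numpy as np

def n_connections(mask: np.ndarray) -> int:
    """Number of nonzero weights in the source-to-pixel mapping A."""
    return int(np.count_nonzero(mask))

# Assumed toy setting: a 16x16 frame and 64 sources, each source
# connected only to an 8x8 patch of pixels (an illustrative choice).
frame, patch, n_sources = 16, 8, 64
full_mask = np.ones((frame * frame, n_sources), dtype=bool)

sparse_mask = np.zeros((frame * frame, n_sources), dtype=bool)
pixels = np.arange(frame * frame).reshape(frame, frame)
rng = np.random.default_rng(0)
for j in range(n_sources):
    r, c = rng.integers(0, frame - patch + 1, size=2)           # random patch location
    sparse_mask[pixels[r:r + patch, c:c + patch].ravel(), j] = True

# The per-iteration cost of updating A scales with these connection counts.
print(n_connections(full_mask), n_connections(sparse_mask))
```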
For comparison purposes, we postulate another model in which the dynamical relations are sought directly between the sources, leading to the following model equations:
\begin{align*}
\mathbf{x}(t) &= \mathbf{A}\,\mathbf{s}(t) + \mathbf{a} + \mathbf{n}_x(t) \\
\mathbf{s}(t) &= \mathbf{B}\,\mathbf{s}(t-1) + \mathbf{b} + \mathbf{n}_s(t)
\end{align*}
The data was a video image sequence (Hateren98) consisting of 4000 consecutive digital images. A part of the data set is shown in Figure 14.
Both models were learned by iterating the learning algorithm 2000 times, by which point sufficient convergence had been attained. A first hint of the superiority of the DynVar model was provided by the difference in cost between the models, which was 28 bits/frame (for the coding interpretation, see Honkela04TNN). To further evaluate the performance of the models, we considered a simple prediction task in which the next frame was predicted based on the previous ones.
The predictive distributions, $p(\mathbf{x}(t) \mid \mathbf{x}(t-1), \ldots, \mathbf{x}(1))$, for the models can be computed approximately based on the posterior approximation.
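As a rough sketch of how such a one-step predictive distribution might be formed under the model above, the following function propagates assumed Gaussian posterior statistics of the sources and variance sources one step forward. The names and the exact propagation are illustrative assumptions; the actual computation in the framework is based on the full posterior approximation and differs in detail.

```python
import numpy as np

def predict_next_frame(A, a, B, b, s_mean, s_var, u_mean, u_var,
                       u_noise_var, x_noise_var):
    """Gaussian approximation to p(x(t) | x(1), ..., x(t-1)) for a DynVar-style model.

    s_mean, s_var, u_mean, u_var are (approximate) posterior means and
    variances of the sources s(t-1) and the variance sources u(t-1).
    """
    # Propagate the variance sources: u(t) = B u(t-1) + b + n_u(t).
    u_pred_mean = B @ u_mean + b
    u_pred_var = (B ** 2) @ u_var + u_noise_var

    # Expected innovation variance of the sources, E[exp(-u(t))] under a Gaussian.
    s_innov_var = np.exp(-u_pred_mean + 0.5 * u_pred_var)

    # Sources follow a random walk: s(t) = s(t-1) + innovation.
    s_pred_mean = s_mean
    s_pred_var = s_var + s_innov_var

    # Map through the (sparse) linear mapping A to per-pixel predictive statistics.
    x_mean = A @ s_pred_mean + a
    x_var = (A ** 2) @ s_pred_var + x_noise_var
    return x_mean, x_var
```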
The means of the predictive distributions are very similar for both models. Figure 15 shows the means of the DynVar model for the same sequence as in Figure 14.
The means themselves are not very interesting, since they mainly
reflect the situation in the previous frame.
However, the DynVar model also provides a rich model for the variances. The standard deviations of its predictive distribution are shown in Figure 16, where white stands for a large variance and black for a small one. Clearly, the model is able to increase the predicted variance in areas of high motion activity and hence provide better predictions. We can offer quantitative support for this claim by computing the predictive perplexities of the models. Predictive perplexity is widely used in language modelling, and it is defined as
\begin{displaymath}
\mathrm{perplexity}(t) = \exp\left( -\frac{1}{N} \sum_{i=1}^{N} \ln p\bigl(x_i(t) \mid \mathbf{x}(t-1), \ldots, \mathbf{x}(1)\bigr) \right),
\end{displaymath}
where $N$ is the number of pixels in a frame.
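For concreteness, the sketch below evaluates this quantity for a Gaussian predictive distribution described by per-pixel means and variances, such as those returned by the prediction sketch above. Averaging the log-density over the pixels of a frame is an assumption about how the per-frame figure is formed.

```python
import numpy as np

def predictive_perplexity(x_true, x_mean, x_var):
    """Perplexity of a Gaussian predictive distribution over the pixels of one frame."""
    # Per-pixel predictive log-density log N(x_true; x_mean, x_var).
    log_p = -0.5 * (np.log(2 * np.pi * x_var) + (x_true - x_mean) ** 2 / x_var)
    # Perplexity: exponential of the negative mean log-density.
    return float(np.exp(-np.mean(log_p)))
```

A model that assigns larger predictive variance to regions where its mean prediction is poor, as the DynVar model does in areas of high motion, therefore attains a lower (better) perplexity.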
Possible applications for a model of image sequences include video compression, motion detection, early stages of computer vision, and forming hypotheses about biological vision.