A popular way to regularize ill-posed problems is to penalize large parameter values by adding a suitable penalty term to the cost function; see, for example, [3]. In our case, one can modify the cost function in Eq. (2) as follows:
$$C_\lambda = \sum_{(i,j)\in O} \left(x_{ij} - \hat{x}_{ij}\right)^2 + \lambda \left\lVert \mathbf{A} \right\rVert_F^2 + \lambda \left\lVert \mathbf{S} \right\rVert_F^2 \tag{17}$$

where the sum runs over the set $O$ of observed entries, $\hat{x}_{ij} = \sum_k a_{ik} s_{kj}$ is the model reconstruction of $x_{ij}$, $\lVert\cdot\rVert_F$ denotes the Frobenius norm, and $\lambda > 0$ controls the strength of the penalty.
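As a concrete illustration, the penalized cost (17) can be evaluated as in the following NumPy sketch; the function name, the mask convention, and the penalty weight `lam` are illustrative choices, not taken from the original implementation:

```python
import numpy as np

def regularized_cost(X, A, S, mask, lam):
    """Eq. (17): squared error over observed entries plus a Frobenius-norm
    penalty on both factors (X ~ A @ S; mask marks the observed x_ij)."""
    E = np.where(mask, X - A @ S, 0.0)   # errors on observed entries, 0 elsewhere
    return np.sum(E**2) + lam * (np.sum(A**2) + np.sum(S**2))
```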
A more general penalization would use different regularization parameters for different parts of $\mathbf{A}$ and $\mathbf{S}$. For example, one can use a parameter $\lambda_k$ of its own for each of the column vectors of $\mathbf{A}$ and the row vectors of $\mathbf{S}$. Note that since the columns of $\mathbf{A}$ can be scaled arbitrarily by rescaling the rows of $\mathbf{S}$ accordingly, one can fix the regularization term for $\mathbf{A}$, for instance, to unity.
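This scaling indeterminacy can be made explicit (a standard identity, spelled out here for completeness): for any invertible diagonal matrix $\mathbf{D} = \operatorname{diag}(\alpha_1, \dots, \alpha_c)$, where $c$ is the number of components,

$$\mathbf{A}\mathbf{S} = (\mathbf{A}\mathbf{D})(\mathbf{D}^{-1}\mathbf{S})\,,$$

so multiplying the $k$-th column of $\mathbf{A}$ by $\alpha_k$ while dividing the $k$-th row of $\mathbf{S}$ by $\alpha_k$ leaves every reconstruction $\hat{x}_{ij}$ unchanged. Any per-component penalty on $\mathbf{A}$ can therefore be traded against the corresponding penalty on $\mathbf{S}$, which is why one of the two regularization terms can be normalized.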
An equivalent optimization problem can be obtained using a probabilistic formulation with (independent) Gaussian priors and a Gaussian noise model:
$$p\left(x_{ij} \mid \mathbf{A}, \mathbf{S}\right) = \mathcal{N}\left(x_{ij};\, \hat{x}_{ij},\, v_x\right) \tag{18}$$

$$p(a_{ik}) = \mathcal{N}(a_{ik};\, 0,\, 1)\,, \qquad p(s_{kj}) = \mathcal{N}(s_{kj};\, 0,\, v_{sk}) \tag{19}$$

The cost function is then the negative logarithm of the joint probability:

$$C_{\mathrm{BR}} = -\ln p(\mathbf{X}, \mathbf{A}, \mathbf{S}) = \frac{1}{2v_x}\sum_{(i,j)\in O}\left(x_{ij}-\hat{x}_{ij}\right)^2 + \frac{\lvert O\rvert}{2}\ln v_x + \frac{1}{2}\sum_{i,k} a_{ik}^2 + \sum_{k}\left(\frac{1}{2v_{sk}}\sum_{j} s_{kj}^2 + \frac{n}{2}\ln v_{sk}\right) + \mathrm{const} \tag{20}$$

where $\mathcal{N}(x;\mu,v)$ denotes a Gaussian density with mean $\mu$ and variance $v$, $v_x$ is the noise variance, $v_{sk}$ is the prior variance of the $k$-th component, $n$ is the number of columns of the data matrix, and the constant collects terms independent of all parameters.
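The equivalence with (17) can be verified directly; the following check is a reconstruction under the model (18)–(19), assuming fixed variances and a common prior variance $v_s$ for all components. Multiplying (20) by $2v_x$ (which does not change the minimizer) and dropping constants yields

$$2v_x\, C_{\mathrm{BR}} = \sum_{(i,j)\in O}\left(x_{ij}-\hat{x}_{ij}\right)^2 + v_x \sum_{i,k} a_{ik}^2 + \frac{v_x}{v_s}\sum_{k,j} s_{kj}^2\,,$$

which is exactly the penalized cost (17) with $\lambda = v_x$ once $v_s = 1$; component-specific variances $v_{sk}$ reproduce the more general per-component penalties discussed above.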
Note that in case of joint optimization of $C_{\mathrm{BR}}$ w.r.t. $\mathbf{A}$, $\mathbf{S}$, $v_x$, and $v_{sk}$, the cost function (20) has a trivial minimum with $s_{kj} = 0$, $v_{sk} \to 0$, where the $\ln v_{sk}$ terms drive the cost to $-\infty$. We try to avoid this minimum by using an orthogonalized solution provided by unregularized PCA from the learning rules (14) and (15) for initialization.
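As an illustration of this strategy, here is a minimal NumPy sketch, not the authors' algorithm: a plain SVD of the zero-filled data stands in for the orthogonalized solution of the learning rules (14) and (15), which are not reproduced here, and all names, step sizes, and iteration counts are hypothetical:

```python
import numpy as np

def map_pca(X, mask, c, n_iter=500, lr=1e-3, v_x=1.0):
    """Gradient descent on the cost (20), with each prior variance v_sk
    updated in closed form between gradient steps."""
    X0 = np.where(mask, X, 0.0)                 # zero-fill the missing entries
    U, sv, Vt = np.linalg.svd(X0, full_matrices=False)
    A = U[:, :c] * np.sqrt(sv[:c])              # stand-in for the orthogonalized,
    S = np.sqrt(sv[:c])[:, None] * Vt[:c]       # unregularized PCA initialization
    v_s = np.mean(S**2, axis=1)
    for _ in range(n_iter):
        E = np.where(mask, X0 - A @ S, 0.0)     # errors on observed entries only
        A -= lr * (A - E @ S.T / v_x)           # dC_BR/dA = A - E S^T / v_x
        S -= lr * (S / v_s[:, None] - A.T @ E / v_x)
        v_s = np.mean(S**2, axis=1) + 1e-12     # closed-form optimum of (20) in v_sk;
                                                # the floor keeps the run away from
                                                # the trivial v_sk -> 0 minimum
    return A, S, v_s
```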
Note also that setting $v_{sk}$ to small values for some components $k$ is equivalent to removing the irrelevant components from the model. This allows for automatic determination of the proper dimensionality instead of discrete model comparison (see, e.g., [13]). This justifies using a separate $v_{sk}$ for each component in the model in (19).
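As a hypothetical illustration of this pruning effect (continuing the sketch above, with an arbitrary threshold):

```python
rng = np.random.default_rng(0)
X_true = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 200))  # true rank 3
mask = rng.random(X_true.shape) < 0.7                          # ~30% of entries missing
A, S, v_s = map_pca(X_true, mask, c=10)
keep = v_s > 1e-3 * v_s.max()      # prior variances that have not collapsed
print(keep.sum(), "of 10 components kept")                     # ideally close to 3
```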