A popular way to regularize ill-posed problems is to penalize large parameter values by adding a suitable penalty term to the cost function; see, for example, [3]. In our case, the cost function in Eq. (2) can be modified as follows:
$$ C = \sum_{(i,j)\in O} \Big( y_{ij} - \sum_{k} a_{ik} s_{kj} \Big)^{2} + \lambda \big( \|A\|_F^2 + \|S\|_F^2 \big) \qquad (17) $$
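As a concrete illustration, the penalized cost of Eq. (17) can be evaluated as in the minimal sketch below, assuming $Y$ holds the data with NaN marking missing entries and $A$, $S$ are the current parameter matrices (all variable names are illustrative):

```python
import numpy as np

def regularized_cost(Y, A, S, lam):
    """Penalized reconstruction cost of Eq. (17): squared errors over
    the observed entries of Y plus a weight-decay penalty on A and S."""
    O = ~np.isnan(Y)                     # mask of observed entries
    E = np.where(O, Y - A @ S, 0.0)      # residuals, zero where missing
    return (E ** 2).sum() + lam * ((A ** 2).sum() + (S ** 2).sum())

# toy data: 5x4 matrix with two missing entries, two components
rng = np.random.default_rng(0)
Y = rng.standard_normal((5, 4))
Y[0, 1] = Y[3, 2] = np.nan
A = rng.standard_normal((5, 2))
S = rng.standard_normal((2, 4))
cost = regularized_cost(Y, A, S, lam=0.1)
```

Note that the sum runs only over observed entries; missing values contribute nothing to the error term.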
A more general penalization would use different regularization parameters for different parts of $A$ and $S$. For example, one can use a parameter of its own for each of the column vectors of $A$ and the row vectors of $S$. Note that since the columns of $A$ can be scaled arbitrarily by rescaling the rows of $S$ accordingly, one can fix the regularization term for $A$, for instance, to unity.
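The per-component variant can be sketched in the same way. In the hypothetical helper below, `lam_s` holds one regularization parameter per row of $S$, while the penalty on $A$ carries a fixed unit weight, since any scaling of the columns of $A$ can be absorbed into the rows of $S$:

```python
import numpy as np

def per_component_cost(Y, A, S, lam_s):
    """Variant of Eq. (17) with a separate regularization parameter for
    each component: row k of S is penalized by lam_s[k]; the penalty on
    A is fixed to unit weight."""
    O = ~np.isnan(Y)
    E = np.where(O, Y - A @ S, 0.0)
    penalty_A = (A ** 2).sum()                    # weight fixed to unity
    penalty_S = (lam_s[:, None] * S ** 2).sum()   # one parameter per row
    return (E ** 2).sum() + penalty_A + penalty_S

rng = np.random.default_rng(0)
Y = rng.standard_normal((5, 4))
Y[2, 0] = np.nan
A = rng.standard_normal((5, 2))
S = rng.standard_normal((2, 4))
cost = per_component_cost(Y, A, S, np.array([0.1, 10.0]))
```

With all entries of `lam_s` equal to one, this reduces to the uniform penalty of Eq. (17) with $\lambda = 1$.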
An equivalent optimization problem can be obtained using a probabilistic formulation with (independent) Gaussian priors and a Gaussian noise model:
$$ p(y_{ij} \mid A, S) = \mathcal{N}\Big( y_{ij};\ \sum_{k} a_{ik} s_{kj},\ v_y \Big) \qquad (18) $$

$$ p(A) = \prod_{i,k} \mathcal{N}(a_{ik};\ 0,\ 1), \qquad p(S) = \prod_{k,j} \mathcal{N}(s_{kj};\ 0,\ v_{s,k}) \qquad (19) $$

The cost function is then the negative logarithm of the posterior of the unknown quantities:

$$ C_{\mathrm{BR}} = -\ln p(A, S \mid Y) \qquad (20) $$
Note that in the case of joint optimization of $C_{\mathrm{BR}}$ with respect to $A$, $S$, $v_y$, and $v_{s,k}$, the cost function (20) has a trivial minimum with $s_{kj} = 0$, $v_{s,k} \to 0$. We try to avoid this minimum by initializing with an orthogonalized solution provided by unregularized PCA using the learning rules (14) and (15). Note also that setting $v_{s,k}$ to small values for some components is equivalent to removing the irrelevant components from the model. This allows for automatic determination of the proper dimensionality instead of discrete model comparison (see, e.g., [13]), and it justifies using a separate $v_{s,k}$ for each component in the model in (19).
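The trivial minimum is easy to see numerically. The sketch below evaluates the negative log-posterior up to an additive constant, assuming Gaussian noise with variance `v_y`, unit Gaussian priors on the elements of $A$, and a separate prior variance `v_s[k]` for row $k$ of $S$ (all names are illustrative):

```python
import numpy as np

def neg_log_posterior(Y, A, S, v_y, v_s):
    """Minus log-posterior (up to an additive constant) under the
    assumed model: Gaussian noise with variance v_y, unit Gaussian
    priors on A, and prior variance v_s[k] for the k-th row of S."""
    O = ~np.isnan(Y)
    E = np.where(O, Y - A @ S, 0.0)
    n = S.shape[1]
    cost = (E ** 2).sum() / (2 * v_y) + O.sum() / 2 * np.log(2 * np.pi * v_y)
    cost += (A ** 2).sum() / 2                       # -ln p(A), const dropped
    cost += ((S ** 2).sum(axis=1) / (2 * v_s)).sum() # -ln p(S), quadratic part
    cost += (n / 2) * np.log(2 * np.pi * v_s).sum()  # -ln p(S), log-det part
    return cost

rng = np.random.default_rng(1)
Y = rng.standard_normal((4, 6))
Y[1, 2] = np.nan
A = rng.standard_normal((4, 2))
S = rng.standard_normal((2, 6))
S[1] = 0.0   # zero out the second component
c_unit = neg_log_posterior(Y, A, S, 1.0, np.array([1.0, 1.0]))
c_tiny = neg_log_posterior(Y, A, S, 1.0, np.array([1.0, 1e-8]))
# shrinking the prior variance of a zeroed component lowers the cost
# without bound -- the trivial minimum discussed above
```

The decrease comes entirely from the $\ln v_{s,k}$ term of the prior, which is unbounded below as $v_{s,k} \to 0$ once the corresponding row of $S$ is zero; this is why the initialization from the unregularized solution matters.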