This paper describes an Arabic document indexing system based on a hybrid "Latent Semantic Analysis" (LSA) and "Self-Organizing Maps" (SOM) algorithm. The approach has the advantage of being completely statistical and of inferring the indices automatically from the document database. A rule-based stemming method is also proposed for the Arabic language. The whole system has been tested on a database formed of "Alnahar" newspaper articles from 1999. Document clustering and the first few retrieval experiments have provided encouraging results.
Poster presented at ICASSP (88 kB)
An HMM consists of sequential states which are trained to model the feature changes in the signal produced during the modeled process. The output densities applied in this work are mixtures of Gaussian density functions. SOMs are applied to initialize and train the mixtures to give a smooth and faithful representation of the feature vector space defined by the corresponding training samples. The SOM maps similar feature vectors to nearby units, which is exploited here in experiments to improve the recognition speed of the system.
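The SOM property used above, that similar feature vectors end up mapped to nearby units, can be illustrated with a minimal one-dimensional SOM. This is a generic sketch, not the thesis's implementation: the map size, schedules, and toy two-cluster data are all hypothetical. After training, the map units cluster around the dense regions of the feature space and could serve as initial mean vectors for the Gaussian mixtures.

```python
import math
import random

def train_som(samples, n_units=8, epochs=30, seed=0):
    """Train a minimal 1-D SOM (a sketch, not the thesis's code).
    Nearby units on the map end up modeling nearby regions of the
    feature vector space."""
    rng = random.Random(seed)
    samples = list(samples)
    # Initialize units from random training samples.
    units = [list(rng.choice(samples)) for _ in range(n_units)]
    for epoch in range(epochs):
        lr = 0.5 * (1.0 - epoch / epochs)                     # decaying learning rate
        radius = max(1.0, n_units / 2 * (1.0 - epoch / epochs))  # shrinking neighborhood
        rng.shuffle(samples)
        for x in samples:
            # Best-matching unit by squared Euclidean distance.
            bmu = min(range(n_units),
                      key=lambda i: sum((a - b) ** 2 for a, b in zip(units[i], x)))
            # Pull the BMU and its map neighbors toward the sample.
            for i in range(n_units):
                h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                units[i] = [u + lr * h * (a - u) for u, a in zip(units[i], x)]
    return units

# Two well-separated 2-D toy clusters; after training, each cluster is
# covered by map units near its center.
rng = random.Random(1)
samples = ([[rng.gauss(0, 0.1), rng.gauss(0, 0.1)] for _ in range(100)] +
           [[rng.gauss(5, 0.1), rng.gauss(5, 0.1)] for _ in range(100)])
units = train_som(samples, n_units=8)
```

Because the BMU search is a nearest-neighbor lookup over a topologically ordered map, it can also be restricted to the neighborhood of the previous frame's BMU, which is the kind of shortcut that speeds up recognition.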
LVQ provides simple but efficient stochastic learning algorithms to improve the classification accuracy in pattern recognition problems. Here, LVQ is applied to develop an iterative training method for the mixture density HMMs, which increases both the modeling accuracy of the states and the discrimination between the models of different phonemes. Experiments are also made with LVQ-based corrective tuning methods for the mixture density HMMs, which aim at improving the models by learning from the recognition errors observed in the training samples.
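The discriminative update underlying LVQ can be sketched with the basic LVQ1 rule: the codebook vector nearest to a training sample is moved toward the sample if its class label matches, and away from it otherwise. This is the generic textbook rule, not the thesis's HMM-specific variant; the one-dimensional toy classes and learning rate below are assumptions for illustration.

```python
import random

def lvq1_step(codebook, labels, x, y, lr=0.1):
    """One LVQ1 update (generic sketch): find the nearest codebook
    vector and move it toward the sample if its class matches the
    sample's class y, away from it otherwise."""
    i = min(range(len(codebook)),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(codebook[j], x)))
    sign = 1.0 if labels[i] == y else -1.0
    codebook[i] = [c + sign * lr * (a - c) for c, a in zip(codebook[i], x)]
    return i

# Toy 1-D data: class "a" around 0, class "b" around 4 (hypothetical).
rng = random.Random(0)
codebook = [[0.5], [3.5]]
labels = ["a", "b"]
for _ in range(200):
    if rng.random() < 0.5:
        lvq1_step(codebook, labels, [rng.gauss(0, 0.3)], "a")
    else:
        lvq1_step(codebook, labels, [rng.gauss(4, 0.3)], "b")
```

In the corrective-tuning setting, the same push/pull idea is driven by observed recognition errors rather than by plain labeled samples.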
The suggested HMM training methods are tested using the Finnish speech database collected in the Neural Networks Research Centre at Helsinki University of Technology. Statistically significant improvements compared to the best conventional HMM training methods are obtained using the speaker-dependent but vocabulary-independent phoneme models. The decrease in the average number of phoneme recognition errors for the tested speakers has been around 10 percent on the applied test material.
Entire thesis (189 kB) The www page
Entire paper (60 kB) Slides (124 kB)
Self-Organizing Maps (SOM) and Learning Vector Quantization (LVQ) are applied to the initialization of the mean vectors of the mixture Gaussian densities for CDHMMs and SCHMMs, to reduce the amount of ML estimation required and to achieve more discriminative phoneme models. Experiments are also made with an LVQ-based corrective tuning method by which the HMMs can be further enhanced for lower recognition error rates.
Some approximations for the computationally complex determination of the continuous output densities for the CDHMMs and SCHMMs are also tested. The suggested approximations reduce the number of covariance parameters in the multivariate mixture Gaussian densities and the number of mixtures actually used in the observation probability computations. The lowest average phoneme recognition error rate achieved by the novel combinations of training methods was about 5.6 %.
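The two kinds of approximation mentioned above can be sketched as follows: replacing full covariance matrices with their diagonals cuts the covariance parameters from O(d^2) to O(d) per component, and evaluating only the few highest-scoring components skips most of the mixture sum. This is an illustrative sketch of the general technique, not the thesis's exact formulation; the toy mixture parameters are assumptions.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian: the full
    covariance matrix is replaced by its diagonal, reducing the
    covariance parameter count from O(d^2) to O(d)."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (a - m) ** 2 / v)
               for a, m, v in zip(x, mean, var))

def mixture_loglik(x, weights, means, variances, top_n=2):
    """Approximate mixture log-likelihood from only the top_n
    highest-scoring components (a sketch of mixture pruning)."""
    scores = [math.log(w) + log_gauss_diag(x, m, v)
              for w, m, v in zip(weights, means, variances)]
    scores.sort(reverse=True)
    top = scores[:top_n]
    peak = max(top)
    # Log-sum-exp over the retained components only.
    return peak + math.log(sum(math.exp(s - peak) for s in top))

# Toy 3-component mixture: when one component dominates near x,
# the pruned sum barely differs from the full one.
weights = [0.5, 0.3, 0.2]
means = [[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]]
variances = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
x = [0.1, -0.2]
exact = mixture_loglik(x, weights, means, variances, top_n=3)
pruned = mixture_loglik(x, weights, means, variances, top_n=1)
```

Dropping components can only remove probability mass, so the pruned value lower-bounds the exact one; near a dominant component the gap is negligible, which is what makes the approximation usable in observation probability computations.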
Phonemes are modeled as sequential states of the speech signal. The states are defined by the probability distributions of the feature vectors, formed from instantaneous cepstral coefficients computed from the speech signal, and of the transitions between the states. The feature vector distribution is approximated by a combination of several partially overlapping normal distributions. The models are trained on the collected speech material, using vector quantization methods for the optimal placement of the peaks of the normal distributions and iterative Baum-Welch estimation for determining the other parameters.
The experiments show that the probability distributions of the feature vectors in the state model of the speech signal are complex, and their approximations require many parameters. The best strategy proved to be concentrating more on locating the peaks of the distributions than on describing their shape. Using the self-organizing map and learning vector quantization alongside Baum-Welch estimation in training the continuous-density Markov models produced better recognition results than Baum-Welch training alone. The recognition results obtained were more accurate than those of the discrete-observation Markov models currently used in the speech recognition system, although not fully comparable.
The core of the basic recognition system is Learning Vector Quantization (LVQ1) [1]. This algorithm was originally used to classify FFT-based short-time feature vectors into phonemic classes. The phonemic decoding phase was earlier based on simple durational rules [2] [3].
At the feature level, we now study the effect of using mel-scale cepstral features and of concatenating several consecutive feature vectors to include context. At the output of the vector quantization, a comparison of three approaches to taking into account the classifications of feature vectors in local context is presented. The rule-based phonemic decoding is compared to decoding employing Hidden Markov Models (HMMs). As earlier, an optional grammatical post-correction method (DEC) is applied.
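The context windowing described above, concatenating consecutive feature vectors, can be sketched as follows. This is a generic illustration, not the paper's implementation; the edge-repetition policy and window width are assumptions.

```python
def add_context(frames, width=1):
    """Concatenate each feature vector with its `width` neighbors on
    each side to include local context (edges repeat the boundary
    frame; that policy is an assumption for this sketch)."""
    out = []
    n = len(frames)
    for t in range(n):
        window = []
        for d in range(-width, width + 1):
            # Clamp the index so edge frames reuse the boundary vector.
            window.extend(frames[min(max(t + d, 0), n - 1)])
        out.append(window)
    return out

# With width=1, each output vector concatenates three adjacent frames,
# tripling the feature dimension seen by the classifier.
frames = [[1.0], [2.0], [3.0]]
windowed = add_context(frames, width=1)
```

Each classifier input then covers a short stretch of the signal rather than a single instantaneous frame, which is what lets the vector quantizer exploit local context.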
Experiments conducted with three male speakers indicate that it is possible to significantly increase the phonemic transcription accuracy of the previous configuration. By using appropriately liftered cepstra, concatenating three adjacent feature vectors, and using HMM-based phonemic decoding, the error rate can be decreased from 14.0 % to 5.8 %.