

Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition [1]

Original Abstract

Convolutional Neural Networks (CNN) have showed success in achieving translation invariance for many image processing tasks. The success is largely attributed to the use of local filtering and max-pooling in the CNN architecture. In this paper, we propose to apply CNN to speech recognition within the framework of hybrid NN-HMM model. We propose to use local filtering and max-pooling in frequency domain to normalize speaker variance to achieve higher multi-speaker speech recognition performance. In our method, a pair of local filtering layer and max-pooling layer is added at the lowest end of neural network (NN) to normalize spectral variations of speech signals. In our experiments, the proposed CNN architecture is evaluated in a speaker independent speech recognition task using the standard TIMIT data sets. Experimental results show that the proposed CNN method can achieve over 10% relative error reduction on the TIMIT test sets compared with a regular NN using the same number of hidden layers and weights.

Main points

Citation count: 45 (as of 01/06/2014).
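
The architecture summarized in the abstract can be illustrated with a minimal sketch (Python/NumPy, not the authors' code): one local-filtering layer convolved along the frequency axis of filterbank features, followed by max-pooling over frequency and fully connected layers that output HMM-state posteriors for the hybrid NN-HMM decoder. All layer sizes, the filter width, the pooling span, and the state count below are illustrative assumptions, not values taken from the paper.

import numpy as np

# Minimal sketch of a frequency-domain convolution + max-pooling front end
# for a hybrid NN-HMM acoustic model. Weights are random stand-ins for
# trained parameters; every size here is an assumption, not from the paper.

rng = np.random.default_rng(0)

n_freq_bands = 40      # mel filterbank channels per frame (assumed)
filter_width = 8       # width of each local filter along frequency (assumed)
n_filters = 64         # number of local filters / feature maps (assumed)
pool_size = 4          # max-pooling span along frequency (assumed)
n_hidden = 512         # fully connected hidden units (assumed)
n_states = 183         # HMM states, e.g. 61 TIMIT phones x 3 states (assumed)

W_conv = rng.normal(0, 0.1, size=(n_filters, filter_width))
b_conv = np.zeros(n_filters)
n_conv_out = n_freq_bands - filter_width + 1          # valid convolution
n_pool_out = n_conv_out // pool_size
W_hid = rng.normal(0, 0.1, size=(n_filters * n_pool_out, n_hidden))
b_hid = np.zeros(n_hidden)
W_out = rng.normal(0, 0.1, size=(n_hidden, n_states))
b_out = np.zeros(n_states)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def forward(frame):
    """frame: (n_freq_bands,) log filterbank energies for one speech frame."""
    # Local filtering: each filter slides along the frequency axis only,
    # so it responds to its spectral pattern wherever it occurs in frequency.
    conv = np.empty((n_filters, n_conv_out))
    for j in range(n_conv_out):
        window = frame[j:j + filter_width]
        conv[:, j] = sigmoid(W_conv @ window + b_conv)
    # Max-pooling along frequency absorbs small spectral shifts
    # (e.g. speaker-dependent formant positions).
    pooled = conv[:, :n_pool_out * pool_size]
    pooled = pooled.reshape(n_filters, n_pool_out, pool_size).max(axis=2)
    # Fully connected layers map pooled features to HMM-state posteriors,
    # which a standard HMM decoder would then use as scaled likelihoods.
    hidden = sigmoid(pooled.reshape(-1) @ W_hid + b_hid)
    return softmax(hidden @ W_out + b_out)

posteriors = forward(rng.normal(size=n_freq_bands))
print(posteriors.shape, posteriors.sum())   # (183,) 1.0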

