In this paper we present a method for learning class-specific features for recognition. Recently a greedy layer-wise procedure was proposed to initialize weights of deep belief networks, by viewing each layer as a separate restricted Boltzmann machine (RBM). We develop the convolutional RBM (C-RBM), a variant of the RBM model in which weights are shared to respect the spatial structure of images. This framework learns a set of features that can generate the images of a specific object class. Our feature extraction model is a four layer hierarchy of alternating filtering and maximum subsampling. We learn feature parameters of the first and third layers viewing them as separate C-RBMs. The outputs of our feature extraction hierarchy are then fed as input to a discriminative classifier. It is experimentally demonstrated that the extracted features are effective for object detection, using them to obtain performance comparable to the state of the art on handwritten digit recognition and pedestrian detection.
- New Convolutional Restricted Boltzmann Machine (C-RBM)
- Performance comparable to the state of the art on handwritten digit recognition and pedestrian detection
- RBM
- Probabilistic model
- Hidden variables are independent given the observed data
- Does not explicitly capture the spatial structure of images
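The conditional independence of the hidden units in a binary RBM can be sketched as follows; the weights and inputs here are toy values for illustration, not learned parameters:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rbm_hidden_probs(v, W, c):
    """P(h_j = 1 | v) for a binary RBM: each hidden unit is
    conditionally independent given the visible vector v, so the
    whole vector of probabilities is one elementwise sigmoid."""
    return sigmoid(c + W @ v)

# Toy example (hypothetical sizes and random weights).
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))     # 3 hidden units, 4 visible units
c = np.zeros(3)                 # hidden biases
v = np.array([1.0, 0.0, 1.0, 0.0])
p = rbm_hidden_probs(v, W, c)   # one probability per hidden unit
```

Note that `W` is fully connected: nothing in this computation respects the 2-D layout of image pixels, which is the limitation the C-RBM addresses.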
- C-RBM
- Include spatial locality and weight sharing
- Favors filters with high response on training images
- Unsupervised learning using Contrastive Divergence
- Layerwise training for stacks of RBMs
- Convolutional connections are employed in a generative Markov Random Field architecture
- Hidden units divided into K feature maps
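A minimal sketch of the C-RBM filtering step: the K shared filters are slid over the image ('valid' cross-correlation), producing K hidden feature maps of conditional probabilities. Sizes and weights are illustrative assumptions, not the paper's learned filters:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def crbm_hidden_maps(image, filters, biases):
    """Hidden feature-map probabilities of a C-RBM: each of the K
    filters is applied at every image location with shared weights,
    so one filter yields one feature map."""
    H, W = image.shape
    K, fh, fw = filters.shape
    out = np.empty((K, H - fh + 1, W - fw + 1))
    for k in range(K):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                patch = image[i:i + fh, j:j + fw]
                out[k, i, j] = np.sum(patch * filters[k]) + biases[k]
    return sigmoid(out)

rng = np.random.default_rng(0)
image = rng.random((8, 8))
filters = rng.normal(size=(15, 3, 3))  # K = 15 maps, as in the paper's first layer
biases = np.zeros(15)
maps = crbm_hidden_maps(image, filters, biases)  # shape (15, 6, 6)
```

Weight sharing is what distinguishes this from the plain RBM: each filter has only fh x fw parameters regardless of image size.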
- Convolution problems
- Boundary units fall within fewer subwindows than interior pixels
- Interior pixels may therefore contribute to more features
- Separation of boundary variables from middle (interior) variables
- Problems sampling boundary pixels (they do not have enough features)
- Overcompleteness due to the K feature maps
- Sampling creates images very similar to the originals
- More Gibbs sampling steps would be needed
- Their solution is to fix hidden bias terms during training
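A one-step contrastive divergence (CD-1) update for a plain binary RBM can be sketched as below; this is the generic algorithm, not the paper's convolutional variant, and all sizes are toy assumptions. The paper's trick of fixing the hidden biases corresponds to simply skipping the update of `c`:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b, c, lr=0.1):
    """CD-1 for a binary RBM: go up (h0), sample, go down (v1),
    go up again (h1), then move the weights toward the data
    statistics and away from the reconstruction statistics."""
    h0 = sigmoid(c + W @ v0)                          # positive phase
    h0_sample = (rng.random(h0.shape) < h0).astype(float)
    v1 = sigmoid(b + W.T @ h0_sample)                 # reconstruction
    h1 = sigmoid(c + W @ v1)                          # negative phase
    W += lr * (np.outer(h0, v0) - np.outer(h1, v1))
    b += lr * (v0 - v1)
    c += lr * (h0 - h1)   # omit this line to keep hidden biases fixed
    return W, b, c

W = rng.normal(scale=0.1, size=(3, 4))
b = np.zeros(4)   # visible biases
c = np.zeros(3)   # hidden biases
v0 = np.array([1.0, 0.0, 1.0, 1.0])
W, b, c = cd1_update(v0, W, b, c)
```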
- Multilayer C-RBMs
- Subsampling takes maximum conditional feature probability over non-overlapping subwindows of feature maps
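The subsampling step above reduces to a max over non-overlapping subwindows, which can be sketched with a reshape (assuming, for simplicity, that the map dimensions are divisible by the window size):

```python
import numpy as np

def max_pool(feature_map, s):
    """Maximum over non-overlapping s x s subwindows of one
    feature map; each block of s x s values becomes one output."""
    H, W = feature_map.shape
    blocks = feature_map.reshape(H // s, s, W // s, s)
    return blocks.max(axis=(1, 3))

fmap = np.arange(16.0).reshape(4, 4)
pooled = max_pool(fmap, 2)   # 2x2 output: [[5, 7], [13, 15]]
```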
- Architecture
- discriminative layer (SVM)
- max pooling
- convolution
- max pooling
- convolution
- input
- For pedestrians, HOG features are also fed to the discriminative layer
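The four-layer feed-forward hierarchy above can be sketched end to end; filter counts and image sizes here are toy assumptions chosen so the shapes line up, and the resulting vector is what would be passed to the SVM:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv_layer(maps, filters, biases):
    """Sum 'valid' cross-correlations over all input maps for each
    filter, then squash; a simplified stand-in for the C-RBM
    filtering stage."""
    C, H, W = maps.shape
    K, _, fh, fw = filters.shape
    out = np.zeros((K, H - fh + 1, W - fw + 1))
    for k in range(K):
        for c in range(C):
            for i in range(out.shape[1]):
                for j in range(out.shape[2]):
                    out[k, i, j] += np.sum(maps[c, i:i+fh, j:j+fw] * filters[k, c])
    return sigmoid(out + biases[:, None, None])

def pool_layer(maps, s):
    """Max over non-overlapping s x s subwindows of each map."""
    K, H, W = maps.shape
    return maps.reshape(K, H // s, s, W // s, s).max(axis=(2, 4))

rng = np.random.default_rng(0)
image = rng.random((1, 14, 14))                  # single-channel toy input
f1 = rng.normal(scale=0.1, size=(4, 1, 3, 3))    # layer 1: 4 filters (toy)
f2 = rng.normal(scale=0.1, size=(6, 4, 3, 3))    # layer 3: 6 filters (toy)
x = conv_layer(image, f1, np.zeros(4))           # (4, 12, 12)
x = pool_layer(x, 2)                             # (4, 6, 6)
x = conv_layer(x, f2, np.zeros(6))               # (6, 4, 4)
x = pool_layer(x, 2)                             # (6, 2, 2)
features = x.ravel()                             # fed to the discriminative SVM
```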
- MNIST dataset
- Discriminative layer with RBF kernel
- 10 one-vs-rest binary SVMs
- 1st layer 15 feature maps
- 2nd layer 2x2 non-overlapping subwindows
- 3rd layer 15 feature maps
- 4th layer
- Comparison with Large CNN
- C-RBM performs better when the training set is small
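The one-vs-rest decision rule used on MNIST can be sketched as below; the linear scoring functions here are a hypothetical stand-in for the paper's RBF-kernel SVMs, kept linear only to make the sketch self-contained:

```python
import numpy as np

def one_vs_rest_predict(features, weights, biases):
    """Ten binary decision functions, one per digit class; the
    prediction is the class whose classifier scores highest."""
    scores = weights @ features + biases
    return int(np.argmax(scores))

rng = np.random.default_rng(0)
features = rng.random(24)             # output of the feature hierarchy (toy size)
weights = rng.normal(size=(10, 24))   # one row per digit class (untrained, illustrative)
biases = np.zeros(10)
digit = one_vs_rest_predict(features, weights, biases)
```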
- Pedestrian dataset
- 1st layer 7x7 15 feature maps
- 2nd layer 4x4 subsampling
- 3rd layer 15x5x5 30 feature maps
- 4th layer 2x2 subsampling
- + HOG
- Discriminative layer with linear kernel
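The HOG descriptors appended for pedestrians boil down to normalized histograms of gradient orientations; a much-simplified single-cell version (no blocks, no overlapping normalization, unlike real HOG) can be sketched as:

```python
import numpy as np

def orientation_histogram(patch, n_bins=9):
    """HOG-style descriptor for one cell: gradient orientations are
    binned into n_bins unsigned-orientation bins, weighted by
    gradient magnitude, then L2-normalized."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)     # fold into [0, pi)
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    return hist / (np.linalg.norm(hist) + 1e-9)

rng = np.random.default_rng(0)
cell = rng.random((8, 8))          # one toy image cell
h = orientation_histogram(cell)    # 9-bin descriptor
```

In the paper's pipeline such descriptors would be concatenated with the learned C-RBM features before the linear-kernel discriminative layer.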
Miquel Perello Nieto
2014-11-28