In this paper we present a method for learning class-specific features for recognition. Recently a greedy layer-wise procedure was proposed to initialize weights of deep belief networks, by viewing each layer as a separate restricted Boltzmann machine (RBM). We develop the convolutional RBM (C-RBM), a variant of the RBM model in which weights are shared to respect the spatial structure of images. This framework learns a set of features that can generate the images of a specific object class. Our feature extraction model is a four layer hierarchy of alternating filtering and maximum subsampling. We learn feature parameters of the first and third layers viewing them as separate C-RBMs. The outputs of our feature extraction hierarchy are then fed as input to a discriminative classifier. It is experimentally demonstrated that the extracted features are effective for object detection, using them to obtain performance comparable to the state of the art on handwritten digit recognition and pedestrian detection.
- New Convolutional Restricted Boltzmann Machine (C-RBM)
- Performance comparable to the state of the art on handwritten digit recognition and pedestrian detection
- RBM
- Probabilistic model
- Hidden variables are independent given the observed data
- Does not explicitly capture the spatial structure of images
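The conditional independence of the hidden units in a binary RBM can be sketched as follows; the weights and inputs here are toy values for illustration, not learned parameters:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rbm_hidden_probs(v, W, c):
    """P(h_j = 1 | v) for a binary RBM: each hidden unit is
    conditionally independent given the visible vector v, so the
    whole vector of probabilities is one elementwise sigmoid."""
    return sigmoid(c + W @ v)

# Toy example (hypothetical sizes and random weights).
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))     # 3 hidden units, 4 visible units
c = np.zeros(3)                 # hidden biases
v = np.array([1.0, 0.0, 1.0, 0.0])
p = rbm_hidden_probs(v, W, c)   # one probability per hidden unit
```

Note that `W` is fully connected: nothing in this computation respects the 2-D layout of image pixels, which is the limitation the C-RBM addresses.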
- C-RBM
- Include spatial locality and weight sharing
- Favors filters with high response on training images
- Unsupervised learning using Contrastive Divergence
- Layerwise training for stacks of RBMs
- Convolutional connections are employed in a generative Markov Random Field architecture
- Hidden units divided into K feature maps
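A minimal sketch of the C-RBM filtering step: the K shared filters are slid over the image ('valid' cross-correlation), producing K hidden feature maps of conditional probabilities. Sizes and weights are illustrative assumptions, not the paper's learned filters:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def crbm_hidden_maps(image, filters, biases):
    """Hidden feature-map probabilities of a C-RBM: each of the K
    filters is applied at every image location with shared weights,
    so one filter yields one feature map."""
    H, W = image.shape
    K, fh, fw = filters.shape
    out = np.empty((K, H - fh + 1, W - fw + 1))
    for k in range(K):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                patch = image[i:i + fh, j:j + fw]
                out[k, i, j] = np.sum(patch * filters[k]) + biases[k]
    return sigmoid(out)

rng = np.random.default_rng(0)
image = rng.random((8, 8))
filters = rng.normal(size=(15, 3, 3))  # K = 15 maps, as in the paper's first layer
biases = np.zeros(15)
maps = crbm_hidden_maps(image, filters, biases)  # shape (15, 6, 6)
```

Weight sharing is what distinguishes this from the plain RBM: each filter has only fh x fw parameters regardless of image size.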
- Convolution problems
- Boundary units fall within fewer subwindows than interior pixels
- Interior pixels may therefore contribute to more features
- Separation of boundary variables from middle (interior) variables
- Problems sampling boundary pixels (they do not have enough features)
- Overcompleteness due to the K feature maps
- Sampling creates images very similar to the originals
- More Gibbs sampling steps would be needed
- Their solution is to fix hidden bias terms during training
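A one-step contrastive divergence (CD-1) update for a plain binary RBM can be sketched as below; this is the generic algorithm, not the paper's convolutional variant, and all sizes are toy assumptions. The paper's trick of fixing the hidden biases corresponds to simply skipping the update of `c`:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b, c, lr=0.1):
    """CD-1 for a binary RBM: go up (h0), sample, go down (v1),
    go up again (h1), then move the weights toward the data
    statistics and away from the reconstruction statistics."""
    h0 = sigmoid(c + W @ v0)                          # positive phase
    h0_sample = (rng.random(h0.shape) < h0).astype(float)
    v1 = sigmoid(b + W.T @ h0_sample)                 # reconstruction
    h1 = sigmoid(c + W @ v1)                          # negative phase
    W += lr * (np.outer(h0, v0) - np.outer(h1, v1))
    b += lr * (v0 - v1)
    c += lr * (h0 - h1)   # omit this line to keep hidden biases fixed
    return W, b, c

W = rng.normal(scale=0.1, size=(3, 4))
b = np.zeros(4)   # visible biases
c = np.zeros(3)   # hidden biases
v0 = np.array([1.0, 0.0, 1.0, 1.0])
W, b, c = cd1_update(v0, W, b, c)
```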
- Multilayer C-RBMs
- Subsampling takes maximum conditional feature probability over non-overlapping subwindows of feature maps
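The subsampling step above reduces to a max over non-overlapping subwindows, which can be sketched with a reshape (assuming, for simplicity, that the map dimensions are divisible by the window size):

```python
import numpy as np

def max_pool(feature_map, s):
    """Maximum over non-overlapping s x s subwindows of one
    feature map; each block of s x s values becomes one output."""
    H, W = feature_map.shape
    blocks = feature_map.reshape(H // s, s, W // s, s)
    return blocks.max(axis=(1, 3))

fmap = np.arange(16.0).reshape(4, 4)
pooled = max_pool(fmap, 2)   # 2x2 output: [[5, 7], [13, 15]]
```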
- Architecture
- discriminative layer (SVM)
- max pooling
- convolution
- max pooling
- convolution
- input
- For pedestrians, HOG features are also fed to the discriminative layer
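The four-layer feed-forward hierarchy above can be sketched end to end; filter counts and image sizes here are toy assumptions chosen so the shapes line up, and the resulting vector is what would be passed to the SVM:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv_layer(maps, filters, biases):
    """Sum 'valid' cross-correlations over all input maps for each
    filter, then squash; a simplified stand-in for the C-RBM
    filtering stage."""
    C, H, W = maps.shape
    K, _, fh, fw = filters.shape
    out = np.zeros((K, H - fh + 1, W - fw + 1))
    for k in range(K):
        for c in range(C):
            for i in range(out.shape[1]):
                for j in range(out.shape[2]):
                    out[k, i, j] += np.sum(maps[c, i:i+fh, j:j+fw] * filters[k, c])
    return sigmoid(out + biases[:, None, None])

def pool_layer(maps, s):
    """Max over non-overlapping s x s subwindows of each map."""
    K, H, W = maps.shape
    return maps.reshape(K, H // s, s, W // s, s).max(axis=(2, 4))

rng = np.random.default_rng(0)
image = rng.random((1, 14, 14))                  # single-channel toy input
f1 = rng.normal(scale=0.1, size=(4, 1, 3, 3))    # layer 1: 4 filters (toy)
f2 = rng.normal(scale=0.1, size=(6, 4, 3, 3))    # layer 3: 6 filters (toy)
x = conv_layer(image, f1, np.zeros(4))           # (4, 12, 12)
x = pool_layer(x, 2)                             # (4, 6, 6)
x = conv_layer(x, f2, np.zeros(6))               # (6, 4, 4)
x = pool_layer(x, 2)                             # (6, 2, 2)
features = x.ravel()                             # fed to the discriminative SVM
```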
- MNIST dataset
- Discriminative layer with RBF kernel
- 10 one-vs-rest binary SVMs
- 1st layer 15 feature maps
- 2nd layer 2x2 non-overlapping subwindows
- 3rd layer 15 feature maps
- 4th layer
- Comparison with Large CNN
- C-RBM performs better when the training set is small
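The one-vs-rest decision rule used on MNIST can be sketched as below; the linear scoring functions here are a hypothetical stand-in for the paper's RBF-kernel SVMs, kept linear only to make the sketch self-contained:

```python
import numpy as np

def one_vs_rest_predict(features, weights, biases):
    """Ten binary decision functions, one per digit class; the
    prediction is the class whose classifier scores highest."""
    scores = weights @ features + biases
    return int(np.argmax(scores))

rng = np.random.default_rng(0)
features = rng.random(24)             # output of the feature hierarchy (toy size)
weights = rng.normal(size=(10, 24))   # one row per digit class (untrained, illustrative)
biases = np.zeros(10)
digit = one_vs_rest_predict(features, weights, biases)
```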
- Pedestrian dataset
- 1st layer 7x7 15 feature maps
- 2nd layer 4x4 subsampling
- 3rd layer 15x5x5 30 feature maps
- 4th layer 2x2 subsampling
- + HOG
- Discriminative layer with linear kernel
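The HOG descriptors appended for pedestrians boil down to normalized histograms of gradient orientations; a much-simplified single-cell version (no blocks, no overlapping normalization, unlike real HOG) can be sketched as:

```python
import numpy as np

def orientation_histogram(patch, n_bins=9):
    """HOG-style descriptor for one cell: gradient orientations are
    binned into n_bins unsigned-orientation bins, weighted by
    gradient magnitude, then L2-normalized."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)     # fold into [0, pi)
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    return hist / (np.linalg.norm(hist) + 1e-9)

rng = np.random.default_rng(0)
cell = rng.random((8, 8))          # one toy image cell
h = orientation_histogram(cell)    # 9-bin descriptor
```

In the paper's pipeline such descriptors would be concatenated with the learned C-RBM features before the linear-kernel discriminative layer.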
Miquel Perello Nieto
2014-11-28