ImageNet Classification with Deep Convolutional Neural Networks [50]

Original Abstract

We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5

Main points

CNN architecture:
- 650,000 neurons (60 million parameters)
- 5 convolutional layers
- Some of them followed by a max-pooling layer
- 3 fully-connected layers
- A final 1000-way softmax
Dropout regularization to reduce overfitting in the fully-connected layers
Training time: 5-6 days on two GTX 580 3GB GPUs
Dataset:
- ILSVRC-2010
- Down-sampled images to a fixed resolution of 256x256
- Subtract the mean activity over the training set from each pixel
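The mean-subtraction step above can be sketched as follows. This is a minimal pure-Python illustration on tiny nested lists, not the authors' pipeline; the helper names `mean_image` and `subtract_mean` and the 2x2 single-channel "images" are made up for the example.

```python
def mean_image(images):
    """Per-pixel mean over a list of equally sized images (lists of rows)."""
    n = len(images)
    height, width = len(images[0]), len(images[0][0])
    return [[sum(img[r][c] for img in images) / n for c in range(width)]
            for r in range(height)]


def subtract_mean(image, mean):
    """Center one image by subtracting the training-set mean at each pixel."""
    return [[image[r][c] - mean[r][c] for c in range(len(image[0]))]
            for r in range(len(image))]


# Two hypothetical 2x2 single-channel training images.
train = [[[0, 2], [4, 6]],
         [[2, 4], [6, 8]]]
mean = mean_image(train)                  # [[1.0, 3.0], [5.0, 7.0]]
centered = subtract_mean(train[0], mean)  # every pixel becomes -1.0
```

The mean is computed once over the training set and then reused unchanged for every image the network sees.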
ReLU:
- Trains faster than tanh (four-layer CNN on CIFAR-10, 25% training error)
- ReLU: about 6 epochs
- tanh: about 36 epochs to reach the same training error
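The paper attributes the speed-up to ReLU being a non-saturating nonlinearity, f(x) = max(0, x). The sketch below only illustrates that property by comparing derivatives; it does not reproduce the training experiment, and the function names are chosen for this example.

```python
import math


def relu(x):
    """Rectified linear unit: f(x) = max(0, x)."""
    return max(0.0, x)


def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2, which shrinks towards 0 as |x| grows."""
    return 1.0 - math.tanh(x) ** 2


for x in (0.5, 2.0, 5.0):
    # The ReLU gradient stays at 1 while the tanh gradient decays quickly.
    print(x, relu_grad(x), tanh_grad(x))
```

For large positive inputs the tanh gradient is close to zero, so gradient-descent updates through saturated units become very small, whereas the ReLU gradient is unchanged.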
Local Response Normalization
- 1.4% and 1.2% reduction in top-1 and top-5 error rates
- Helps generalization
- Hyper-parameters: k = 2, n = 5, alpha = 10^-4, and beta = 0.75
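As a rough illustration, the paper's normalization divides each activation by (k + alpha * sum of squared activations over n adjacent kernel maps) raised to the power beta. The sketch below applies this at a single spatial position, using the paper's constants as defaults. The function name and the choice to round n/2 down are assumptions of this example.

```python
def local_response_norm(a, k=2.0, n=5, alpha=1e-4, beta=0.75):
    """Normalize activations `a`, one value per kernel map at a fixed position."""
    num_maps = len(a)
    half = n // 2  # assumption: n/2 is rounded down
    out = []
    for i in range(num_maps):
        lo = max(0, i - half)
        hi = min(num_maps - 1, i + half)
        # Sum of squares over the neighbouring kernel maps (clipped at the ends).
        energy = sum(a[j] ** 2 for j in range(lo, hi + 1))
        out.append(a[i] / (k + alpha * energy) ** beta)
    return out


activations = [1.0, 50.0, 2.0, 0.5, 3.0, 0.1]
normalized = local_response_norm(activations)
```

A large activation in one map (50.0 here) increases the denominator for its neighbours, so nearby maps compete with each other.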
Overlapping Pooling
- 0.4% and 0.3% reduction in top-1 and top-5 error rates compared with non-overlapping pooling
- 3x3 pooling grid
- stride = 2
- Adjacent pooling windows overlap by one pixel
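Overlapping pooling with a 3x3 window and stride 2 can be sketched in plain Python as below. The function name `max_pool2d` and the 5x5 toy grid are illustrative choices, and no padding is applied.

```python
def max_pool2d(grid, size=3, stride=2):
    """Max-pool a 2-D grid with a size x size window moved `stride` pixels at a time."""
    rows, cols = len(grid), len(grid[0])
    out_rows = (rows - size) // stride + 1
    out_cols = (cols - size) // stride + 1
    return [[max(grid[r * stride + dr][c * stride + dc]
                 for dr in range(size) for dc in range(size))
             for c in range(out_cols)]
            for r in range(out_rows)]


# 5x5 toy grid holding the values 0..24 in row-major order.
grid = [[r * 5 + c for c in range(5)] for r in range(5)]
pooled = max_pool2d(grid)  # [[12, 14], [22, 24]]
```

Because the stride (2) is smaller than the window size (3), neighbouring windows share one row or column of pixels; with `size=2, stride=2` the windows would not overlap.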
Overall Architecture
- 224x224x3 (RGB image)
- Conv 96 kernels of size 11x11x3 with stride of 4 pixels
- Response-Normalized and max-pooling
- Conv 256 kernels of size 5x5x48 (stride not stated in the paper)
- Response-Normalized and max-pooling
- Conv 384 kernels of size 3x3x256
- Conv 384 kernels of size 3x3x192
- Conv 256 kernels of size 3x3x192
- Max-pooling (the paper applies response normalization only after the first and second convolutional layers)
- Fully connected 4096
- Fully connected 4096
- Fully connected 1000
- Softmax
Figure 1: Architecture of the CNN
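As a sanity check on the "60 million parameters" figure, the kernel shapes listed above can be summed up directly. The sketch assumes one bias per kernel or unit and a 6x6x256 feature map entering the first fully-connected layer; that size is not stated in this summary and is taken from common implementations.

```python
# (name, number of kernels, kernel height, kernel width, kernel depth)
conv_layers = [
    ("conv1", 96, 11, 11, 3),
    ("conv2", 256, 5, 5, 48),
    ("conv3", 384, 3, 3, 256),
    ("conv4", 384, 3, 3, 192),
    ("conv5", 256, 3, 3, 192),
]

FC_INPUT = 6 * 6 * 256  # assumption: size of the pooled output of conv5

# (name, inputs, outputs)
fc_layers = [
    ("fc6", FC_INPUT, 4096),
    ("fc7", 4096, 4096),
    ("fc8", 4096, 1000),
]


def count_parameters():
    """Total weights plus biases over all convolutional and fully-connected layers."""
    total = 0
    for _name, kernels, height, width, depth in conv_layers:
        total += kernels * (height * width * depth) + kernels
    for _name, n_in, n_out in fc_layers:
        total += n_in * n_out + n_out
    return total


print(count_parameters())  # 60965224
```

Under these assumptions the total is about 61 million, and the three fully-connected layers account for the large majority of it.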
Data augmentation
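- Original images rescaled and cropped to 256x256
- Training: random 224x224 patches and their horizontal reflections
- At test time, extract 5 patches of 224x224 from the four corners plus the center
- Mirror them horizontally to get 5 more patches
- Augment data by altering the RGB channels:
  - Perform PCA on the RGB pixel values throughout the training set
  - To each training image, add multiples of the principal components, scaled by Gaussian noise
  - Reduces the top-1 error rate by over 1%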
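The corner-plus-center cropping and mirroring can be sketched as follows. This is a simplified single-channel illustration on a tiny image; `crop`, `mirror`, and `ten_crops` are hypothetical helper names, and the PCA colour augmentation is not covered here.

```python
def crop(image, top, left, size):
    """Cut a size x size patch whose upper-left corner is at (top, left)."""
    return [row[left:left + size] for row in image[top:top + size]]


def mirror(image):
    """Flip an image horizontally."""
    return [row[::-1] for row in image]


def ten_crops(image, size):
    """Four corner patches, the center patch, and the mirror image of each."""
    height, width = len(image), len(image[0])
    bottom, right = height - size, width - size
    offsets = [(0, 0), (0, right), (bottom, 0), (bottom, right),
               (bottom // 2, right // 2)]
    patches = [crop(image, top, left, size) for top, left in offsets]
    return patches + [mirror(p) for p in patches]


# 4x4 toy image standing in for a 256x256 one; patch size 2 stands in for 224.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = ten_crops(image, size=2)
```

With `size=224` on a 256x256 image the same offsets give the five patches described above, plus their five reflections.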
Dropout
- Set the output of each hidden neuron to zero with probability 0.5
- At test time, use all neurons and multiply their outputs by 0.5
- Applied in the first two fully-connected layers
- Substantially reduces overfitting
- Roughly doubles the number of iterations required to converge
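A minimal sketch of the two dropout modes described above, assuming a plain list of activations for one layer. The function names and the explicit `rng` argument are choices made for this example.

```python
import random


def dropout_train(activations, rng, p_drop=0.5):
    """Training: zero each activation independently with probability p_drop."""
    return [0.0 if rng.random() < p_drop else a for a in activations]


def dropout_test(activations, p_drop=0.5):
    """Test time: keep every unit and scale its output by the keep probability."""
    return [a * (1.0 - p_drop) for a in activations]


rng = random.Random(0)  # seeded only to make the sketch repeatable
hidden = [1.0, 2.0, 3.0, 4.0]
train_out = dropout_train(hidden, rng)  # some entries zeroed, the rest unchanged
test_out = dropout_test(hidden)         # [0.5, 1.0, 1.5, 2.0]
```

Scaling by 0.5 at test time keeps the expected input to the next layer the same as during training, when on average half of the units are silent.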
Details of learning
- batch size = 128
- momentum 0.9
- weight decay 0.0005
- Initial weights drawn from a zero-mean Gaussian with std = 0.01
- biases = 1 in the second, fourth, and fifth convolutional layers and in the fully-connected hidden layers
- biases = 0 in the remaining layers
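The momentum and weight-decay settings above enter the paper's update rule as v <- 0.9 * v - 0.0005 * lr * w - lr * grad, followed by w <- w + v. The sketch below applies one such step to a list of weights. The summary does not list a learning rate, so `lr` is left as a parameter, and the gradient is supplied by the caller rather than averaged over a batch of 128.

```python
def sgd_step(w, v, grad, lr, momentum=0.9, weight_decay=0.0005):
    """One momentum update with weight decay, applied element-wise."""
    new_v = [momentum * vi - weight_decay * lr * wi - lr * gi
             for wi, vi, gi in zip(w, v, grad)]
    new_w = [wi + vi for wi, vi in zip(w, new_v)]
    return new_w, new_v


# Toy example: one weight, zero initial velocity, an illustrative lr of 0.01.
w, v = [1.0], [0.0]
w, v = sgd_step(w, v, grad=[2.0], lr=0.01)
```

Here the weight-decay term pulls the weight slightly towards zero in addition to the gradient step, which is why it appears inside the velocity update rather than as a separate penalty.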
Evaluation
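- Consider the feature activations induced by an image at the last, 4096-dimensional hidden layer; two images whose activation vectors have a small Euclidean separation are considered similar by the network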
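Comparing images through their hidden-layer activations amounts to a nearest-neighbour search under Euclidean distance. The sketch below uses made-up 3-dimensional vectors in place of the 4096-dimensional ones; the names `euclidean`, `nearest`, and the image keys are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equally long feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest(query, features, top_k=1):
    """Names of the top_k stored vectors closest to the query vector."""
    ranked = sorted(features, key=lambda name: euclidean(query, features[name]))
    return ranked[:top_k]


# Hypothetical activation vectors for three training images.
features = {
    "img_a": [0.0, 1.0, 0.0],
    "img_b": [0.9, 0.1, 0.0],
    "img_c": [0.0, 0.9, 0.2],
}
query = [0.0, 1.0, 0.1]
closest = nearest(query, features, top_k=2)  # ["img_a", "img_c"]
```

The distance is measured between activations, not between pixels, so two images can be neighbours here while looking quite different at the pixel level.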

Miquel Perello Nieto 2014-11-28