
Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis [86]

Original Abstract

Neural networks are a powerful technology for classification of visual inputs arising from documents. However, there is a confusing plethora of different neural network methods that are used in the literature and in industry. This paper describes a set of concrete best practices that document analysis researchers can use to get good results with neural networks. The most important practice is getting a training set as large as possible: we expand the training set by adding a new form of distorted data. The next most important practice is that convolutional neural networks are better suited for visual document tasks than fully connected networks. We propose that a simple “do-it-yourself” implementation of convolution with a flexible architecture is suitable for many visual document problems. This simple convolutional neural network does not require complex methods, such as momentum, weight decay, structure-dependent learning rates, averaging layers, tangent prop, or even finely-tuning the architecture. The end result is a very simple yet general architecture which can yield state-of-the-art performance for document analysis. We illustrate our claims on the MNIST set of English digit images.

Main points

Get a training set as large as possible
There is no need for complex methods such as momentum, weight decay, structure-dependent learning rates, averaging layers, tangent prop, or even fine-tuning the architecture
Enlarge the dataset by applying:
- Affine transformations: translations, scaling, homothety, similarity transformation, reflection, rotation, shear mapping, and compositions.
- Elastic distortions
The authors justify the use of elastic deformations on MNIST data by noting that they correspond to uncontrolled oscillations of the hand muscles, dampened by inertia.
Using a CNN trained on affine and elastic transformations of the dataset, they report the best MNIST result at the time of publication (0.4% error).
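
The two kinds of distortion listed above can be sketched with NumPy and SciPy. This is a minimal illustration, not the authors' implementation: the function names `random_affine` and `elastic_distort`, the parameter ranges, and the default values of `alpha` (displacement scale) and `sigma` (smoothing width) are assumptions chosen for 28x28 digit images and are not taken from this summary. The elastic version draws a random displacement for every pixel, smooths it with a Gaussian so that neighbouring pixels move together, and resamples the image at the displaced positions.

```python
import numpy as np
from scipy.ndimage import affine_transform, gaussian_filter, map_coordinates


def random_affine(image, max_rot_deg=15.0, max_scale=0.15, max_shear=0.15,
                  max_shift=2.0, rng=None):
    """Apply a random rotation, scaling, shear and translation about the centre.

    All ranges are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    image = np.asarray(image, dtype=float)
    theta = np.deg2rad(rng.uniform(-max_rot_deg, max_rot_deg))
    scale = 1.0 + rng.uniform(-max_scale, max_scale)
    shear = rng.uniform(-max_shear, max_shear)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta), np.cos(theta)]])
    shear_mat = np.array([[1.0, shear], [0.0, 1.0]])
    # affine_transform maps output coordinates to input coordinates.
    matrix = scale * rot @ shear_mat
    centre = (np.array(image.shape) - 1) / 2.0
    shift = rng.uniform(-max_shift, max_shift, size=2)
    offset = centre - matrix @ centre + shift
    return affine_transform(image, matrix, offset=offset, order=1,
                            mode="constant", cval=0.0)


def elastic_distort(image, alpha=34.0, sigma=4.0, rng=None):
    """Return an elastically distorted copy of a 2-D grayscale image.

    alpha scales the displacement field and sigma is the Gaussian smoothing
    width, both in pixels (assumed defaults for illustration).
    """
    rng = np.random.default_rng() if rng is None else rng
    image = np.asarray(image, dtype=float)
    h, w = image.shape
    # Random displacements in [-1, 1] per pixel, smoothed and then scaled.
    dx = gaussian_filter(rng.uniform(-1, 1, size=(h, w)), sigma,
                         mode="constant") * alpha
    dy = gaussian_filter(rng.uniform(-1, 1, size=(h, w)), sigma,
                         mode="constant") * alpha
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.array([rows + dy, cols + dx])
    # Bilinear resampling; positions outside the image read as background (0).
    return map_coordinates(image, coords, order=1, mode="constant", cval=0.0)
```

Calling either function repeatedly on the same training image, each time with fresh random parameters, yields additional training samples that keep the original label. A small `sigma` gives a nearly independent displacement per pixel, while a large `sigma` approaches a smooth, almost global shift.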


Miquel Perello Nieto 2014-11-28