Learning Deep Face Representation [18]

Original Abstract

Face representation is a crucial step of face recognition systems. An optimal face representation should be discriminative, robust, compact, and very easy-to-implement. While numerous hand-crafted and learning-based representations have been proposed, considerable room for improvement is still present. In this paper, we present a very easy-to-implement deep learning framework for face representation. Our method bases on a new structure of deep network (called Pyramid CNN). The proposed Pyramid CNN adopts a greedy-filter-and-down-sample operation, which enables the training procedure to be very fast and computation-efficient. In addition, the structure of Pyramid CNN can naturally incorporate feature sharing across multi-scale face representations, increasing the discriminative ability of resulting representation. Our basic network is capable of achieving high recognition accuracy (85.8

Main points

New deep structure Pyramid CNN
Labeled Faces in the Wild (LFW)
- faces
- 1680 of the people have two or more distinct photos
- Detected by Viola-Jones detector
- http://vis-www.cs.umass.edu/lfw/
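The "two or more distinct photos" statistic can be recomputed from the image file names. The sketch below assumes the usual LFW naming pattern, `Person_Name_0001.jpg`. The helper names and the three sample file names are illustrative only, not part of the paper or of any LFW tooling.

```python
from collections import Counter


def identity_of(filename):
    """Return the person name encoded in an LFW-style file name.

    Example: "George_W_Bush_0001.jpg" -> "George_W_Bush".
    """
    stem = filename.rsplit(".", 1)[0]   # drop the extension
    return stem.rsplit("_", 1)[0]       # drop the trailing image index


def people_with_multiple_photos(filenames, minimum=2):
    """List the identities that appear in at least `minimum` images."""
    counts = Counter(identity_of(name) for name in filenames)
    return sorted(person for person, n in counts.items() if n >= minimum)


# Illustrative file names only (assumed to follow the LFW pattern).
sample_files = [
    "George_W_Bush_0001.jpg",
    "George_W_Bush_0002.jpg",
    "Aaron_Eckhart_0001.jpg",
]
multi_photo_people = people_with_multiple_photos(sample_files)
```

Only identities with at least two images can form "same person" verification pairs, which is why this count matters for the benchmark.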
State-of-the-art performance on the LFW benchmark
Good face representation
- Identity-preserving: pictures of the same person lie close together in feature space
- Abstract and Compact: from high to low dimensionality
- Uniform and Automatic: NO hand-crafted and hard-wired parts
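The identity-preserving property can be made concrete with a pairwise loss on distances in feature space. The following is a minimal sketch assuming a logistic loss over squared Euclidean distances. The parameters `alpha` and `beta` and the toy feature vectors are hypothetical, and this form is not guaranteed to match the paper's exact loss.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pairwise_loss(f1, f2, same_identity, alpha=1.0, beta=1.0):
    """Logistic pairwise loss (assumed form, for illustration).

    Same-identity pairs are penalised when their distance exceeds `beta`;
    different-identity pairs are penalised when it falls below `beta`.
    """
    sign = 1.0 if same_identity else -1.0
    scaled = alpha * (squared_distance(f1, f2) - beta)
    return math.log1p(math.exp(sign * scaled))


# Toy 2-D features (hypothetical values, not taken from the paper).
anchor = [0.1, 0.9]
same_person = [0.2, 0.8]
other_person = [0.9, 0.1]

loss_same = pairwise_loss(anchor, same_person, same_identity=True)
loss_other = pairwise_loss(anchor, other_person, same_identity=False)
```

Under this loss, a same-person pair is cheap when the two features are close, and a different-person pair is cheap when they are far apart.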
Pyramid CNN
- ID-preserving Representation Learning: the loss function measures distance in the output feature space
- Convolutions and Down-sampling
- Deeper networks give the best results, but the training time increases rapidly with depth
- Each CNN has its own private output layer and takes its input from the previous shared layer
- Only the output of the last-level network is used for the representation
- The remaining outputs are used only during training
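The data flow described in this list can be sketched without any training. In the sketch below, average pooling stands in for a trained filter-and-down-sample layer, and the private output layer only flattens its input. Both are placeholders, and all function names are hypothetical.

```python
def filter_and_down_sample(image, factor=2):
    """Average-pool a square 2D list by `factor`.

    Stand-in for a trained convolution + down-sampling layer.
    """
    size = len(image) // factor
    pooled = []
    for i in range(size):
        row = []
        for j in range(size):
            block = [image[i * factor + di][j * factor + dj]
                     for di in range(factor) for dj in range(factor)]
            row.append(sum(block) / len(block))
        pooled.append(row)
    return pooled


def private_output(feature_map):
    """Hypothetical private output layer: here it only flattens the map."""
    return [value for row in feature_map for value in row]


def pyramid_forward(image, num_levels):
    """Return one output per level; each level reuses the shared layers before it."""
    outputs = []
    shared = image
    for _ in range(num_levels):
        shared = filter_and_down_sample(shared)   # shared layer, passed on to the next level
        outputs.append(private_output(shared))    # private head of this level
    return outputs


face = [[1.0] * 8 for _ in range(8)]              # toy 8x8 "image"
level_outputs = pyramid_forward(face, num_levels=3)
representation = level_outputs[-1]                # only the last level is kept
```

The earlier entries of `level_outputs` correspond to the outputs that serve only as training signals. In a real network each level would also learn its own weights before the next level is trained.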
Results
- 164 incorrect predictions
- Some of them are incorrectly labeled
- Others are very difficult even for humans because of age or pose
- On the LFW benchmark it achieves state-of-the-art results, close to human performance on cropped images
With the ROC curve as a measure, there is an improvement of 0.07-0.12 over the baseline
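For face verification, an ROC curve is typically obtained by sweeping a distance threshold over labelled pairs. The sketch below shows one way to compute the curve and its area in plain Python. The distances and labels are invented for illustration, and the code assumes at least one same-identity and one different-identity pair.

```python
def roc_points(distances, same_labels):
    """Return (false positive rate, true positive rate) points.

    A pair is predicted "same person" when its distance is <= threshold.
    """
    positives = sum(1 for s in same_labels if s)
    negatives = len(same_labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(distances)):
        tp = sum(1 for d, s in zip(distances, same_labels) if s and d <= threshold)
        fp = sum(1 for d, s in zip(distances, same_labels) if not s and d <= threshold)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a list of (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Invented pair distances and same-identity labels.
pair_distances = [0.1, 0.6, 0.5, 0.9]
pair_is_same = [True, True, False, False]

curve = roc_points(pair_distances, pair_is_same)
auc = area_under_curve(curve)
```

Comparing two representations then amounts to comparing their curves, or summary values such as the area under them, on the same set of pairs.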
Face recognition does not account for affine transformations or perspective changes,
so the method can be difficult to apply to tasks such as ImageNet, where the object can appear at any place and position

Miquel Perello Nieto 2014-11-28