Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis [54]

Original Abstract

Previous work on action recognition has focused on adapting hand-designed local features, such as SIFT or HOG, from static images to the video domain. In this paper, we propose using unsupervised feature learning as a way to learn features directly from video data. More specifically, we present an extension of the Independent Subspace Analysis algorithm to learn invariant spatio-temporal features from unlabeled video data. We discovered that, despite its simplicity, this method performs surprisingly well when combined with deep learning techniques such as stacking and convolution to learn hierarchical representations. By replacing hand-designed features with our learned features, we achieve classification results superior to all previous published results on the Hollywood2, UCF, KTH and YouTube action recognition datasets. On the challenging Hollywood2 and YouTube action datasets we obtain 53.3
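
As a rough illustration of the kind of feature the abstract refers to, the sketch below computes the response of a single Independent Subspace Analysis layer: linear filters are applied to the input, the outputs are squared, and the squares are summed within each subspace before taking a square root. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `isa_response`, the use of fixed-size, non-overlapping subspaces of adjacent filters, and the hand-picked filter matrix `W` are all illustrative choices. In practice `W` would be learned from unlabeled video patches rather than set by hand.

```python
import numpy as np


def isa_response(x, W, subspace_size):
    """Pooled ISA-style activations for one input vector (illustrative sketch).

    x             : input vector of shape (n_inputs,), e.g. a flattened video patch
    W             : filter matrix of shape (n_filters, n_inputs); assumed given here
    subspace_size : number of adjacent filters pooled together (an assumption)
    """
    n_filters = W.shape[0]
    if n_filters % subspace_size != 0:
        raise ValueError("n_filters must be a multiple of subspace_size")

    # First layer: linear filter responses, then squaring.
    squared = (W @ x) ** 2

    # Second layer: sum squared responses within each subspace, then square root.
    pooled = squared.reshape(-1, subspace_size).sum(axis=1)
    return np.sqrt(pooled)


# Toy usage: four identity filters grouped into two subspaces of size 2.
W = np.eye(4)
x = np.array([3.0, 4.0, 0.0, 0.0])
p = isa_response(x, W, subspace_size=2)
```

Because each output depends only on the energy inside its subspace, flipping the sign of the input leaves the response unchanged, which is one simple form of the invariance the abstract mentions.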


Miquel Perello Nieto 2014-11-28