
Feature selection and hierarchical classifier design with applications to human motion recognition [21]

Original Abstract

The performance of a classifier is affected by a number of factors including classifier type, the input features and the desired output. This thesis examines the impact of feature selection and classification problem division on classification accuracy and complexity. Proper feature selection can reduce classifier size and improve classifier performance by minimizing the impact of noisy, redundant and correlated features. Noisy features can cause false association between the features and the classifier output. Redundant and correlated features increase classifier complexity without adding additional information. Output selection or classification problem division describes the division of a large classification problem into a set of smaller problems. Problem division can improve accuracy by allocating more resources to more difficult class divisions and enabling the use of more specific feature sets for each sub-problem.

The first part of this thesis presents two methods for creating feature-selected hierarchical classifiers. The feature-selected hierarchical classification method jointly optimizes the features and classification tree design using genetic algorithms. The multi-modal binary tree (MBT) method performs the class division and feature selection sequentially and tolerates misclassifications in the higher nodes of the tree. This yields a piecewise separation for classes that cannot be fully separated with a single classifier. Experiments show that the accuracy of MBT is comparable to other multi-class extensions, but with lower test time. Furthermore, the accuracy of MBT is significantly higher on multi-modal data sets.

The second part of this thesis focuses on input feature selection measures. A number of filter-based feature subset evaluation measures are evaluated with the goal of assessing their performance with respect to specific classifiers. Although there are many feature selection measures proposed in literature, it is unclear which feature selection measures are appropriate for use with different classifiers. Sixteen common filter-based measures are tested on 20 real and 20 artificial data sets, which are designed to probe for specific feature selection challenges. The strengths and weaknesses of each measure are discussed with respect to the specific feature selection challenges in the artificial data sets, correlation with classifier accuracy and their ability to identify known informative features.

The results indicate that the best filter measure is classifier-specific. K-nearest neighbours classifiers work well with subset-based RELIEF, correlation feature selection or conditional mutual information maximization, whereas Fisher’s interclass separability criterion and conditional mutual information maximization work better for support vector machines.

Based on the results of the feature selection experiments, two new filter-based measures are proposed based on conditional mutual information maximization, which performs well but cannot identify dependent features in a set and does not include a check for correlated features. Both new measures explicitly check for dependent features and the second measure also includes a term to discount correlated features. Both measures correctly identify known informative features in the artificial data sets and correlate well with classifier accuracy.

The final part of this thesis examines the use of feature selection for time-series data by using feature selection to determine important individual time windows or key frames in the series. Time-series feature selection is used with the MBT algorithm to create classification trees for time-series data. The feature selected MBT algorithm is tested on two human motion recognition tasks: full-body human motion recognition from joint angle data and hand gesture recognition from electromyography data. Results indicate that the feature selected MBT is able to achieve high classification accuracy on the time-series data while maintaining a short test time.
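As background for the conditional mutual information maximization (CMIM) measure discussed in the abstract, the sketch below shows the standard greedy CMIM criterion: at each step, pick the feature whose worst-case conditional mutual information with the class, given any already selected feature, is largest. It assumes discrete-valued features and is an illustration of the general criterion only, not the implementation or the new measures proposed in [21]; all function names are illustrative.

import numpy as np
from collections import Counter

def mutual_info(x, y):
    # Empirical mutual information I(X; Y) in nats for discrete 1-D arrays.
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (xi, yi), c in joint.items():
        # p(x,y) log[ p(x,y) / (p(x) p(y)) ] expressed with raw counts
        mi += (c / n) * np.log(c * n / (px[xi] * py[yi]))
    return mi

def cond_mutual_info(x, y, z):
    # Empirical conditional MI: I(X; Y | Z) = sum_z p(z) I(X; Y | Z = z).
    n = len(z)
    return sum((c / n) * mutual_info(x[z == zv], y[z == zv])
               for zv, c in Counter(z).items())

def cmim(X, y, k):
    # Greedy CMIM: pick the feature whose worst-case relevance,
    # min over already selected s of I(X_f; y | X_s), is largest.
    n_features = X.shape[1]
    score = np.array([mutual_info(X[:, f], y) for f in range(n_features)])
    selected = []
    for _ in range(k):
        best = int(np.argmax(score))
        selected.append(best)
        score[best] = -np.inf                      # never re-select this feature
        for f in range(n_features):
            if np.isfinite(score[f]):
                score[f] = min(score[f], cond_mutual_info(X[:, f], y, X[:, best]))
    return selected

# Tiny usage example with synthetic discrete data (features 0 and 3 informative):
rng = np.random.default_rng(0)
X = rng.integers(0, 4, size=(500, 10))
y = (X[:, 0] + X[:, 3]) % 4
print(cmim(X, y, 3))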
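The key-frame idea from the final part can be illustrated in the same spirit: score each fixed-length time window of the series with a filter measure and keep the top-scoring windows as key frames before training any classifier (the MBT in the thesis). The sketch below uses a simple Fisher-style variance-ratio score as a stand-in for the measures studied in [21]; the array shapes, the non-overlapping windowing and all names are assumptions for illustration, not the thesis's code.

import numpy as np

def window_scores(series, labels, window, score_fn):
    # Score each non-overlapping time window against the class labels.
    # series: (n_samples, n_timesteps, n_channels), labels: (n_samples,)
    n_samples, n_steps, _ = series.shape
    scores = []
    for start in range(0, n_steps - window + 1, window):
        # Flatten the window into one feature vector per sample.
        feats = series[:, start:start + window, :].reshape(n_samples, -1)
        scores.append(score_fn(feats, labels))
    return np.array(scores)

def fisher_ratio(feats, labels):
    # Simple filter measure: between-class over within-class variance,
    # averaged across feature dimensions.
    classes = np.unique(labels)
    overall = feats.mean(axis=0)
    between = np.zeros(feats.shape[1])
    within = np.zeros(feats.shape[1])
    for c in classes:
        fc = feats[labels == c]
        between += len(fc) * (fc.mean(axis=0) - overall) ** 2
        within += ((fc - fc.mean(axis=0)) ** 2).sum(axis=0)
    return float(np.mean(between / (within + 1e-12)))

# Usage: keep the highest-scoring windows as key frames, then train any
# classifier on the features from those windows only.
rng = np.random.default_rng(0)
series = rng.normal(size=(200, 100, 3))        # e.g. 200 motions, 100 frames, 3 joint angles
labels = rng.integers(0, 4, size=200)
scores = window_scores(series, labels, window=10, score_fn=fisher_ratio)
key_windows = np.argsort(scores)[::-1][:3]     # indices of the three best windows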

