We address the problem of video classification for facial analysis and human action recognition. We propose a novel weakly supervised learning method that models a video as a sequence of automatically mined, discriminative sub-events (e.g., onset and offset phases for "smile", running and jumping for "highjump"). The proposed model is inspired by recent work on Multiple Instance Learning and latent SVM/HCRF, and extends these frameworks to model, approximately, the ordinal structure of videos. We obtain consistent improvements over relevant competitive baselines on four challenging, publicly available video-based facial analysis datasets covering prediction of expression, clinical pain, and intent in dyadic conversations, and on three challenging human action datasets. We also validate the method with qualitative results, which largely support the intuitions behind it.
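
To make the idea of scoring a video by ordered sub-events concrete, here is a minimal sketch. It is not the paper's formulation: the abstract says ordering is modeled only approximately, whereas this example swaps in a strict ordering constraint solved by dynamic programming. The function name `best_ordered_score` and the `scores` layout (one row of per-frame responses per sub-event template) are assumptions made for illustration.

```python
def best_ordered_score(scores):
    """Pick one frame per sub-event so that frames are strictly increasing
    and the summed response is maximal.

    scores[k][t] is the (hypothetical) response of sub-event template k
    at frame t. Returns (total, frames) with frames[k] the chosen frame.
    Strict ordering is an illustrative simplification, not the paper's model.
    """
    K, T = len(scores), len(scores[0])
    if K > T:
        raise ValueError("need at least as many frames as sub-events")
    NEG = float("-inf")
    # dp[k][t]: best total with sub-event k placed at frame t
    dp = [[NEG] * T for _ in range(K)]
    back = [[-1] * T for _ in range(K)]
    dp[0] = list(scores[0])
    for k in range(1, K):
        best_prev, best_idx = NEG, -1  # running max of dp[k-1][:t]
        for t in range(1, T):
            if dp[k - 1][t - 1] > best_prev:
                best_prev, best_idx = dp[k - 1][t - 1], t - 1
            if best_idx >= 0:
                dp[k][t] = best_prev + scores[k][t]
                back[k][t] = best_idx
    # Recover the best placement of the last sub-event, then backtrack.
    t = max(range(T), key=lambda i: dp[K - 1][i])
    total = dp[K - 1][t]
    frames = [t]
    for k in range(K - 1, 0, -1):
        t = back[k][t]
        frames.append(t)
    frames.reverse()
    return total, frames


# Two sub-events over four frames. Ignoring order, the best picks would be
# frame 1 for the first template and frame 0 for the second (5 + 4 = 9),
# but that violates the ordering, so the ordered optimum is lower.
total, frames = best_ordered_score([[1, 5, 0, 0],
                                    [4, 0, 3, 0]])
print(total, frames)  # 8 [1, 2]
```

In a weakly supervised setting of the kind the abstract describes, the chosen frames would act as latent variables: only the video-level label is given, and the sub-event locations are inferred during training rather than annotated.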

Source
http://dx.doi.org/10.1109/TPAMI.2017.2741482
