This paper presents a principled framework for analyzing collective activities at different levels of semantic granularity from videos. Our framework is capable of jointly tracking multiple individuals, recognizing activities performed by individuals in isolation (i.e., atomic activities such as walking or standing), recognizing the interactions between pairs of individuals (i.e., interaction activities) as well as understanding the activities of group of individuals (i.e., collective activities). A key property of our work is that it can coherently combine bottom-up information stemming from detections or fragments of tracks (or tracklets) with top-down evidence. Top-down evidence is provided by a newly proposed descriptor that captures the coherent behavior of groups of individuals in a spatial-temporal neighborhood of the sequence. Top-down evidence provides contextual information for establishing accurate associations between detections or tracklets across frames and, thus, for obtaining more robust tracking results. Bottom-up evidence percolates upwards so as to automatically infer collective activity labels. Experimental results on two challenging data sets demonstrate our theoretical claims and indicate that our model achieves enhances tracking results and the best collective classification results to date.
Download full-text PDF |
Source |
---|---|
http://dx.doi.org/10.1109/TPAMI.2013.220 | DOI Listing |
Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!