IEEE Trans Pattern Anal Mach Intell
April 2020
The objective of this work is automatic labelling of characters in TV video and movies, given weak supervisory information provided by an aligned transcript. We make five contributions: (i) a new strategy for obtaining stronger supervisory information from aligned transcripts; (ii) an explicit model for classifying background characters, based on their face-tracks; (iii) employing new ConvNet based face features, and (iv) a novel approach for labelling all face tracks jointly using linear programming. Each of these contributions delivers a boost in performance, and we demonstrate this on standard benchmarks using tracks provided by authors of prior work.
View Article and Find Full Text PDF