We introduce AdaFrame, a conditional computation framework that adaptively selects relevant frames on a per-input basis for fast video recognition. AdaFrame contains a Long Short-Term Memory (LSTM) network augmented with a global memory that provides context information, and it operates as an agent that interacts with a video sequence to search over time for the frames to use. AdaFrame is trained with policy search methods. At each time step, it computes a prediction, decides where to observe next, and estimates a utility, i.e., the expected future reward, of viewing more frames in the future. Using the predicted utilities at test time, AdaFrame achieves adaptive lookahead inference that minimizes the overall computational cost without degrading accuracy. We conduct extensive experiments on two large-scale video benchmarks, FCVID and ActivityNet. With a vanilla ResNet-101 model, AdaFrame achieves performance similar to that of using all frames while requiring, on average, only 8.21 and 8.65 frames on FCVID and ActivityNet, respectively. We also demonstrate that AdaFrame is compatible with modern 2D and 3D networks for video recognition. Furthermore, we show, among other things, that learned frame usage can reflect the difficulty of making prediction decisions both at the instance level within the same class and at the class level among different categories.
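The per-step loop described above (predict, choose the next frame, estimate utility, stop early when further frames look unhelpful) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the `step` callable stands in for the trained agent, `make_toy_agent` replaces the LSTM with a scripted sequence of utilities, and the patience-based stopping rule (`patience`, `margin`) is one plausible way to act on predicted utilities, since the abstract does not specify the rule.

```python
from typing import Callable, List, Tuple

# Hypothetical per-step output: (class scores, next frame index, predicted utility).
StepOutput = Tuple[List[float], int, float]


def adaptive_inference(step: Callable[[int], StepOutput], start: int,
                       max_steps: int, patience: int,
                       margin: float = 0.0) -> Tuple[List[float], List[int]]:
    """Observe frames until the predicted utility stops improving.

    Assumed stopping rule: halt once the utility has failed to exceed the
    best value seen so far (by more than `margin`) `patience` times in a row.
    Returns the last class scores and the list of frame indices observed.
    """
    best_utility = float("-inf")
    stale = 0                      # consecutive steps without improvement
    frame = start
    used: List[int] = []
    scores: List[float] = []
    for _ in range(max_steps):
        scores, next_frame, utility = step(frame)
        used.append(frame)
        if utility > best_utility + margin:
            best_utility = utility
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                break              # more frames are not expected to help
        frame = next_frame
    return scores, used


def make_toy_agent(utilities: List[float], stride: int = 5) -> Callable[[int], StepOutput]:
    """Toy stand-in for the trained agent: scripted utilities, fixed stride."""
    state = {"t": 0}

    def step(frame: int) -> StepOutput:
        u = utilities[min(state["t"], len(utilities) - 1)]
        state["t"] += 1
        # Two-class toy scores; the next frame is simply `stride` frames ahead.
        return [1.0 - u, u], frame + stride, u

    return step


scores, used = adaptive_inference(
    make_toy_agent([0.1, 0.4, 0.6, 0.6, 0.59, 0.5]),
    start=0, max_steps=10, patience=2)
print(used)  # [0, 5, 10, 15, 20]
```

In this toy run the utility plateaus at the third observation, so the loop stops after five frames instead of the ten allowed. A real agent would choose the next frame index itself and would emit class scores and utilities from its recurrent state.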
DOI: http://dx.doi.org/10.1109/TPAMI.2020.3029425
BMC Res Notes
December 2024
Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran.
This dataset contains demographic, morphological and pathological data, endoscopic images and videos of 191 patients with colorectal polyps. Morphological data is included based on the latest international gastroenterology classification references such as Paris, Pit and JNET classification. Pathological data includes the diagnosis of the polyps including Tubular, Villous, Tubulovillous, Hyperplastic, Serrated, Inflammatory and Adenocarcinoma with Dysplasia Grade & Differentiation.
Sci Rep
December 2024
Henan University of Engineering, Zhengzhou, 451191, China.
Social media generates vast amounts of spatio-temporal sequential data. However, current methods often ignore the complex spatio-temporal correlations within these data. This oversight makes it difficult to fully capture the dynamic features of the data.
J Imaging
November 2024
Jiangsu Province Collaborative Innovation Center of Modern Urban Traffic Technologies, Southeast University, Nanjing 211189, China.
In recent years, advancements in computer vision have yielded new prospects for intelligent transportation applications, specifically in the realm of automated traffic flow data collection. Within this emerging trend, the ability to swiftly and accurately detect vehicles and extract traffic flow parameters from videos captured during snowfall conditions has become imperative for numerous future applications. This paper proposes a new analytical framework designed to extract traffic flow parameters from traffic flow videos recorded under snowfall conditions.
Hernia
December 2024
Department of Surgery, Tsudanuma Central General Hospital, 1- 9-17 Yatsu, Narashino, Japan.
Purpose: In laparoscopic inguinal hernia surgery, proper recognition of loose connective tissue, nerves, vas deferens, and microvessels is important to prevent postoperative complications, such as recurrence, pain, sexual dysfunction, and bleeding. EUREKA (Anaut Inc., Tokyo, Japan) is a system that uses artificial intelligence (AI) for anatomical recognition.
Objective: To identify lifting actions and count the number of lifts performed in videos based on robust class prediction and a streamlined process for reliable real-time monitoring of lifting tasks.
Background: Traditional methods for recognizing lifting actions often rely on deep learning classifiers applied to human motion data collected from wearable sensors. Despite their high performance, these methods can be difficult to implement on systems with limited hardware resources.