The canonical approach to video action recognition treats the problem as a classic 1-of-N majority-vote classification task for a neural network. Such models are trained to predict a fixed set of predefined categories, which limits their transferability to new datasets with unseen concepts. In this article, we provide a new perspective on action recognition by attaching importance to the semantic information of label texts rather than simply mapping them to numbers. Specifically, we model this task as a video-text matching problem within a multimodal learning framework, which strengthens the video representation with richer semantic language supervision and enables our model to perform zero-shot action recognition without any further labeled data or additional parameters. Moreover, to handle the scarcity of label texts and make use of the vast amount of web data, we propose a new paradigm for action recognition based on this multimodal learning framework, which we dub "pre-train, adapt, and fine-tune." This paradigm first learns powerful representations by pre-training on a large amount of web image-text or video-text data. Then, it makes the action recognition task behave more like the pre-training problem via adaptation engineering. Finally, the model is fine-tuned end-to-end on target datasets to obtain strong performance. We give an instantiation of the new paradigm, ActionCLIP, which not only has superior and flexible zero-shot/few-shot transfer ability but also reaches top performance on the general action recognition task, achieving 83.8% top-1 accuracy on Kinetics-400 with a ViT-B/16 backbone. Code is available at https://github.com/sallymmx/ActionCLIP.git.
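The video-text matching formulation can be illustrated with a minimal sketch. It is not the ActionCLIP implementation: the embeddings are hand-written toy vectors standing in for encoder outputs, and the function names, the mean pooling over frames, and the temperature value of 0.07 are all assumptions made for illustration. The point it shows is that classification reduces to comparing one video embedding against the embeddings of the label texts, so a new label can be added without new parameters.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def mean_pool(frame_embeddings):
    """Average per-frame embeddings into one video embedding (an assumed pooling choice)."""
    count = len(frame_embeddings)
    return [sum(column) / count for column in zip(*frame_embeddings)]


def match_video_to_labels(frame_embeddings, label_embeddings, temperature=0.07):
    """Return a probability for each label text via softmax over scaled cosine similarity.

    label_embeddings maps a label text to its (hypothetical) text-encoder output.
    """
    video = l2_normalize(mean_pool(frame_embeddings))
    names = list(label_embeddings)
    logits = []
    for name in names:
        text = l2_normalize(label_embeddings[name])
        similarity = sum(a * b for a, b in zip(video, text))
        logits.append(similarity / temperature)
    # Subtract the maximum logit before exponentiating for numerical stability.
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return {name: e / total for name, e in zip(names, exps)}


# Toy data: two frames of one video and three label texts (all values invented).
frames = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
labels = {
    "playing guitar": [1.0, 0.0, 0.0],
    "riding a bike": [0.0, 1.0, 0.0],
    "swimming": [0.0, 0.0, 1.0],
}

probs = match_video_to_labels(frames, labels)
prediction = max(probs, key=probs.get)

# Zero-shot extension: a previously unseen label is just one more text embedding.
labels_extended = dict(labels, dancing=[0.0, 0.7, 0.7])
probs_extended = match_video_to_labels(frames, labels_extended)
```

In a real system the toy vectors would come from trained video and text encoders; the matching step itself stays the same when the label set changes, which is what makes the zero-shot setting possible.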


Source: http://dx.doi.org/10.1109/TNNLS.2023.3331841 (DOI)
