Can we teach a robot to recognize and make predictions for activities that it has never seen before? We tackle this problem by learning models for video from text. This paper presents a hierarchical model that generalizes instructional knowledge from large-scale text corpora and transfers the knowledge to video. Given a portion of an instructional video, our model recognizes and predicts coherent and plausible actions multiple steps into the future, all in rich natural language. To demonstrate the capabilities of our model, we introduce the Tasty Videos Dataset V2, a collection of 4022 recipes for zero-shot learning, recognition and anticipation. Extensive experiments with various evaluation metrics demonstrate the potential of our method for generalization, given limited video data for training models.

Source
http://dx.doi.org/10.1109/TPAMI.2022.3218596
