Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance.

Jinbo Xing, Menghan Xia, Yuxin Liu, Yuechen Zhang, Yong Zhang, Yingqing He, Hanyuan Liu, Haoxin Chen, Xiaodong Cun, Xintao Wang, Ying Shan, Tien-Tsin Wong

IEEE Trans Vis Comput Graph

Published: February 2024

Creating a vivid video from the event or scenario in our imagination is a truly fascinating experience. Recent advancements in text-to-video synthesis have unveiled the potential to achieve this with prompts only. While text is convenient for conveying the overall scene context, it is often insufficient for precise control. In this paper, we explore customized video generation by utilizing text as context description and motion structure (e.g., frame-wise depth) as concrete guidance. Our method, dubbed Make-Your-Video, involves joint-conditional video generation using a Latent Diffusion Model that is pre-trained for still image synthesis and then extended to video generation with the introduction of temporal modules. This two-stage learning scheme not only reduces the computing resources required, but also improves the performance by transferring the rich concepts available in image datasets solely into video generation. Moreover, we use a simple yet effective causal attention mask strategy to enable longer video synthesis, which effectively mitigates the potential quality degradation. Experimental results show the superiority of our method over existing baselines, particularly in terms of temporal coherence and fidelity to users' guidance. In addition, our model enables several intriguing applications that demonstrate potential for practical usage. The code, model weights, and videos are publicly available at our project page: https://doubiiu.github.io/projects/Make-Your-Video/.

DOI: http://dx.doi.org/10.1109/TVCG.2024.3365804
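
The abstract mentions a causal attention mask used for longer video synthesis but gives no implementation details. As a rough illustration only, the sketch below shows a generic causal mask applied to attention across frames, where each frame attends to itself and earlier frames. The function names (`causal_mask`, `attention_weights`, `causal_temporal_attention`) and the toy per-frame features are hypothetical and are not taken from the paper or its released code.

```python
import math

def causal_mask(num_frames):
    # mask[i][j] is True when frame i may attend to frame j, i.e. j <= i.
    return [[j <= i for j in range(num_frames)] for i in range(num_frames)]

def attention_weights(queries, keys, mask):
    # Scaled dot-product scores, then a row-wise softmax over allowed frames only.
    scale = 1.0 / math.sqrt(len(queries[0]))
    weights = []
    for q, allowed_row in zip(queries, mask):
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        # Subtract the largest allowed score for numerical stability.
        peak = max(s for s, ok in zip(scores, allowed_row) if ok)
        exps = [math.exp(s - peak) if ok else 0.0
                for s, ok in zip(scores, allowed_row)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

def causal_temporal_attention(queries, keys, values):
    # Each output frame is a weighted sum of the current and earlier value frames.
    mask = causal_mask(len(queries))
    weights = attention_weights(queries, keys, mask)
    dim = len(values[0])
    return [[sum(w * v[d] for w, v in zip(row, values)) for d in range(dim)]
            for row in weights]

# Toy example (assumption): three frames, each with a 2-dimensional feature.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = causal_temporal_attention(frames, frames, frames)
```

Because the first frame can only attend to itself, `out[0]` equals `frames[0]`; later frames mix in information from earlier ones but never from future ones, so the result for a given frame does not depend on how many frames follow it.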


