Creating a vivid video from an event or scenario in our imagination is a truly fascinating experience. Recent advancements in text-to-video synthesis have unveiled the potential to achieve this with prompts only. While text is convenient for conveying the overall scene context, it may be insufficient for precise control. In this paper, we explore customized video generation by utilizing text as the context description and motion structure (e.g., frame-wise depth) as concrete guidance. Our method, dubbed Make-Your-Video, involves joint-conditional video generation using a Latent Diffusion Model that is pre-trained for still image synthesis and then promoted to video generation through the introduction of temporal modules. This two-stage learning scheme not only reduces the computing resources required, but also improves performance by transferring the rich concepts learned from image datasets to video generation. Moreover, we use a simple yet effective causal attention mask strategy to enable longer video synthesis, which effectively mitigates potential quality degradation. Experimental results show the superiority of our method over existing baselines, particularly in terms of temporal coherence and fidelity to users' guidance. In addition, our model enables several intriguing applications that demonstrate potential for practical usage. The code, model weights, and videos are publicly available at our project page: https://doubiiu.github.io/projects/Make-Your-Video/.
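The causal attention mask mentioned in the abstract can be illustrated with a minimal sketch. This is an assumption-laden illustration, not the authors' implementation: the function names, tensor shapes, and the single-head, per-pixel formulation are all simplifications chosen for clarity. The core idea is that, in temporal attention, each frame attends only to itself and earlier frames, so already-synthesized frames can condition later ones when extending a video.

```python
import numpy as np

def causal_mask(num_frames: int) -> np.ndarray:
    """Lower-triangular boolean mask: frame i may attend to frames 0..i only."""
    return np.tril(np.ones((num_frames, num_frames), dtype=bool))

def temporal_attention(q, k, v, mask):
    """Scaled dot-product attention over the frame axis.

    q, k, v: (num_frames, dim) features for one spatial location.
    mask:    (num_frames, num_frames) boolean; False entries are blocked.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)              # (F, F) frame-to-frame scores
    scores = np.where(mask, scores, -np.inf)   # block attention to future frames
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                          # (F, dim) attended features

# Toy usage: 4 frames, 8-dimensional features.
rng = np.random.default_rng(0)
F, D = 4, 8
q = rng.standard_normal((F, D))
k = rng.standard_normal((F, D))
v = rng.standard_normal((F, D))
out = temporal_attention(q, k, v, causal_mask(F))
```

Because the first frame can attend only to itself, its output equals its own value vector; later frames mix information from all preceding frames, which is what allows a long video to be generated chunk by chunk without the later chunks leaking into earlier ones.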


Source: http://dx.doi.org/10.1109/TVCG.2024.3365804

