Creating a vivid video from an event or scenario we imagine is a truly fascinating experience. Recent advancements in text-to-video synthesis have unveiled the potential to achieve this with prompts only. While text is convenient for conveying the overall scene context, it may be insufficient for precise control. In this paper, we explore customized video generation by utilizing text as context description and motion structure (e.g., frame-wise depth) as concrete guidance. Our method, dubbed Make-Your-Video, involves joint-conditional video generation using a Latent Diffusion Model that is pre-trained for still image synthesis and then adapted for video generation through the introduction of temporal modules. This two-stage learning scheme not only reduces the computing resources required, but also improves performance by transferring the rich concepts available in image datasets to video generation. Moreover, we use a simple yet effective causal attention mask strategy to enable longer video synthesis, which effectively mitigates the potential quality degradation. Experimental results show the superiority of our method over existing baselines, particularly in terms of temporal coherence and fidelity to users' guidance. In addition, our model enables several intriguing applications that demonstrate potential for practical usage. The code, model weights, and videos are publicly available at our project page: https://doubiiu.github.io/projects/Make-Your-Video/.
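The abstract mentions a causal attention mask for longer video synthesis but does not describe how it is built. As a rough illustration only, the sketch below shows a generic causal mask applied to attention across frames, where each frame attends to itself and earlier frames. The function names (`causal_mask`, `masked_softmax`, `causal_temporal_attention`) and the plain-list feature layout are assumptions made for this example and are not taken from the Make-Your-Video code.

```python
import math


def causal_mask(num_frames):
    """Lower-triangular mask: entry [t][s] is True when frame t may attend to frame s (s <= t)."""
    return [[s <= t for s in range(num_frames)] for t in range(num_frames)]


def masked_softmax(scores, mask_row):
    """Softmax over the allowed positions only; masked positions get weight 0."""
    peak = max(x for x, keep in zip(scores, mask_row) if keep)
    exps = [math.exp(x - peak) if keep else 0.0 for x, keep in zip(scores, mask_row)]
    total = sum(exps)
    return [e / total for e in exps]


def causal_temporal_attention(queries, keys, values):
    """Scaled dot-product attention across frames with a causal mask.

    Each argument is a list with one feature vector (list of floats) per frame.
    This is a generic formulation, assumed for illustration.
    """
    dim = len(queries[0])
    mask = causal_mask(len(queries))
    outputs = []
    for t, query in enumerate(queries):
        scores = [
            sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
            for key in keys
        ]
        weights = masked_softmax(scores, mask[t])
        outputs.append([
            sum(w * value[i] for w, value in zip(weights, values))
            for i in range(len(values[0]))
        ])
    return outputs


# Toy input: three frames with two-dimensional features.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = causal_temporal_attention(frames, frames, frames)
```

With such a mask, the output for a given frame does not depend on later frames, which is the property that generally makes causal attention convenient when a sequence is extended beyond its original length.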
DOI: http://dx.doi.org/10.1109/TVCG.2024.3365804
Sci Rep
January 2025
School of Electronic and Information Engineering, Changsha Institute of Technology, Changsha, 410200, China.
To address the limitations of the flipped classroom in personalized teaching and interactivity, this paper designs a new flipped-classroom model for colleges and universities based on Virtual Reality (VR), incorporating the Contrastive Language-Image Pre-Training (CLIP) algorithm. Through cross-modal data fusion, the model closely couples students' operation behavior with teaching content and improves teaching effectiveness through an intelligent feedback mechanism. The test data shows that the similarity between video and image modes reaches 0.
Conscious Cogn
January 2025
Humane Technology Lab, Catholic University of Sacred Heart, Milan, Italy; Applied Technology for Neuro-Psychology Lab., Istituto Auxologico Italiano IRCCS, Milan, Italy.
Psychedelic drugs offer valuable insights into consciousness, but disentangling their causal effects on perceptual and high-level cognition is nontrivial. Technological advances in virtual reality (VR) and machine learning have enabled the immersive simulation of visual hallucinations. However, comprehensive experimental data on how these simulated hallucinations affect high-level human cognition is lacking.
Pediatr Neurol
January 2025
Department of Pediatrics, Postgraduate Institute of Medical Education and Research, Chandigarh, India.
Background: To explore the utility of general movements assessment as a predictive tool of the neurological outcome in term-born infants with hypoxic-ischemic encephalopathy (HIE) at ages six and 12 months.
Methods: This prospective observational study was conducted for 18 months (August 2018 to December 2019). Term-born newborns with HIE were included.
J Neurol
January 2025
Neurology Unit, IRCCS San Raffaele Scientific Institute, Via Olgettina, 60, 20132, Milan, Italy.
Sensors (Basel)
January 2025
Department of Electronics and Electrical Engineering, Faculty of Science and Technology, Keio University, 3-14-1, Hiyoshi, Kohoku-ku, Yokohama 223-8522, Japan.
In recent years, advancements in the interaction and collaboration between humans and artificial intelligence have garnered significant attention. Social intelligence plays a crucial role in facilitating natural interactions and seamless communication between humans and Artificial Intelligence (AI). To assess AI's ability to understand human interactions and the components necessary for such comprehension, datasets like Social-IQ have been developed.