Video inpainting aims to fill spatio-temporal holes in videos with plausible content. Despite tremendous progress on deep learning-based inpainting of a single image, it remains challenging to extend these methods to the video domain due to the additional time dimension. In this paper, we propose a recurrent temporal aggregation framework for fast deep video inpainting. In particular, we construct an encoder-decoder model, where the encoder takes multiple reference frames that can provide visible pixels revealed by the scene dynamics. These hints are aggregated and fed into the decoder. We apply recurrent feedback in an auto-regressive manner to enforce temporal consistency in the video results. We propose two architectural designs based on this framework. Our first model is a blind video decaptioning network (BVDNet) designed to automatically remove and inpaint text overlays in videos without any mask information. BVDNet won first place in the ECCV ChaLearn 2018 LAP Inpainting Competition Track 2: Video Decaptioning. Second, we propose a network for more general video inpainting (VINet) that handles larger and more arbitrary holes. Video results demonstrate the advantage of our framework over state-of-the-art methods both qualitatively and quantitatively. Code is available at https://github.com/mcahny/Deep-Video-Inpainting and https://github.com/shwoo93/video_decaptioning.
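The core idea of the framework can be illustrated with a minimal, non-learned sketch: for each target frame, hole pixels are filled by aggregating pixels that are visible in nearby reference frames, and the previously completed frame is fed back as a fallback to keep results temporally consistent. This is a pure-Python illustration under stated assumptions; the function names and the pixel-level mean aggregation are hypothetical, since the actual models (BVDNet, VINet) learn the aggregation inside an encoder-decoder network.

```python
# Illustrative sketch of recurrent temporal aggregation for video inpainting.
# Frames are 2D lists of scalars; masks mark holes with 1 and visible pixels with 0.
# All names here are illustrative, not the paper's implementation.

def aggregate_fill(target, mask, references, prev_completed=None):
    """Fill masked pixels (mask == 1 marks a hole) of `target`.

    A hole pixel takes the mean of the corresponding pixels that are
    visible (mask == 0) in the reference frames; if no reference reveals
    the pixel, fall back to the previously completed frame (recurrent
    feedback in the auto-regressive sense).
    """
    h, w = len(target), len(target[0])
    out = [row[:] for row in target]
    for y in range(h):
        for x in range(w):
            if mask[y][x] == 0:
                continue  # pixel already visible, keep it
            visible = [ref[y][x] for ref, rmask in references
                       if rmask[y][x] == 0]
            if visible:
                out[y][x] = sum(visible) / len(visible)  # aggregate hints
            elif prev_completed is not None:
                out[y][x] = prev_completed[y][x]  # temporal feedback
    return out

def inpaint_video(frames, masks, radius=2):
    """Complete a video frame by frame, reusing each completed frame."""
    completed = []
    for t, (frame, mask) in enumerate(zip(frames, masks)):
        # Reference frames within a temporal window around t.
        refs = [(frames[s], masks[s])
                for s in range(max(0, t - radius),
                               min(len(frames), t + radius + 1))
                if s != t]
        prev = completed[-1] if completed else None
        completed.append(aggregate_fill(frame, mask, refs, prev))
    return completed
```

For example, a pixel that is occluded by a caption in frame t but visible in frame t-1 is recovered directly from frame t-1; only pixels never revealed by scene dynamics rely on the recurrent fallback (which the learned decoder would instead hallucinate).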

Source: http://dx.doi.org/10.1109/TPAMI.2019.2958083


Similar Publications

Article Synopsis
  • Optimizing the educational experience in otologic surgery is crucial due to the complexities of ear anatomy and challenges in video quality during surgical viewings.
  • The study aimed to enhance the quality of tympanomastoidectomy surgical videos using AI techniques and assessed their effectiveness through trainee feedback.
  • Results indicated that AI-enhanced videos significantly aided trainees in understanding procedures, especially for those with less experience, making surgical education more effective.
Article Synopsis
  • The text discusses a new framework called BioM3 that allows for the design of proteins using natural language prompts, integrating text and protein representation in a novel way.
  • This framework operates in three stages: aligning protein and text representations, refining text embeddings, and generating protein sequences using a specific model.
  • BioM3 has shown impressive results in various protein-related tasks and successfully generates proteins with characteristics similar to naturally occurring ones, validated through experimental tests.

Low-rank tensor completion (LRTC) has shown promise in processing incomplete visual data, yet it often overlooks the inherent local smooth structures in images and videos. Recent advances in LRTC, integrating total variation regularization to capitalize on the local smoothness, have yielded notable improvements. Nonetheless, these methods are limited to exploiting local smoothness within the original data space, neglecting the latent factor space of tensors.


Instance shadow detection, crucial for applications such as photo editing and light direction estimation, has undergone significant advancements in predicting shadow instances, object instances, and their associations. The extension of this task to videos presents challenges in annotating diverse video data and addressing complexities arising from occlusion and temporary disappearances within associations. In response to these challenges, we introduce ViShadow, a semi-supervised video instance shadow detection framework that leverages both labeled image data and unlabeled video data for training.


Nonlocal self-similarity (NSS) is an important prior that has been successfully applied in multi-dimensional data processing tasks, e.g., image and video recovery.

