Timely and effective diagnosis of fungal keratitis (FK) is necessary for suitable treatment and for avoiding irreversible vision loss. In vivo confocal microscopy (IVCM) has been widely adopted to guide FK diagnosis. We present a deep learning framework that diagnoses FK from IVCM images to assist ophthalmologists.
IEEE Trans Image Process
July 2024
2D-3D joint learning is essential and effective for fundamental 3D vision tasks, such as 3D semantic segmentation, due to the complementary information these two visual modalities contain. Most current 3D scene semantic segmentation methods process 2D images "as they are", i.e.
IEEE Trans Pattern Anal Mach Intell
September 2024
Texture synthesis is a fundamental problem in computer graphics that would benefit various applications. Existing methods are effective in handling 2D image textures. In contrast, many real-world textures contain meso-structure in the 3D geometry space, such as grass, leaves, and fabrics, which cannot be effectively modeled using only 2D image textures.
IEEE Trans Vis Comput Graph
February 2024
Most of the existing 3D talking face synthesis methods suffer from the lack of detailed facial expressions and realistic head poses, resulting in unsatisfactory experiences for users. In this paper, we propose a novel pose-aware 3D talking face synthesis method with a novel geometry-guided audio-vertices attention. To capture more detailed expression, such as the subtle nuances of mouth shape and eye movement, we propose to build hierarchical audio features including a global attribute feature and a series of vertex-wise local latent movement features.
IEEE Trans Vis Comput Graph
January 2024
As communications are increasingly taking place virtually, the ability to present well online is becoming an indispensable skill. Online speakers are facing unique challenges in engaging with remote audiences. However, there has been a lack of evidence-based analytical systems for people to comprehensively evaluate online speeches and further discover possibilities for improvement.
IEEE Trans Pattern Anal Mach Intell
July 2023
3D indoor scenes are widely used in computer graphics, with applications ranging from interior design to gaming to virtual and augmented reality. They also contain rich information, including room layout, as well as furniture type, geometry, and placement. High-quality 3D indoor scenes are in high demand, but designing them manually requires expertise and is time-consuming.
IEEE Trans Pattern Anal Mach Intell
December 2023
Neural Radiance Fields (NeRFs) have shown great potential for tasks like novel view synthesis of static 3D scenes. Since NeRFs are trained on a large number of input images, it is not trivial to change their content afterwards. Previous methods to modify NeRFs provide some control but they do not support direct shape deformation which is common for geometry representations like triangle meshes.
IEEE Trans Pattern Anal Mach Intell
December 2023
We present a novel method for single-view 3D reconstruction of textured meshes, with a focus on addressing the primary challenge surrounding texture inference and transfer. Our key observation is that learning textured reconstruction in a structure-aware and globally consistent manner is effective in handling the severe ill-posedness of the texturing problem and significant variations in object pose and texture details. Specifically, we perform structured mesh reconstruction, via a retrieval-and-assembly approach, to produce a set of genus-zero parts parameterized by deformable boxes and endowed with semantic information.
Benefiting from the intuitiveness and naturalness of sketch interaction, sketch-based video retrieval (SBVR) has received considerable attention in the video retrieval research area. However, most existing SBVR research still lacks the capability of accurate video retrieval with fine-grained scene content. To address this problem, in this paper we investigate a new task, which focuses on retrieving the target video by utilizing a fine-grained storyboard sketch depicting the scene layout and major foreground instances' visual characteristics (e.
Chronic Glaucoma is an eye disease with progressive optic nerve damage. It is the second leading cause of blindness after cataract and the first leading cause of irreversible blindness. Glaucoma forecast can predict future eye state of a patient by analyzing the historical fundus images, which is helpful for early detection and intervention of potential patients and avoiding the outcome of blindness.
IEEE Trans Pattern Anal Mach Intell
July 2023
The recently proposed neural radiance fields (NeRF) use a continuous function formulated as a multi-layer perceptron (MLP) to model the appearance and geometry of a 3D scene. This enables realistic synthesis of novel views, even for scenes with view dependent appearance. Many follow-up works have since extended NeRFs in different ways.
Existing multi-person reconstruction methods require the human bodies in the input image to occupy a considerable portion of the picture. However, low-resolution human objects are ubiquitous due to trade-off between the field of view and target distance given a limited camera resolution. In this paper, we propose an end-to-end multi-task framework for multi-person inference from a low-resolution image (MILI).
IEEE Trans Vis Comput Graph
December 2023
Accurately estimating the human inner-body under clothing is very important for body measurement, virtual try-on and VR/AR applications. In this article, we propose the first method to allow everyone to easily reconstruct their own 3D inner-body under daily clothing from a self-captured video with the mean reconstruction error of 0.73cm within 15s.
IEEE Trans Pattern Anal Mach Intell
April 2023
We propose a new method for realistic human motion transfer using a generative adversarial network (GAN), which generates a motion video of a target character imitating actions of a source character, while maintaining high authenticity of the generated results. We tackle the problem by decoupling and recombining the posture information and appearance information of both the source and target characters. The innovation of our approach lies in the use of the projection of a reconstructed 3D human model as the condition of GAN to better maintain the structural integrity of transfer results in different poses.
IEEE Trans Image Process
May 2022
Sketch-based image retrieval (SBIR) is a long-standing research topic in computer vision. Existing methods mainly focus on category-level or instance-level image retrieval. This paper investigates the fine-grained scene-level SBIR problem where a free-hand sketch depicting a scene is used to retrieve desired images.
IEEE Trans Vis Comput Graph
September 2023
Reflectional symmetry is a ubiquitous pattern in nature. Previous works usually detect it by voting or sampling, which suffers from high computational cost and randomness. In this article, we propose a learning-based approach to intrinsic reflectional symmetry detection.
IEEE Trans Pattern Anal Mach Intell
February 2023
Pose transfer of human videos aims to generate a high-fidelity video of a target person imitating actions of a source person. A few studies have made great progress either through image translation with deep latent features or neural rendering with explicit 3D features. However, both of them rely on large amounts of training data to generate realistic results, and the performance degrades on more accessible Internet videos due to insufficient training frames.
IEEE Trans Pattern Anal Mach Intell
January 2023
Face portrait line drawing is a unique style of art which is highly abstract and expressive. However, due to its high semantic constraints, many existing methods learn to generate portrait drawings using paired training data, which is costly and time-consuming to obtain. In this paper, we propose a novel method to automatically transform face photos to portrait drawings using unpaired training data with two new features; i.
Coloring line art images based on the colors of reference images is a crucial stage in animation production, which is time-consuming and tedious. This paper proposes a deep architecture to automatically color line art videos with the same color style as the given reference images. Our framework consists of a color transform network and a temporal refinement network based on 3U-net.
IEEE Trans Vis Comput Graph
April 2023
Caricature is a type of artistic style of human faces that attracts considerable attention in the entertainment industry. So far a few 3D caricature generation methods exist and all of them require some caricature information (e.g.
IEEE Trans Image Process
October 2021
Given an input face photo, the goal of caricature generation is to produce stylized, exaggerated caricatures that share the same identity as the photo. It requires simultaneous style transfer and shape exaggeration with rich diversity, while preserving the identity of the input. To address this challenging problem, we propose a novel framework called Multi-Warping GAN (MW-GAN), including a style network and a geometric network that are designed to conduct style transfer and geometric exaggeration respectively.
IEEE Trans Vis Comput Graph
January 2022
What makes speeches effective has long been a subject for debate, and until today there is broad controversy among public speaking experts about what factors make a speech effective as well as the roles of these factors in speeches. Moreover, there is a lack of quantitative analysis methods to help understand effective speaking strategies. In this paper, we propose E-ffective, a visual analytic system allowing speaking experts and novices to analyze both the role of speech factors and their contribution in effective speeches.
IEEE Trans Vis Comput Graph
February 2023
Deformation component analysis is a fundamental problem in geometry processing and shape understanding. Existing approaches mainly extract deformation components in local regions at a similar scale while deformations of real-world objects are usually distributed in a multi-scale manner. In this article, we propose a novel method to extract multi-scale deformation components automatically with a stacked attention-based autoencoder.
IEEE Trans Vis Comput Graph
December 2022
In this article, we propose a system that can automatically generate immersive and interactive virtual reality (VR) scenes by taking real-world geometric constraints into account. Our system can not only help users avoid real-world obstacles in virtual reality experiences, but also provide context-consistent contents to preserve their sense of presence. To do so, our system first identifies the positions and bounding boxes of scene objects as well as a set of interactive planes from 3D scans.
IEEE Trans Vis Comput Graph
December 2022
Realistic speech-driven 3D facial animation is a challenging problem due to the complex relationship between speech and face. In this paper, we propose a deep architecture, called Geometry-guided Dense Perspective Network (GDPnet), to achieve speaker-independent realistic 3D facial animation. The encoder is designed with dense connections to strengthen feature propagation and encourage the re-use of audio features, and the decoder is integrated with an attention mechanism to adaptively recalibrate point-wise feature responses by explicitly modeling interdependencies between different neuron units.