Depth perception capability is one of the essential requirements for various autonomous driving platforms. However, accurate depth estimation in a real-world setting is still a challenging problem due to high computational costs. In this paper, we propose a lightweight depth completion network for depth perception in real-world environments.
IEEE Trans Image Process
August 2022
A holistic understanding of dynamic scenes is of fundamental importance in real-world computer vision problems such as autonomous driving, augmented reality, and spatio-temporal reasoning. In this paper, we propose a new computer vision benchmark: Video Panoptic Segmentation (VPS). To study this important problem, we present two datasets, Cityscapes-VPS and VIPER, together with a new evaluation metric, video panoptic quality (VPQ).
IEEE Trans Neural Netw Learn Syst
November 2023
Recent state-of-the-art active learning methods have mostly leveraged generative adversarial networks (GANs) for sample acquisition; however, GANs are known to suffer from instability and sensitivity to hyperparameters. In contrast to these methods, in this article, we propose a novel active learning framework, which we call Maximum Classifier Discrepancy for Active Learning (MCDAL), that exploits the prediction discrepancies between multiple classifiers. In particular, we utilize two auxiliary classification layers that learn tighter decision boundaries by maximizing the discrepancies among them.
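As a rough illustration of discrepancy-based sample acquisition, the sketch below ranks unlabeled samples by how much two classifiers disagree. It is a minimal sketch under stated assumptions, not the MCDAL method itself: the mean absolute difference of softmax outputs as the discrepancy measure, the top-k selection rule, and all names (`softmax`, `discrepancy`, `select_for_labeling`, `pool`) are hypothetical choices made for this example.

```python
import math

def softmax(logits):
    """Convert raw scores to class probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def discrepancy(logits_a, logits_b):
    """Mean absolute difference between two classifiers' class probabilities.

    Assumption: this particular distance is chosen only for illustration.
    """
    pa, pb = softmax(logits_a), softmax(logits_b)
    return sum(abs(x - y) for x, y in zip(pa, pb)) / len(pa)

def select_for_labeling(pool, budget):
    """Return the ids of the `budget` samples the two classifiers disagree on most.

    pool: list of (sample_id, logits_from_classifier_a, logits_from_classifier_b).
    """
    ranked = sorted(pool, key=lambda item: discrepancy(item[1], item[2]), reverse=True)
    return [sample_id for sample_id, _, _ in ranked[:budget]]

# Toy unlabeled pool with hand-written logits (hypothetical values).
pool = [
    ("x1", [2.0, 0.0], [2.0, 0.0]),  # classifiers agree exactly
    ("x2", [2.0, 0.0], [0.0, 2.0]),  # classifiers disagree strongly
    ("x3", [1.0, 0.0], [0.5, 0.0]),  # classifiers disagree mildly
]
picked = select_for_labeling(pool, 2)
```

Samples on which the two heads disagree most are treated here as the most informative ones to send for annotation.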
With the emerging interest in autonomous vehicles (AVs), the performance and reliability of land vehicle navigation are becoming increasingly important. In recent decades, navigation systems for passenger cars have relied heavily on the existing Global Navigation Satellite System (GNSS). However, there are many real-world driving situations in which satellite signals are degraded, for example, urban streets lined with buildings, tunnels, or even underpasses.
IEEE Trans Pattern Anal Mach Intell
November 2022
We introduce dense relational captioning, a novel image captioning task which aims to generate multiple captions with respect to relational information between objects in a visual scene. Relational captioning provides explicit descriptions for each relationship between object combinations. This framework is advantageous in both diversity and amount of information, leading to a comprehensive image understanding based on relationships, e.
IEEE Trans Image Process
December 2021
A common problem in the task of human-object interaction (HOI) detection is that numerous HOI classes have only a small number of labeled examples, resulting in training sets with a long-tailed distribution. The lack of positive labels can lead to low classification accuracy for these classes. Towards addressing this issue, we observe that there exist natural correlations and anti-correlations among human-object interactions.
We propose a new linear RGB-D simultaneous localization and mapping (SLAM) formulation by utilizing planar features of the structured environments. The key idea is to understand a given structured scene and exploit its structural regularities such as the Manhattan world. This understanding allows us to decouple the camera rotation by tracking structural regularities, which makes SLAM problems free from being highly nonlinear.
IEEE Trans Pattern Anal Mach Intell
June 2023
With the increasing social demands of disaster response, methods of visual observation for rescue and safety have become increasingly important. However, because of the shortage of datasets for disaster scenarios, there has been little progress in computer vision and robotics in this field. With this in mind, we present the first large-scale synthetic dataset of egocentric viewpoints for disaster scenarios.
IEEE Trans Pattern Anal Mach Intell
September 2022
Taking selfies has become one of the major photographic trends of our time. In this study, we focus on the selfie stick, on which a camera is mounted to take selfies. We observe that a camera on a selfie stick typically travels through a particular type of trajectory around a sphere.
Magnetic particle imaging (MPI) is a technology that images the concentration of superparamagnetic iron oxide nanoparticles (SPIONs), which can be used in biomedical diagnostics and therapeutics as non-radioactive tracers. We proposed a point-of-care testing MPI system (PoCT-MPI) for preclinical imaging of small rodents (mice) injected with SPIONs, not only in laboratories but also at emergency sites far from laboratories. In particular, we applied a frequency mixing magnetic detection method to the PoCT-MPI, and proposed a hybrid field free line generator to reduce the power consumption, size, and weight of the system.
IEEE Trans Pattern Anal Mach Intell
May 2020
Video inpainting aims to fill in spatio-temporal holes in videos with plausible content. Despite tremendous progress on deep learning-based inpainting of a single image, it is still challenging to extend these methods to the video domain due to the additional time dimension. In this paper, we propose a recurrent temporal aggregation framework for fast deep video inpainting.
IEEE Trans Pattern Anal Mach Intell
May 2021
Visual events are usually accompanied by sounds in our daily lives. However, can machines learn to correlate the visual scene and sound, and to localize the sound source merely by observing them, as humans do? To investigate its empirical learnability, in this work we first present a novel unsupervised algorithm to address the problem of localizing sound sources in visual scenes. To achieve this goal, we develop a two-stream network structure that handles each modality with an attention mechanism for sound source localization.
We propose a novel approach to infer a high-quality depth map from a set of images with small viewpoint variations. In general, techniques for depth estimation from small motion consist of camera pose estimation and dense reconstruction. In contrast to prior approaches that recover scene geometry and camera motions using pre-calibrated cameras, we introduce in this paper a self-calibrating bundle adjustment method tailored for small motion which enables computation of camera poses without the need for camera calibration.
IEEE Trans Image Process
August 2019
Depth from focus (DfF) is a method of estimating the depth of a scene by using information acquired through changes in the focus of a camera. Within the DfF framework, the focus measure (FM) forms the foundation that determines the accuracy of the output. With the results from the FM, the role of a DfF pipeline is to determine and recalculate unreliable measurements while enhancing those that are reliable.
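To make the role of the focus measure concrete, here is a minimal sketch of the basic DfF idea: score each pixel's sharpness in every image of a focal stack and keep the index of the sharpest one. The modified-Laplacian focus measure, the function names, and the toy images are assumptions chosen for illustration; the sketch does not include the refinement of unreliable measurements that the abstract describes.

```python
def modified_laplacian(img, r, c):
    """Focus measure at pixel (r, c): sum of absolute second differences in x and y."""
    dxx = abs(2 * img[r][c] - img[r][c - 1] - img[r][c + 1])
    dyy = abs(2 * img[r][c] - img[r - 1][c] - img[r + 1][c])
    return dxx + dyy

def depth_from_focus(stack):
    """Return, for each interior pixel, the index of the image where it is sharpest.

    stack: list of equally sized 2D grayscale images (lists of lists),
    one per focus setting. Border pixels are left as None because the
    focus measure needs all four neighbours.
    """
    rows, cols = len(stack[0]), len(stack[0][0])
    depth = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            scores = [modified_laplacian(img, r, c) for img in stack]
            depth[r][c] = max(range(len(stack)), key=scores.__getitem__)
    return depth

# Toy focal stack (hypothetical values): a flat, defocused image and a sharp one.
blurry = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
sharp = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
depth = depth_from_focus([blurry, sharp])
```

The returned indices map to physical depth only once each focus setting is associated with a known focal distance, which is outside the scope of this sketch.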
IEEE Trans Pattern Anal Mach Intell
October 2020
In this work, we describe man-made structures via an appropriate structure assumption, called the Atlanta world assumption, which contains a vertical direction (typically the gravity direction) and a set of horizontal directions orthogonal to the vertical direction. Contrary to the commonly used Manhattan world assumption, the horizontal directions in Atlanta world are not necessarily orthogonal to each other. While Atlanta world can encompass a wider range of scenes, this makes the search space much larger and the problem more challenging.
This paper presents a depth upsampling method that produces a high-fidelity dense depth map using a high-resolution RGB image and LiDAR sensor data. Our proposed method explicitly handles depth outliers and computes a depth upsampling with confidence information. Our key idea is the self-learning framework, which automatically learns to estimate the reliability of the upsampled depth map without human-labeled annotation.
As the computing power of hand-held devices grows, there has been increasing interest in the capture of depth information, to enable a variety of photographic applications. However, under low-light conditions, most devices still suffer from low imaging quality and inaccurate depth acquisition. To address the problem, we present a robust depth estimation method from a short burst shot with varied intensity (i.
IEEE Trans Image Process
March 2019
We propose a deep convolutional neural network (CNN) method for natural image matting. Our method takes multiple initial alpha mattes of the previous methods and normalized RGB color images as inputs, and directly learns an end-to-end mapping between the inputs and reconstructed alpha mattes. Among the various existing methods, we focus on using two simple methods as initial alpha mattes: the closed-form matting and KNN matting.
While conventional calibrated photometric stereo methods assume that light intensities and sensor exposures are either known, or unknown but identical across observed images, this assumption easily breaks down in practical settings due to individual light bulbs' characteristics and limited control over sensors. This paper studies the effect of unknown and possibly non-uniform light intensities and sensor exposures among observed images on the shape recovery based on photometric stereo. This leads to the development of a "semi-calibrated" photometric stereo method, where the light directions are known but light intensities (and sensor exposures) are unknown.
IEEE Trans Pattern Anal Mach Intell
February 2019
One of the core applications of light field imaging is depth estimation. To acquire a depth map, existing approaches apply a single photo-consistency measure to an entire light field. However, this is not an optimal choice because of the non-uniform light field degradations produced by limitations in the hardware design.
IEEE Trans Pattern Anal Mach Intell
April 2019
Structure from small motion has become an important topic in 3D computer vision as a method for estimating depth, since capturing the input is so user-friendly. However, major limitations exist with respect to the form of depth uncertainty, due to the narrow baseline and the rolling shutter effect. In this paper, we present a dense 3D reconstruction method from small motion clips using commercial hand-held cameras, which typically cause the undesired rolling shutter artifact.
IEEE Trans Pattern Anal Mach Intell
March 2019
Most man-made environments, such as urban and indoor scenes, consist of a set of parallel and orthogonal planar structures. These structures are approximated by the Manhattan world assumption, a notion that can be represented as a Manhattan frame (MF). Given a set of inputs such as surface normals or vanishing points, we pose an MF estimation problem as a consensus set maximization that maximizes the number of inliers over the rotation search space.
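The consensus-set objective can be sketched as follows: for a candidate frame, count the surface normals that lie within an angular threshold of one of its three axes, then keep the candidate with the most inliers. This is only an illustrative sketch under stated assumptions. The 5-degree threshold, the coarse search over rotations about the z-axis, and the names `count_inliers` and `rotation_about_z` are hypothetical; the abstract itself describes a search over the full rotation space.

```python
import math

def count_inliers(axes, normals, max_angle_deg):
    """Count unit normals within max_angle_deg of any frame axis (either direction)."""
    cos_thresh = math.cos(math.radians(max_angle_deg))
    inliers = 0
    for n in normals:
        # |dot product| measures alignment with an axis regardless of sign.
        best = max(abs(sum(a * b for a, b in zip(axis, n))) for axis in axes)
        if best >= cos_thresh:
            inliers += 1
    return inliers

def rotation_about_z(deg):
    """Candidate frame: the x, y, z axes rotated by `deg` degrees about z."""
    t = math.radians(deg)
    return [
        (math.cos(t), math.sin(t), 0.0),
        (-math.sin(t), math.cos(t), 0.0),
        (0.0, 0.0, 1.0),
    ]

s = math.sqrt(0.5)
# Toy surface normals (hypothetical): four axis-aligned ones and one diagonal outlier.
normals = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (0.0, -1.0, 0.0), (s, s, 0.0)]

# Coarse candidate set standing in for the rotation search space.
candidates = [0, 15, 30, 45]
best_deg = max(candidates, key=lambda d: count_inliers(rotation_about_z(d), normals, 5.0))
```

Because the objective counts inliers instead of summing residuals, the diagonal outlier does not pull the estimated frame away from the dominant axis-aligned structure.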
Calibration is vital to autostereoscopic 3D displays. This paper proposes a local calibration method that copes with any type of deformation in the optical layer. The proposed method is based on visual pattern analysis.
IEEE Trans Pattern Anal Mach Intell
February 2018
Rank minimization can be converted into tractable surrogate problems, such as Nuclear Norm Minimization (NNM) and Weighted NNM (WNNM). The problems related to NNM, or WNNM, can be solved iteratively by applying a closed-form proximal operator, called Singular Value Thresholding (SVT), or Weighted SVT, but they suffer from the high computational cost of computing a Singular Value Decomposition (SVD) at each iteration. We propose a fast and accurate approximation method for SVT, which we call fast randomized SVT (FRSVT), with which we avoid direct computation of the SVD.
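For reference, the standard (exact) SVT operator that such methods approximate can be written as below. The symbols Y, U, V, \sigma_i, and \tau are notation assumed here for illustration and do not come from the abstract; the formula is the well-known closed-form proximal operator of the nuclear norm, not the FRSVT approximation.

```latex
Y = U \,\mathrm{diag}(\sigma_1, \dots, \sigma_r)\, V^\top ,
\qquad
\mathcal{D}_\tau(Y)
  = \arg\min_X \; \tfrac{1}{2}\,\lVert X - Y \rVert_F^2 + \tau \lVert X \rVert_*
  = U \,\mathrm{diag}\big(\max(\sigma_i - \tau,\, 0)\big)\, V^\top .
```

Each application of \mathcal{D}_\tau requires the SVD of Y, which is the per-iteration cost the abstract refers to.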
IEEE Trans Image Process
May 2017
We present a novel coded exposure video technique for multi-image motion deblurring. The key idea of this paper is to capture video frames with a set of complementary fluttering patterns, which enables us to preserve all spectrum bands of a latent image and recover a sharp latent image. To achieve this, we introduce an algorithm for generating a complementary set of binary sequences based on modern communication theory and implement the coded exposure video system with an off-the-shelf machine vision camera.