Publications by authors named "Jongyoo Kim"

Numerous task-specific variants of autoregressive networks have been developed for dance generation. Nonetheless, a severe limitation remains: all existing algorithms can return repeated patterns for a given initial pose, which may yield inferior results. We examine and analyze several key challenges of previous works and propose variations in both model architecture (namely MNET++) and training methods to address them.

Single-image 3-D reconstruction has long been a challenging problem. Recent deep learning approaches have been introduced to this 3-D area, but the ability to generate point clouds still remains limited due to inefficient and expensive 3-D representations, the dependency between the output and the number of model parameters, or the lack of a suitable computing operation. In this article, we present a novel deep-learning-based method to reconstruct a point cloud of an object from a single still image.

There has been rapid recent progress on 3D human rendering, including novel view synthesis and pose animation, based on advances in neural radiance fields (NeRF). However, most existing methods focus on person-specific training, which typically requires multi-view videos. This paper deals with a new challenging task: rendering novel views and novel poses for a person unseen in training, using only multi-view still images as input, without videos.

Visual saliency on stereoscopic 3D (S3D) images has been shown to be heavily influenced by image quality. Hence, this dependency is an important factor in image quality prediction, image restoration and discomfort reduction, but it is still very difficult to predict such a nonlinear relation in images. In addition, most algorithms specialized in detecting visual saliency on pristine images may unsurprisingly fail when facing distorted images.

Image recognition based on convolutional neural networks (CNNs) has recently been shown to deliver state-of-the-art performance in various areas of computer vision and image processing. Nevertheless, applying a deep CNN to no-reference image quality assessment (NR-IQA) remains a challenging task due to critical obstacles, i.e.

Previously, no-reference (NR) stereoscopic 3D (S3D) image quality assessment (IQA) algorithms have been limited to the extraction of reliable hand-crafted features based on an understanding of the insufficiently revealed human visual system or natural scene statistics. Furthermore, compared with full-reference (FR) S3D IQA metrics, it is difficult to achieve competitive quality score predictions using the extracted features, which are not optimized with respect to human opinion. To cope with this limitation of the conventional approach, we introduce a novel deep learning scheme for NR S3D IQA in terms of local to global feature aggregation.

Crosstalk is one of the most severe factors affecting the perceived quality of stereoscopic 3D images. It arises from a leakage of light intensity between multiple views, as in auto-stereoscopic displays. Well-known determinants of crosstalk include the co-location contrast and disparity of the left and right images, which have been dealt with in prior studies.

Conventional stereoscopic 3D (S3D) displays do not provide accommodation depth cues of the 3D image or video contents being viewed. The sense of content depths is thus limited to cues supplied by motion parallax (for 3D video), stereoscopic vergence cues created by presenting left and right views to the respective eyes, and other contextual and perspective depth cues. The absence of accommodation cues can induce two kinds of accommodation vergence mismatches (AVM) at the fixation and peripheral points, which can result in severe visual discomfort.
