Publications by authors named "Munchurl Kim"

Recently, attempts to learn the underlying 3D structures of a scene from monocular videos in a fully self-supervised fashion have drawn much attention. One of the most challenging aspects of this task is to handle independently moving objects as they break the rigid-scene assumption. In this paper, we show for the first time that pixel positional information can be exploited to learn SVDE (Single View Depth Estimation) from videos.

We propose a novel two-stage training strategy with ambiguity boosting for the self-supervised learning of single view depths from stereo images. Our two-stage learning strategy first obtains a coarse depth prior by training an auto-encoder network for a stereoscopic view synthesis task. This prior knowledge is then boosted and used to self-supervise the model in the second stage of training through our novel ambiguity boosting loss.

Joint exploration model (JEM) reference codecs of ISO/IEC and ITU-T use multiple types of integer transforms based on DCT and DST, at various transform sizes, for intra- and inter-predictive coding, which has brought a significant improvement in coding efficiency. JEM adopts three types of integer DCTs (DCT-II, DCT-V, and DCT-VIII) and two types of integer DSTs (DST-I and DST-VII). Fast computations of the integer DCT-II and DST-I are well known, but few studies have addressed the other types, such as DCT-V, DCT-VIII, and DST-VII, for all transform sizes.
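
For orientation, the transform types named above can be sketched from their textbook real-valued definitions. The following is a minimal illustration, not the codec's actual integer tables or the fast algorithm studied in the paper: the function names and the scale factor of 128 in `integerize` are assumptions made for this example, and real codecs use hand-tuned integer matrices that may differ from simple rounding.

```python
import math


def dct2_matrix(n):
    """Orthonormal DCT-II basis (textbook definition); row k is basis vector k."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * k * (2 * j + 1) / (2 * n))
                     for j in range(n)])
    return rows


def dst7_matrix(n):
    """Orthonormal DST-VII basis (textbook definition)."""
    scale = math.sqrt(4.0 / (2 * n + 1))
    return [[scale * math.sin(math.pi * (k + 1) * (2 * j + 1) / (2 * n + 1))
             for j in range(n)] for k in range(n)]


def dct8_matrix(n):
    """Orthonormal DCT-VIII basis (textbook definition)."""
    scale = math.sqrt(4.0 / (2 * n + 1))
    return [[scale * math.cos(math.pi * (2 * k + 1) * (2 * j + 1) / (4 * n + 2))
             for j in range(n)] for k in range(n)]


def max_orthogonality_error(matrix):
    """Largest deviation of (matrix times its transpose) from the identity."""
    n = len(matrix)
    worst = 0.0
    for a in range(n):
        for b in range(n):
            dot = sum(matrix[a][j] * matrix[b][j] for j in range(n))
            target = 1.0 if a == b else 0.0
            worst = max(worst, abs(dot - target))
    return worst


def integerize(matrix, scale=128):
    """Scale and round to integers. The scale value is an illustrative
    assumption; codec tables are tuned and need not equal this rounding."""
    return [[int(round(v * scale)) for v in row] for row in matrix]
```

Each basis is orthonormal before rounding, so `max_orthogonality_error` should be close to zero for any size; rounding to integers trades a small loss of orthogonality for integer-only arithmetic.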

We present a novel and effective learning-based frame rate upconversion (FRUC) scheme, using linear mapping. The proposed learning-based FRUC scheme consists of: 1) a new hierarchical extended bilateral motion estimation (HEBME) method; 2) a light-weight motion deblur (LWMD) method; and 3) a synthesis-based motion-compensated frame interpolation (S-MCFI) method. First, the HEBME method considerably enhances the accuracy of the motion estimation (ME), which can lead to a significant improvement of the FRUC performance.
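
As background for the motion estimation step, plain bilateral motion estimation can be sketched as follows. This is a simplified one-dimensional illustration and not the HEBME method itself (the hierarchical and extended parts are omitted): for a block of the frame to be interpolated, a candidate displacement `v` is scored by comparing the previous frame at `i - v` with the next frame at `i + v`. All function and parameter names are assumptions made for this example.

```python
def bilateral_motion_search(prev_frame, next_frame, start, size, radius):
    """Return the displacement v in [-radius, radius] that minimizes the sum
    of absolute differences (SAD) between prev_frame[i - v] and
    next_frame[i + v] over the block [start, start + size).
    Candidates that would index outside either frame are skipped."""
    best_v, best_sad = None, None
    # Visit small displacements first so ties favor the smaller motion.
    for v in sorted(range(-radius, radius + 1), key=abs):
        lo, hi = start - abs(v), start + size - 1 + abs(v)
        if lo < 0 or hi >= len(prev_frame) or hi >= len(next_frame):
            continue
        sad = sum(abs(prev_frame[i - v] - next_frame[i + v])
                  for i in range(start, start + size))
        if best_sad is None or sad < best_sad:
            best_v, best_sad = v, sad
    return best_v


def interpolate_block(prev_frame, next_frame, start, size, v):
    """Motion-compensated interpolation of one block: average the two
    motion-aligned samples from the previous and next frames."""
    return [(prev_frame[i - v] + next_frame[i + v]) / 2.0
            for i in range(start, start + size)]
```

Because the search is anchored on the intermediate frame rather than on either input frame, every interpolated sample receives exactly one motion vector, which avoids the holes and overlaps that unilateral motion estimation can produce.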

Conventional predictive video coding approaches are reaching the limit of their potential coding efficiency improvements because of sharply increasing computational complexity. As an alternative, perceptual video coding (PVC) attempts to achieve high coding efficiency by eliminating perceptual redundancy, using just-noticeable-distortion (JND) directed PVC. Previous JND models were built by adding white Gaussian noise or specific signal patterns to the original images, which was not appropriate for finding JND thresholds due to distortion with energy reduction.

Super-resolution (SR) has become more vital because of its capability to generate high-quality ultra-high-definition (UHD) high-resolution (HR) images from low-resolution (LR) input images. Conventional SR methods entail high computational complexity, which makes them difficult to apply to the up-scaling of full-high-definition input images into UHD-resolution images. Nevertheless, our previous super-interpolation (SI) method showed a good compromise between peak signal-to-noise ratio (PSNR) performance and computational complexity.
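
For reference, PSNR between a ground-truth image and an up-scaled result follows the standard definition 10 log10(peak^2 / MSE). The sketch below assumes 8-bit samples (peak value 255) stored as nested lists of equal shape; the function name and argument layout are assumptions made for this example.

```python
import math


def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2D images
    given as lists of rows. Returns infinity for identical images."""
    squared_error = 0.0
    count = 0
    for ref_row, test_row in zip(reference, test):
        for r, t in zip(ref_row, test_row):
            squared_error += (r - t) ** 2
            count += 1
    if count == 0:
        raise ValueError("images must not be empty")
    mse = squared_error / count
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak * peak / mse)
```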

In this paper, a low-complexity coding unit (CU)-level rate and distortion estimation scheme is proposed for hardware-friendly implementation of High Efficiency Video Coding (HEVC), in which a Walsh-Hadamard transform (WHT)-based low-complexity integer discrete cosine transform (DCT) is employed for distortion estimation. Since HEVC adopts quadtree structures of coding blocks with hierarchical coding depths, it becomes more difficult to estimate accurate rate and distortion values without actually performing transform, quantization, inverse transform, de-quantization, and entropy coding. Furthermore, the DCT for rate-distortion optimization (RDO) is computationally expensive, because it requires many multiplication and addition operations for transform blocks of orders 4, 8, 16, and 32, and requires recursive computations to decide the optimal depths of the CU or transform unit.
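
To illustrate why the WHT is attractive for hardware, the sketch below implements an unnormalized fast Walsh-Hadamard transform, which needs only additions and subtractions, together with the sum of absolute transformed differences (SATD), a commonly used transform-domain cost. This is not the paper's WHT-based integer DCT or its rate and distortion estimator; SATD is swapped in here as a simpler, well-known relative, and all names are assumptions made for this example.

```python
def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform (natural/Hadamard order).
    The length must be a power of two. Uses only additions and subtractions."""
    data = list(values)
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    half = 1
    while half < n:
        for start in range(0, n, 2 * half):
            for i in range(start, start + half):
                a, b = data[i], data[i + half]
                data[i], data[i + half] = a + b, a - b
        half *= 2
    return data


def wht_2d(block):
    """Separable 2D WHT: transform every row, then every column."""
    rows = [fwht(row) for row in block]
    cols = [fwht(col) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def satd(original, predicted):
    """Sum of absolute values of the 2D WHT of the residual block."""
    residual = [[o - p for o, p in zip(orow, prow)]
                for orow, prow in zip(original, predicted)]
    return sum(abs(c) for row in wht_2d(residual) for c in row)
```

Applying `fwht` twice returns the input scaled by its length, so the same butterfly network serves as both the forward and the inverse transform up to a constant factor.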

Discrete cosine transform (DCT)-based just noticeable difference (JND) profiles have been widely applied to human perception-based video coding in order to reduce perceptual redundancy, which is one of the main goals of perceptual video coding (PVC). However, there are two problems with this approach: 1) the JND value of each transform coefficient is estimated for a fixed-sized DCT kernel (e.g.

Computational models for image quality assessment (IQA) have been developed by exploring effective features that are consistent with the characteristics of the human visual system (HVS) for visual quality perception. In this paper, we first reveal that many existing features used in computational IQA methods can hardly characterize visual quality perception for local image characteristics and various distortion types. To solve this problem, we propose a new IQA method, called the structural contrast-quality index (SC-QI), by adopting a structural contrast index (SCI), which characterizes local and global visual quality perception well for various image characteristics with structural-distortion types.

With the advent of ultrahigh-definition (UHD) video services, super-resolution (SR) techniques are often required to generate high-resolution (HR) images from low-resolution (LR) images, such as HD images. To generate such HR images and video at UHD resolutions on computing devices with limited hardware and software resources, low-complexity yet high-quality SR methods are particularly required. In this paper, we present a novel and fast SR method, called super-interpolation (SI), which unifies an interpolation step and a quality-enhancement step.

Social TV is a social media service delivered via TV and social networks, through which TV users exchange their experiences of the TV programs they are viewing. For a social TV service, two technical aspects are envisioned: grouping similar TV users to create social TV communities, and recommending TV programs based on group and personal interests to personalize TV. In this paper, we propose a unified topic model that both groups similar TV users and recommends TV programs as a social TV service.

In this paper, we propose a new DCT-based just noticeable difference (JND) profile incorporating the spatial contrast sensitivity function, the luminance adaptation effect, and the contrast masking (CM) effect. The proposed JND profile overcomes two limitations of conventional JND profiles: 1) the CM JND models in conventional JND profiles employ simple texture complexity metrics, which are often not highly correlated with perceived complexity, especially for unstructured patterns. We therefore propose a new texture complexity metric, called the structural contrast index, that considers not only contrast intensity but also the structureness of image patterns.
