Publications by authors named "Alan Bovik"

We study the visual quality judgments of human subjects on digital human avatars (sometimes referred to as "holograms" in the parlance of virtual reality [VR] and augmented reality [AR] systems) that have been subjected to distortions. We also study the ability of video quality models to predict human judgments. As streaming human avatar videos in VR or AR become increasingly common, the need for more advanced human avatar video compression protocols will be required to address the tradeoffs between faithfully transmitting high-quality visual representations while adjusting to changeable bandwidth scenarios.

View Article and Find Full Text PDF

High Dynamic Range (HDR) videos are able to represent wider ranges of contrasts and colors than Standard Dynamic Range (SDR) videos, giving more vivid experiences. Due to this, HDR videos are expected to grow into the dominant video modality of the future. However, HDR videos are incompatible with existing SDR displays, which form the majority of affordable consumer displays on the market.

View Article and Find Full Text PDF

Adaptive video streaming relies on the construction of efficient bitrate ladders to deliver the best possible visual quality to viewers under bandwidth constraints. The traditional method of content dependent bitrate ladder selection requires a video shot to be pre-encoded with multiple encoding parameters to find the optimal operating points given by the convex hull of the resulting rate-quality curves. However, this pre-encoding step is equivalent to an exhaustive search process over the space of possible encoding parameters, which causes significant overhead in terms of both computation and time expenditure.

View Article and Find Full Text PDF

Despite acceleration in the use of 3D meshes, it is difficult to find effective mesh quality assessment algorithms that can produce predictions highly correlated with human subjective opinions. Defining mesh quality features is challenging due to the irregular topology of meshes, which are defined on vertices and triangles. To address this, we propose a novel 3D projective structural similarity index ( 3D- PSSIM) for meshes that is robust to differences in mesh topology.

View Article and Find Full Text PDF

We conducted a large-scale study of human perceptual quality judgments of High Dynamic Range (HDR) and Standard Dynamic Range (SDR) videos subjected to scaling and compression levels and viewed on three different display devices. While conventional expectations are that HDR quality is better than SDR quality, we have found subject preference of HDR versus SDR depends heavily on the display device, as well as on resolution scaling and bitrate. To study this question, we collected more than 23,000 quality ratings from 67 volunteers who watched 356 videos on OLED, QLED, and LCD televisions, and among many other findings, observed that HDR videos were often rated as lower quality than SDR videos at lower bitrates, particularly when viewed on LCD and QLED displays.

View Article and Find Full Text PDF

The Visual Multimethod Assessment Fusion (VMAF) algorithm has recently emerged as a state-of-the-art approach to video quality prediction, that now pervades the streaming and social media industry. However, since VMAF requires the evaluation of a heterogeneous set of quality models, it is computationally expensive. Given other advances in hardware-accelerated encoding, quality assessment is emerging as a significant bottleneck in video compression pipelines.

View Article and Find Full Text PDF

Effectively evaluating the perceptual quality of dehazed images remains an under-explored research issue. In this paper, we propose a no-reference complex-valued convolutional neural network (CV-CNN) model to conduct automatic dehazed image quality evaluation. Specifically, a novel CV-CNN is employed that exploits the advantages of complex-valued representations, achieving better generalization capability on perceptual feature learning than real-valued ones.

View Article and Find Full Text PDF

As compared to standard dynamic range (SDR) videos, high dynamic range (HDR) content is able to represent and display much wider and more accurate ranges of brightness and color, leading to more engaging and enjoyable visual experiences. HDR also implies increases in data volume, further challenging existing limits on bandwidth consumption and on the quality of delivered content. Perceptual quality models are used to monitor and control the compression of streamed SDR content.

View Article and Find Full Text PDF

Perioperative morbidity and mortality are significantly associated with both static and dynamic perioperative factors. The studies investigating static perioperative factors have been reported; however, there are a limited number of previous studies and data sets analyzing dynamic perioperative factors, including physiologic waveforms, despite its clinical importance. To fill the gap, the authors introduce a novel large size perioperative data set: Machine Learning Of physiologic waveforms and electronic health Record Data (MLORD) data set.

View Article and Find Full Text PDF

Perceptual video quality assessment (VQA) is an integral component of many streaming and video sharing platforms. Here we consider the problem of learning perceptually relevant video quality representations in a self-supervised manner. Distortion type identification and degradation level determination is employed as an auxiliary task to train a deep learning model containing a deep Convolutional Neural Network (CNN) that extracts spatial features, as well as a recurrent unit that captures temporal information.

View Article and Find Full Text PDF

Perception-based image analysis technologies can be used to help visually impaired people take better quality pictures by providing automated guidance, thereby empowering them to interact more confidently on social media. The photographs taken by visually impaired users often suffer from one or both of two kinds of quality issues: technical quality (distortions), and semantic quality, such as framing and aesthetic composition. Here we develop tools to help them minimize occurrences of common technical distortions, such as blur, poor exposure, and noise.

View Article and Find Full Text PDF

We present the outcomes of a recent large-scale subjective study of Mobile Cloud Gaming Video Quality Assessment (MCG-VQA) on a diverse set of gaming videos. Rapid advancements in cloud services, faster video encoding technologies, and increased access to high-speed, low-latency wireless internet have all contributed to the exponential growth of the Mobile Cloud Gaming industry. Consequently, the development of methods to assess the quality of real-time video feeds to end-users of cloud gaming platforms has become increasingly important.

View Article and Find Full Text PDF

Block based motion estimation is integral to inter prediction processes performed in hybrid video codecs. Prevalent block matching based methods that are used to compute block motion vectors (MVs) rely on computationally intensive search procedures. They also suffer from the aperture problem, which tends to worsen as the block size is reduced.

View Article and Find Full Text PDF

Previous blind or No Reference (NR) Image / video quality assessment (IQA/VQA) models largely rely on features drawn from natural scene statistics (NSS), but under the assumption that the image statistics are stationary in the spatial domain. Several of these models are quite successful on standard pictures. However, in Virtual Reality (VR) applications, foveated video compression is regaining attention, and the concept of space-variant quality assessment is of interest, given the availability of increasingly high spatial and temporal resolution contents and practical ways of measuring gaze direction.

View Article and Find Full Text PDF

We consider the problem of obtaining image quality representations in a self-supervised manner. We use prediction of distortion type and degree as an auxiliary task to learn features from an unlabeled image dataset containing a mixture of synthetic and realistic distortions. We then train a deep Convolutional Neural Network (CNN) using a contrastive pairwise objective to solve the auxiliary problem.

View Article and Find Full Text PDF

Being able to accurately predict the visual quality of videos subjected to various combinations of dimension reduction protocols is of high interest to the streaming video industry, given rapid increases in frame resolutions and frame rates. In this direction, we have developed a video quality predictor that is sensitive to spatial, temporal, or space-time subsampling combined with compression. Our predictor is based on new models of space-time natural video statistics (NVS).

View Article and Find Full Text PDF

Video dimensions are continuously increasing to provide more realistic and immersive experiences to global streaming and social media viewers. However, increments in video parameters such as spatial resolution and frame rate are inevitably associated with larger data volumes. Transmitting increasingly voluminous videos through limited bandwidth networks in a perceptually optimal way is a current challenge affecting billions of viewers.

View Article and Find Full Text PDF

Video livestreaming is gaining prevalence among video streaming service s, especially for the delivery of live, high motion content such as sport ing events. The quality of the se livestreaming videos can be adversely affected by any of a wide variety of events, including capture artifacts, and distortions incurred during coding and transmission. High motion content can cause or exacerbate many kinds of distortion, such as motion blur and stutter.

View Article and Find Full Text PDF

We propose a new model for no-reference video quality assessment (VQA). Our approach uses a new idea of highly-localized space-time (ST) slices called Space-Time Chips (ST Chips). ST Chips are localized cuts of video data along directions that implicitly capture motion.

View Article and Find Full Text PDF

Because of the increasing ease of video capture, many millions of consumers create and upload large volumes of User-Generated-Content (UGC) videos to social and streaming media sites over the Internet. UGC videos are commonly captured by naive users having limited skills and imperfect techniques, and tend to be afflicted by mixtures of highly diverse in-capture distortions. These UGC videos are then often uploaded for sharing onto cloud servers, where they are further compressed for storage and transmission.

View Article and Find Full Text PDF

We consider the problem of conducting frame rate dependent video quality assessment (VQA) on videos of diverse frame rates, including high frame rate (HFR) videos. More generally, we study how perceptual quality is affected by frame rate, and how frame rate and compression combine to affect perceived quality. We devise an objective VQA model called Space-Time GeneRalized Entropic Difference (GREED) which analyzes the statistics of spatial and temporal band-pass video coefficients.

View Article and Find Full Text PDF

It is well known that natural images possess statistical regularities that can be captured by bandpass decomposition and divisive normalization processes that approximate early neural processing in the human visual system. We expand on these studies and present new findings on the properties of space-time natural statistics that are inherent in motion pictures. Our model relies on the concept of temporal bandpass (e.

View Article and Find Full Text PDF

In Virtual Reality (VR), the requirements of much higher resolution and smooth viewing experiences under rapid and often real-time changes in viewing direction, leads to significant challenges in compression and communication. To reduce the stresses of very high bandwidth consumption, the concept of foveated video compression is being accorded renewed interest. By exploiting the space-variant property of retinal visual acuity, foveation has the potential to substantially reduce video resolution in the visual periphery, with hardly noticeable perceptual quality degradations.

View Article and Find Full Text PDF

Measuring Quality of Experience (QoE) and integrating these measurements into video streaming algorithms is a multi-faceted problem that fundamentally requires the design of comprehensive subjective QoE databases and objective QoE prediction models. To achieve this goal, we have recently designed the LIVE-NFLX-II database, a highly-realistic database which contains subjective QoE responses to various design dimensions, such as bitrate adaptation algorithms, network conditions and video content. Our database builds on recent advancements in content-adaptive encoding and incorporates actual network traces to capture realistic network variations on the client device.

View Article and Find Full Text PDF