Publications by authors named "Yulan Guo"

Article Synopsis
  • Cost aggregation is essential for stereo matching, and this paper introduces a Disparity Context Aggregation (DCA) module to enhance CNN-based methods by leveraging disparity classification.
  • The DCA module classifies image pixels into disparity classes, allowing for the creation of homogeneous region representations that streamline cost aggregation and improve matching accuracy.
  • Integrated into various network architectures, the DCA module operates efficiently with minimal additional requirements, and the resulting network, DCANet, demonstrates superior performance on benchmark tests.
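
As a rough illustration of the idea in the synopsis above, the sketch below groups pixels into disparity classes and averages their features into one representation per class. It assumes uniform disparity bins and plain mean pooling; the function name `region_representations` and all of its parameters are hypothetical, and the DCA module itself classifies pixels with a learned network rather than by fixed binning.

```python
from collections import defaultdict

def region_representations(disparities, features, num_classes, max_disparity):
    """Average pixel features within each disparity class.

    disparities: one coarse disparity estimate per pixel
    features:    one feature vector per pixel
    Classes are uniform disparity bins here, which is an assumption.
    """
    bin_width = max_disparity / num_classes
    groups = defaultdict(list)
    for d, f in zip(disparities, features):
        # Clamp so that d == max_disparity falls into the last class.
        cls = min(int(d / bin_width), num_classes - 1)
        groups[cls].append(f)
    # Mean of each feature dimension over the pixels in a class.
    return {
        cls: [sum(col) / len(vectors) for col in zip(*vectors)]
        for cls, vectors in groups.items()
    }

# Four pixels: two near (small disparity) and two far apart in disparity.
example_disparities = [2.0, 3.0, 40.0, 44.0]
example_features = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
example_regions = region_representations(
    example_disparities, example_features, num_classes=4, max_disparity=64.0
)
```

Each class then carries a single pooled vector, so later aggregation can work with a handful of region representations instead of every pixel.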
Article Synopsis
  • Restoration tasks in low-level vision focus on recovering high-quality (HQ) data from low-quality (LQ) inputs, with increasing interest in unpaired methods that don't require matched datasets.
  • Diverse and unknown degradation types in real-world scenarios present challenges for these unpaired learning methods.
  • The paper introduces a degradation representation learning scheme and a framework featuring degradation-aware convolutions, resulting in two models, UnIRnet and UnPRnet, which demonstrate state-of-the-art performance in restoring images and point clouds respectively.

Current video object segmentation approaches primarily rely on frame-wise appearance information to perform matching. Despite significant progress, reliable matching becomes challenging due to rapid changes of the object's appearance over time. Moreover, previous matching mechanisms suffer from redundant computation and noise interference as the number of accumulated frames increases.

Multiview clustering (MVC) can achieve more accurate results by utilizing complementary information from multiple perspectives, compared to traditional single-view methods. However, current multiview techniques require all views to be available upfront, making them inadequate for dealing with prevalent data sources that arrive as streams, such as stem cell analysis and multicamera surveillance. To address this problem, in this article, we propose a method called lifelong stream-view clustering (LSVC), which comprises an embedding anchor knowledge library and three key components, enabling the capability to perform asynchronous clustering on stream views.

This paper presents a 3D point cloud registration (PCR) method based on maximal cliques (MAC). The key insight is to loosen the previous maximum clique constraint and mine more local consensus information in a graph to generate accurate pose hypotheses: 1) A compatibility graph is constructed to capture the affinity relationships between initial correspondences. 2) Maximal cliques are searched for in the graph, each representing a consensus set.
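
The two steps named in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: compatibility is tested with a simple distance-preservation check, the tolerance `tol` is an arbitrary value, and the cliques are enumerated with the basic Bron-Kerbosch recursion. The function names are hypothetical.

```python
from itertools import combinations
from math import dist

def build_compatibility_graph(correspondences, tol=0.05):
    """Link two correspondences when they preserve point-to-point distance.

    Each correspondence is a (source_point, target_point) pair. A rigid
    transform keeps distances unchanged, so two correct matches must agree
    on the distance between their endpoints (up to `tol`, an assumed value).
    """
    adjacency = {i: set() for i in range(len(correspondences))}
    for i, j in combinations(range(len(correspondences)), 2):
        (src_i, tgt_i), (src_j, tgt_j) = correspondences[i], correspondences[j]
        if abs(dist(src_i, src_j) - dist(tgt_i, tgt_j)) <= tol:
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency

def maximal_cliques(adjacency):
    """Enumerate all maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(sorted(current))
            return
        for v in list(candidates):
            expand(current | {v}, candidates & adjacency[v], excluded & adjacency[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adjacency), set())
    return cliques

# Three matches consistent with a shift of +1 along x, plus one outlier.
example_matches = [
    ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0)),
    ((1.0, 0.0, 0.0), (2.0, 0.0, 0.0)),
    ((0.0, 1.0, 0.0), (1.0, 1.0, 0.0)),
    ((0.0, 0.0, 1.0), (5.0, 5.0, 5.0)),
]
example_graph = build_compatibility_graph(example_matches)
example_cliques = sorted(maximal_cliques(example_graph))
```

In this toy input the three consistent matches form one clique and the outlier stays isolated, so each clique can be read as a candidate consensus set for a pose hypothesis.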

It is critical to develop quick, accurate, and efficient methods for detecting and sterilizing O157:H7 in order to prevent infections and outbreaks of foodborne illness. Herein, we established a colorimetric biosensor with sterilizing properties that uses copper selenide nanoparticles to detect O157:H7. The sample was mixed with magnetic nanoprobes and nanozyme probes to form a sandwich structure, and then the unbound nanozyme probes were collected by magnetic separation.

Novel view synthesis aims to render images at arbitrary camera poses from sparse observations of a scene. Recently, neural radiance fields (NeRF) have demonstrated their effectiveness in synthesizing novel views of a bounded scene. However, most existing methods cannot be directly extended to 360° unbounded scenes, where the camera orientations and scene depths are unconstrained and vary widely.

Recent years have witnessed the great advances of deep neural networks (DNNs) in light field (LF) image super-resolution (SR). However, existing DNN-based LF image SR methods are developed on a single fixed degradation (e.g.

Visual speech, referring to the visual domain of speech, has attracted increasing attention due to its wide applications, such as public security, medical treatment, military defense, and film entertainment. As a powerful AI strategy, deep learning techniques have extensively promoted the development of visual speech learning. Over the past five years, numerous deep learning based methods have been proposed to address various problems in this area, especially automatic visual speech recognition and generation.

Recently, memory-based networks have achieved promising performance for video object segmentation (VOS). However, existing methods still suffer from unsatisfactory segmentation accuracy and inferior efficiency. The reasons are mainly twofold: 1) during memory construction, the inflexible memory storage mechanism results in a weak discriminative ability for similar appearances in complex scenarios, leading to video-level temporal redundancy, and 2) during memory reading, matching robustness and memory retrieval accuracy decrease as the number of video frames increases.

As the food trade becomes increasingly globalized, food safety continues to warrant widespread attention. Foodborne diseases caused by contaminated food, including food contaminated with foodborne pathogens, seriously threaten public health and the economy. This has led to the development of more sensitive and accurate methods for detecting pathogenic bacteria.

Article Synopsis
  • Infrared small target detection (IRST) focuses on identifying targets against cluttered backgrounds, but traditional methods struggle with extremely dim targets.
  • The proposed Direction-Coded Temporal U-Shape Module (DTUM) addresses this challenge by effectively encoding motion direction, allowing better differentiation between targets and clutter in multiframe (MIRST) detection.
  • Additionally, a new dataset (NUDT-MIRSDT) was created to evaluate the performance of MIRST detection methods, showing that the DTUM achieves state-of-the-art results in detecting dim infrared targets and reducing false alarms.

Studies have shown that exposure to fine particulate matter (PM2.5) affects various cells, systems, and organs in vivo and in vitro. PM2.

Recently, memory-based methods have achieved remarkable progress in video object segmentation. However, the segmentation performance is still limited by error accumulation and redundant memory, primarily because of 1) the semantic gap caused by similarity matching and memory reading via heterogeneous key-value encoding; 2) the continuously growing and inaccurate memory through directly storing unreliable predictions of all previous frames. To address these issues, we propose an efficient, effective, and robust segmentation method based on Isogenous Memory Sampling and Frame-Relation mining (IMSFR).

With the development of new technologies for rapid and high-throughput bacterial detection, ATP-based bioluminescence technology is making progress. Because live bacteria contain ATP, the number of bacteria correlates with the level of ATP under certain conditions, so the luciferase-catalyzed light-emitting reaction of luciferin with ATP is widely used to detect bacteria. This method is easy to operate, has a short detection cycle, requires little manpower, and is suitable for long-term continuous monitoring.

We study the problem of extracting accurate correspondences for point cloud registration. Recent keypoint-free methods have shown great potential by bypassing the detection of repeatable keypoints, which is difficult, especially in low-overlap scenarios. They seek correspondences over downsampled superpoints, which are then propagated to dense points.

We present RoReg, a novel point cloud registration framework that fully exploits oriented descriptors and estimated local rotations in the whole registration pipeline. Previous methods mainly focus on extracting rotation-invariant descriptors for registration but unanimously neglect the orientations of descriptors. In this paper, we show that the oriented descriptors and the estimated local rotations are very useful in the whole registration pipeline, including feature description, feature detection, feature matching, and transformation estimation.

Single-frame infrared small target (SIRST) detection aims to separate small targets from cluttered backgrounds. With the advances of deep learning, CNN-based methods have yielded promising results in generic object detection due to their powerful modeling capability. However, existing CNN-based methods cannot be directly applied to infrared small targets, since the pooling layers in their networks can cause targets to be lost in deep layers.

We present an algorithm for learning robust, flexible, and generalizable 3D object representations without requiring heavy annotation efforts or supervision. Unlike conventional 3D generative models, our algorithm aims to build a structured latent space where certain factors of shape variation, such as object parts, can be disentangled into independent sub-spaces. Our novel decoder then acts on these individual latent sub-spaces (i.

Neural networks contain considerable redundant computation, which drags down the inference efficiency and hinders the deployment on resource-limited devices. In this paper, we study the sparsity in convolutional neural networks and propose a generic sparse mask mechanism to improve the inference efficiency of networks. Specifically, sparse masks are learned in both data and channel dimensions to dynamically localize and skip redundant computation at a fine-grained level.
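
To make the idea of skipping computation concrete, here is a minimal sketch of a 1x1 convolution gated by a spatial mask and an output-channel mask. It assumes the masks are already given as 0/1 flags, whereas the paper learns them; the function name `masked_pointwise_conv` and the flat list layout are hypothetical choices made only for illustration.

```python
def masked_pointwise_conv(features, weights, spatial_mask, channel_mask):
    """Apply a 1x1 convolution only where both masks are active.

    features:      list of positions, each a list of C_in values
    weights:       C_out rows of C_in weights
    spatial_mask:  one 0/1 flag per position (1 = compute)
    channel_mask:  one 0/1 flag per output channel (1 = compute)
    Skipped entries stay at zero; returns (output, multiply_add_count).
    """
    output = [[0.0] * len(weights) for _ in features]
    macs = 0
    for p, pixel in enumerate(features):
        if not spatial_mask[p]:
            continue  # skip the whole position
        for c, row in enumerate(weights):
            if not channel_mask[c]:
                continue  # skip this output channel
            output[p][c] = sum(w * x for w, x in zip(row, pixel))
            macs += len(pixel)
    return output, macs

# Three positions with two input channels, three output channels.
example_features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
example_weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
example_output, example_macs = masked_pointwise_conv(
    example_features, example_weights, spatial_mask=[1, 0, 1], channel_mask=[1, 0, 1]
)
```

With one position and one channel masked out, this toy call performs 8 multiply-adds where the dense layer would need 18, which is the kind of saving the masks are meant to expose.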

Extracting distinctive, robust, and general 3D local features is essential to downstream tasks such as point cloud registration. However, existing methods either rely on noise-sensitive handcrafted features, or depend on rotation-variant neural architectures. It remains challenging to learn robust and general local feature descriptors for surface matching.

The goal of ground-to-aerial image geo-localization is to determine the location of a ground query image by matching it against a reference database consisting of aerial/satellite images. This task is highly challenging due to the large appearance difference caused by extreme changes in viewpoint and orientation. In this work, we show that the training difficulty is an important cue that can be leveraged to improve metric learning on cross-view images.

Light field (LF) cameras record both the intensity and the direction of light rays, and encode 3D scenes into 4D LF images. Recently, many convolutional neural networks (CNNs) have been proposed for various LF image processing tasks. However, it is challenging for CNNs to process LF images effectively, since the spatial and angular information is highly intertwined with varying disparities.

It is often difficult to diagnose pituitary microadenoma (PM) by MRI alone, due to its relatively small size, variable anatomical structure, and complex clinical symptoms and signs among individuals. We develop and validate a deep learning-based system to diagnose PM from MRI. A total of 11,935 participants with infertility were initially recruited for this project.

We study the problem of efficient semantic segmentation of large-scale 3D point clouds. By relying on expensive sampling techniques or computationally heavy pre/post-processing steps, most existing approaches are only able to be trained and operate over small-scale point clouds. In this paper, we introduce RandLA-Net, an efficient and lightweight neural architecture to directly infer per-point semantics for large-scale point clouds.
