Publications by authors named "Xiaochun Cao"

Many Transformer-based pre-trained models for code have been developed and applied to code-related tasks. In this paper, we analyze 519 papers published on this topic during 2017-2023, examine the suitability of model architectures for different tasks, summarize their resource consumption, and look at the generalization ability of models on different datasets. We examine three representative pre-trained models for code: CodeBERT, CodeGPT, and CodeT5, and conduct experiments on the four topmost targeted software engineering tasks from the literature: Bug Fixing, Bug Detection, Code Summarization, and Code Search.

Scene text editing aims to replace the source text with the target text while preserving the original background. Its practical applications span various domains, such as data generation and privacy protection, highlighting its increasing importance in recent years. In this study, we propose a novel Scene Text Editing network with Explicitly-decoupled text transfer and Minimized background reconstruction, called STEEM.

Camouflaged object detection (COD) aims to identify the objects that seamlessly blend into the surrounding backgrounds. Due to the intrinsic similarity between the camouflaged objects and the background region, it is extremely challenging to precisely distinguish the camouflaged objects by existing approaches. In this paper, we propose a hierarchical graph interaction network termed HGINet for camouflaged object detection, which is capable of discovering imperceptible objects via effective graph interaction among the hierarchical tokenized features.

Object detection methods have achieved remarkable performances when the training and testing data satisfy the assumption of i.i.d.

Inertial measurement units (IMU) in the capturing device can record the motion information of the device, with gyroscopes measuring angular velocity and accelerometers measuring acceleration. However, conventional deblurring methods seldom incorporate IMU data, and existing approaches that utilize IMU information often face challenges in fully leveraging this valuable data, resulting in noise issues from the sensors. To address these issues, in this paper, we propose a multi-stage deblurring network named INformer, which combines inertial information with the Transformer architecture.

Image restoration aims to reconstruct a high-quality image from its corrupted version, playing essential roles in many scenarios. Recent years have witnessed a paradigm shift in image restoration from convolutional neural networks (CNNs) to Transformer-based models due to their powerful ability to model long-range pixel interactions. In this paper, we explore the potential of CNNs for image restoration and show that the proposed simple convolutional network architecture, termed ConvIR, can perform on par with or better than the Transformer counterparts.

Rank aggregation with pairwise comparisons is widely encountered in sociology, politics, economics, psychology, sports, etc. Given the enormous social impact and the consequent incentives, the potential adversary has a strong motivation to manipulate the ranking list. However, the ideal attack opportunity and the excessive adversarial capability cause the existing methods to be impractical.

Collaborative Metric Learning (CML) has recently emerged as a popular method in recommendation systems (RS), closing the gap between metric learning and collaborative filtering. Following the convention of RS, existing practices exploit unique user representation in their model design. This paper focuses on a challenging scenario where a user has multiple categories of interests.

A long-standing topic in artificial intelligence is the effective recognition of patterns from noisy images. In this regard, the recent data-driven paradigm considers 1) improving the representation robustness by adding noisy samples in training phase (i.e.

Fast adversarial training (FAT) is an efficient method to improve robustness in white-box attack scenarios. However, the original FAT suffers from catastrophic overfitting, which dramatically and suddenly reduces robustness after a few training epochs. Although various FAT variants have been proposed to prevent overfitting, they require high training time.

Optical aberration is a ubiquitous degeneration in realistic lens-based imaging systems. Optical aberrations are caused by the differences in the optical path length when light travels through different regions of the camera lens with different incident angles. The blur and chromatic aberrations manifest significant discrepancies when the optical system changes.

Objectives: To study the efficacy and safety of Xiyanping injection through intramuscular injection for the treatment of acute bronchitis in children.

Methods: A prospective study was conducted from December 2021 to October 2022, including 78 children with acute bronchitis from three hospitals using a multicenter, randomized, parallel-controlled design. The participants were divided into a test group (conventional treatment plus Xiyanping injection; n=36) and a control group (conventional treatment alone; n=37) in a 1:1 ratio.

Image restoration aims to reconstruct the latent sharp image from its corrupted counterpart. Besides dealing with this long-standing task in the spatial domain, a few approaches seek solutions in the frequency domain by considering the large discrepancy between spectra of sharp/degraded image pairs. However, these algorithms commonly utilize transformation tools, e.

Nowadays, machine learning (ML) and deep learning (DL) methods have become fundamental building blocks for a wide range of AI applications. The popularity of these methods also makes them widely exposed to malicious attacks, which may cause severe security concerns. To understand the security properties of the ML/DL methods, researchers have recently started to turn their focus to adversarial attack algorithms that could successfully corrupt the model or clean data owned by the victim with imperceptible perturbations.

Context modeling and multi-level feature fusion methods have proven effective in improving semantic segmentation performance. However, they are not designed to handle pixel-context mismatch and spatial feature misalignment, and their high computational complexity hinders widespread application in real-time scenarios. In this work, we propose a lightweight Context and Spatial Feature Calibration Network (CSFCN) to address the above issues with pooling-based and sampling-based attention mechanisms.

Positive-Unlabeled (PU) data arise frequently in a wide range of fields such as medical diagnosis, anomaly analysis and personalized advertising. The absence of any known negative labels makes it very challenging to learn binary classifiers from such data. Many state-of-the-art methods reformulate the original classification risk with individual risks over positive and unlabeled data, and explicitly minimize the risk of classifying unlabeled data as negative.

Human parsing aims to segment each pixel of the human image with fine-grained semantic categories. However, current human parsers trained with clean data are easily confused by numerous image corruptions such as blur and noise. To improve the robustness of human parsers, in this paper, we construct three corruption robustness benchmarks, termed LIP-C, ATR-C, and Pascal-Person-Part-C, to assist us in evaluating the risk tolerance of human parsing models.

The Area Under the ROC curve (AUC) is a crucial metric for machine learning, which is often a reasonable choice for applications like disease prediction and fraud detection where the datasets often exhibit a long-tail nature. However, most of the existing AUC-oriented learning methods assume that the training data and test data are drawn from the same distribution. How to deal with domain shift remains widely open.

The Area Under the ROC curve (AUC) is a popular metric for long-tail classification. Many efforts have been devoted to AUC optimization methods in the past decades. However, little exploration has been done to make them survive adversarial attacks.

Sketch classification models have been extensively investigated by designing a task-driven deep neural network. Despite their successful performances, few works have attempted to explain the prediction of sketch classifiers. To explain the prediction of classifiers, an intuitive way is to visualize the activation maps via computing the gradients.

Our goal in this research is to study a more realistic environment in which we can conduct weakly-supervised multi-modal instance-level product retrieval for fine-grained product categories. We first contribute the Product1M datasets and define two real practical instance-level retrieval tasks that enable evaluations on price comparison and personalized recommendations. For both instance-level tasks, accurately identifying the intended product target mentioned in visual-linguistic data and mitigating the impact of irrelevant content are quite challenging.

Current 3D mesh steganography algorithms relying on geometric modification are prone to detection by steganalyzers. In traditional steganography, adaptive steganography has proven to be an efficient means of enhancing steganography security. Taking inspiration from this, we propose a highly adaptive embedding algorithm, guided by the principle of minimizing a carefully crafted distortion through efficient steganography codes.

Restoring missing areas without leaving visible traces has become a trivial task with Photoshop inpainting tools. However, such tools have potentially illegal or unethical uses, such as removing specific objects in images to deceive the public. Despite the emergence of many forensics methods of image inpainting, their detection ability is still insufficient when attending to professional Photoshop inpainting.

In this paper, we address the problem of video-based rain streak removal by developing an event-aware multi-patch progressive neural network. Rain streaks in video exhibit correlations in both temporal and spatial dimensions. Existing methods have difficulties in modeling the characteristics.

Human-object relationship detection reveals the fine-grained relationship between humans and objects, helping the comprehensive understanding of videos. Previous human-object relationship detection approaches are mainly developed with object features and relation features without exploring the specific information of humans. In this paper, we propose a novel Relation-Pose Transformer (RPT) for human-object relationship detection.
