Anomaly detection (AD) is essential in identifying rare and often critical events in complex systems, finding applications in fields such as network intrusion detection, financial fraud detection, and fault detection in infrastructure and industrial systems. While AD is typically treated as an unsupervised learning task due to the high cost of label annotation, it is more practical to assume access to a small set of labeled anomaly samples from domain experts, as is the case for semi-supervised AD. Semi-supervised and supervised approaches can leverage such labeled data, resulting in improved performance. In this article, rather than proposing a new semi-supervised or supervised approach for AD, we introduce a novel algorithm for generating additional pseudo-anomalies on the basis of the limited labeled anomalies and a large volume of unlabeled data. This serves as an augmentation to facilitate the detection of new anomalies. Our proposed algorithm, named nearest neighbor Gaussian mix-up (NNG-Mix), efficiently integrates information from both labeled and unlabeled data to generate pseudo-anomalies. We compare the performance of this novel algorithm with commonly applied augmentation techniques, such as Mixup and Cutout. We evaluate NNG-Mix by training various existing semi-supervised and supervised AD algorithms on the original training data along with the generated pseudo-anomalies. Through extensive experiments on benchmark datasets in ADBench, reflecting different data types, we demonstrate that NNG-Mix outperforms other data augmentation methods. It yields significant performance improvements compared to the baselines trained exclusively on the original training data. Notably, NNG-Mix yields up to , , and improvements on Classical, CV, and NLP datasets in ADBench. Our source code is available at https://github.com/donghao51/NNG-Mix.
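The abstract names the ingredients of NNG-Mix (nearest neighbors, Gaussian noise, mix-up between labeled anomalies and unlabeled data) without spelling out the procedure. The following is a minimal sketch of one plausible reading, not the authors' implementation; the reference code is in the repository linked above. The function name `nng_mix_sketch`, the parameters `k`, `alpha`, and `noise_std`, the use of Euclidean distance, and the Beta-distributed mixing weight are all assumptions made for illustration.

```python
import numpy as np


def nng_mix_sketch(anomalies, unlabeled, n_new, k=5, alpha=0.2,
                   noise_std=0.01, seed=0):
    """Generate pseudo-anomalies (illustrative sketch, hypothetical API).

    anomalies: (n_a, d) array of labeled anomaly samples.
    unlabeled: (n_u, d) array of unlabeled samples, with n_u >= k.
    Returns an (n_new, d) array of pseudo-anomalies.
    """
    rng = np.random.default_rng(seed)
    pseudo = np.empty((n_new, anomalies.shape[1]))
    for i in range(n_new):
        # Pick one labeled anomaly at random.
        a = anomalies[rng.integers(len(anomalies))]
        # Find its k nearest unlabeled neighbors (Euclidean distance, assumed).
        dists = np.linalg.norm(unlabeled - a, axis=1)
        nn_idx = np.argpartition(dists, k - 1)[:k]
        u = unlabeled[rng.choice(nn_idx)]
        # Perturb both samples with Gaussian noise.
        a_noisy = a + rng.normal(0.0, noise_std, size=a.shape)
        u_noisy = u + rng.normal(0.0, noise_std, size=u.shape)
        # Mix them with a weight drawn from Beta(alpha, alpha), as in Mixup.
        lam = rng.beta(alpha, alpha)
        pseudo[i] = lam * a_noisy + (1.0 - lam) * u_noisy
    return pseudo


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    labeled_anomalies = rng.normal(5.0, 1.0, size=(10, 4))
    unlabeled_data = rng.normal(0.0, 1.0, size=(500, 4))
    new_anomalies = nng_mix_sketch(labeled_anomalies, unlabeled_data, n_new=50)
    print(new_anomalies.shape)
```

In this sketch the generated rows would be appended to the training set with the anomaly label before fitting a semi-supervised or supervised detector. Restricting the mix-up partner to nearest neighbors keeps each pseudo-anomaly close to a known anomaly, which is the design choice the algorithm's name suggests.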


Source
http://dx.doi.org/10.1109/TNNLS.2024.3497801


Similar Publications

Anomaly detection is a common application of machine learning. Out-of-distribution (OOD) detection in particular is a semi-supervised anomaly detection technique in which the detection method is trained only on the inlier (in-distribution) samples; unlike the fully supervised variant, the distribution of the outlier samples is never explicitly modeled in OOD detection tasks. In this work, we design a novel GAN-based OOD detection network to protect cyber-physical signal systems from a novel Trojan malware called the non-control data (NCD) attack, which evades conventional malware detection techniques.


Low-cost sensors (LCSs) can address gaps in regulatory air quality monitoring station (AQMS) distribution, but they face data quality issues and spatial misalignment challenges when calibrating large-scale LCS networks against AQMS networks. This study proposed a semi-supervised learning model that uses data augmentation via chained imputation (CI-DA) to address the spatial misalignment problem by synthesizing pseudo-LCS data, thereby enhancing the use of LCS in PM mapping. Tangshan, an industrial city in northern China, was selected as the case study area.


With the development of information and communication technology, it has become possible to improve pharmacy management systems (PMS) using these technologies. Our study aims to enhance the accuracy of drug attribute classification and recommend appropriate medications to improve patient compliance and treatment outcomes through the use of a semi-supervised learning method combined with artificial intelligence (AI) technology. This study proposed a semi-supervised learning method that integrates various technologies such as PMS, electronic prescriptions, and inventory management with AI to process and analyze drug data, enabling dynamic inventory updates and precise drug distribution.


In online teaching environments, the lack of direct emotional interaction between teachers and students poses challenges for teachers to consciously and effectively manage their emotional expressions. The design and implementation of an early warning system for teaching provide a novel approach to intelligent evaluation and improvement of online education. This study focuses on segmenting different emotional segments and recognizing emotions in instructional videos.


Evaluating the strength of industrial wastes-based concrete reinforced with steel fiber using advanced machine learning.

Sci Rep

March 2025

Departamento de Ciencias de la Construcción, Facultad de Ciencias de la Construcción Ordenamiento Territorial, Universidad Tecnológica Metropolitana, Santiago, Chile.

The traditional evaluation of compressive strength through repeated experimental works can be resource-intensive, time-consuming, and environmentally taxing. Leveraging advanced machine learning (ML) offers a faster, cheaper, and more sustainable alternative for evaluating and optimizing concrete properties, particularly for materials incorporating industrial wastes and steel fibers. In this research work, a total of 166 records were collected and partitioned into training set (130 records = 80%) and validation set (36 records = 20%) in line with the requirements of data partitioning and sorting for optimal model performance.

