Preventing Failures by Dataset Shift Detection in Safety-Critical Graph Applications.

Front Artif Intell

Lawrence Livermore National Laboratory, Livermore, CA, United States.

Published: May 2021

Dataset shift refers to the problem where the input data distribution may change over time (e.g., between the training and test stages). Because shift can be a critical bottleneck in safety-critical applications such as healthcare and drug discovery, dataset shift detection has become an important research problem in machine learning. Although several existing efforts have focused on image and video data, applications with graph-structured data have not received sufficient attention. In this paper, we therefore investigate the problem of detecting shifts in graph-structured data through the lens of statistical hypothesis testing. Specifically, we propose a practical approach based on a two-sample test for shift detection in large-scale graph-structured data. Our approach is flexible: it is suitable for both undirected and directed graphs, and it does not require the two samples to have equal sizes. Through empirical studies, we demonstrate the effectiveness of the proposed test in detecting dataset shifts. We also corroborate these findings on real-world datasets characterized by directed graphs with a large number of nodes.
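The abstract does not specify the test statistic used by the authors. As a rough illustration of the general idea only, the sketch below runs a generic permutation two-sample test on one hypothetical per-graph summary (edge density). The statistic, the helper names `edge_density`, `random_digraph`, and `permutation_test`, and the synthetic G(n, p) random graphs are all assumptions for illustration and are not the method proposed in the paper. The sketch accepts directed or undirected edge lists and does not require the reference and test samples to have the same size.

```python
import random
from statistics import fmean
from typing import List, Sequence, Tuple

Edge = Tuple[int, int]


def edge_density(edges: List[Edge], n_nodes: int, directed: bool) -> float:
    """Hypothetical graph summary: fraction of possible edges that are present.

    Self-loops are ignored; for undirected graphs, (u, v) and (v, u) count once.
    """
    cleaned = {e for e in edges if e[0] != e[1]}
    if not directed:
        cleaned = {tuple(sorted(e)) for e in cleaned}
    possible = n_nodes * (n_nodes - 1)
    if not directed:
        possible //= 2
    return len(cleaned) / possible if possible else 0.0


def random_digraph(n_nodes: int, p: float, rng: random.Random) -> List[Edge]:
    """Hypothetical synthetic data: directed G(n, p) graph as an edge list."""
    return [(u, v) for u in range(n_nodes) for v in range(n_nodes)
            if u != v and rng.random() < p]


def permutation_test(xs: Sequence[float], ys: Sequence[float],
                     n_perm: int = 2000, seed: int = 0) -> float:
    """Two-sided permutation test on the difference of means.

    The samples may have different sizes; returns a p-value in (0, 1].
    """
    rng = random.Random(seed)
    observed = abs(fmean(xs) - fmean(ys))
    pooled = list(xs) + list(ys)
    n_x = len(xs)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabeling under the null of no shift
        stat = abs(fmean(pooled[:n_x]) - fmean(pooled[n_x:]))
        if stat >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)  # add-one correction avoids p = 0


if __name__ == "__main__":
    rng = random.Random(1)
    # Reference sample: 40 graphs; test sample: 25 graphs with denser edges.
    ref = [edge_density(random_digraph(30, 0.1, rng), 30, True) for _ in range(40)]
    test = [edge_density(random_digraph(30, 0.2, rng), 30, True) for _ in range(25)]
    print("p-value:", permutation_test(ref, test))
```

A small p-value suggests the two samples come from different distributions (a shift); a single scalar summary like edge density is deliberately simplistic and would miss shifts that leave density unchanged.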


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8223254
DOI: http://dx.doi.org/10.3389/frai.2021.589632
