XeroGraph: enhancing data integrity in the presence of missing values with statistical and predictive analysis.

Bioinform Adv

Division of Translational Cancer Research, Department of Laboratory Medicine, Lund University, Lund, 22363, Sweden.

Published: February 2025

Motivation: Missing data present a pervasive challenge in data analysis, potentially biasing outcomes and undermining conclusions if not addressed properly. Missing data are commonly classified as Missing Completely at Random (MCAR), Missing at Random (MAR), or Missing Not at Random (MNAR). While MCAR poses a minimal risk of data distortion, both MAR and MNAR can seriously affect the results of subsequent analyses. Therefore, it is important to identify the type of missingness and handle it appropriately.
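
To make the three mechanisms concrete, the following minimal sketch simulates each one on synthetic data and reports the mean of the values that remain observed. It uses only the Python standard library and is not XeroGraph's API; the function name `complete_case_means`, the variables `x` and `y`, and the missingness probabilities are all assumptions chosen for illustration.

```python
import random
import statistics


def complete_case_means(n=20000, seed=0):
    """Mean of y among observed rows under three missingness mechanisms."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]   # covariate, always observed
    y = [xi + rng.gauss(0, 1) for xi in x]    # correlated with x, may go missing

    # MCAR: every y has the same 30% chance of being missing.
    mcar = [yi for yi in y if rng.random() >= 0.3]
    # MAR: missingness depends only on the observed covariate x.
    mar = [yi for xi, yi in zip(x, y)
           if rng.random() >= (0.6 if xi > 0 else 0.1)]
    # MNAR: missingness depends on the (unobserved) value of y itself.
    mnar = [yi for yi in y if rng.random() >= (0.6 if yi > 0 else 0.1)]

    return {
        "full": statistics.fmean(y),
        "mcar": statistics.fmean(mcar),
        "mar": statistics.fmean(mar),
        "mnar": statistics.fmean(mnar),
    }


if __name__ == "__main__":
    for name, value in complete_case_means().items():
        print(f"{name:>4}: {value:+.3f}")
```

In this setup the MCAR mean should stay close to the full-data mean, while the MAR and MNAR means are pulled downward because large values are dropped more often. Under MAR the shift can in principle be corrected by conditioning on `x`; under MNAR the observed data alone do not reveal it.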

Results: To facilitate efficient handling of missing data, we introduce a Python package named XeroGraph that is designed to evaluate data quality, categorize the nature of missingness, and guide imputation decisions. By comparing how various imputation methods influence underlying distributions, XeroGraph provides a systematic framework that supports more accurate and transparent analyses. Through its comprehensive preliminary assessments and user-friendly interface, this package facilitates the selection of optimal strategies tailored to the specific missing data mechanisms present in a dataset. In doing so, XeroGraph may significantly improve the validity and reproducibility of research findings, making it a valuable tool for professionals in data-intensive fields.
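
As a rough illustration of how an imputation method can change an underlying distribution, the sketch below compares mean imputation with a simple random hot-deck fill on synthetic MCAR data. It is a standard-library example under stated assumptions, not XeroGraph's implementation: the function name `compare_imputations`, the Gaussian data, and the 30% missing rate are hypothetical choices.

```python
import random
import statistics


def compare_imputations(n=5000, missing_rate=0.3, seed=1):
    """Standard deviation of a variable before and after two simple imputations."""
    rng = random.Random(seed)
    truth = [rng.gauss(50, 10) for _ in range(n)]
    # Introduce MCAR missingness: None marks a missing entry.
    observed = [v if rng.random() >= missing_rate else None for v in truth]
    present = [v for v in observed if v is not None]

    # Mean imputation: every gap receives the same value.
    fill = statistics.fmean(present)
    mean_imputed = [fill if v is None else v for v in observed]

    # Random hot-deck: every gap receives a randomly drawn observed value.
    hot_deck = [rng.choice(present) if v is None else v for v in observed]

    return {
        "truth_sd": statistics.stdev(truth),
        "mean_imputed_sd": statistics.stdev(mean_imputed),
        "hot_deck_sd": statistics.stdev(hot_deck),
    }


if __name__ == "__main__":
    for name, value in compare_imputations().items():
        print(f"{name}: {value:.2f}")
```

Mean imputation concentrates the filled entries at a single point and therefore shrinks the spread, whereas the hot-deck fill roughly preserves it. A comparison of this kind is one way to judge whether an imputation strategy distorts the distribution it is meant to restore.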

Availability and Implementation: XeroGraph is compatible with all operating systems and requires Python version 3.9 or higher. It can be freely downloaded from PyPI (https://pypi.org/project/XeroGraph). The source code is accessible on GitHub (https://github.com/kazilab/XeroGraph), and comprehensive documentation is available at Read the Docs (https://xerograph.readthedocs.io). This software is distributed under the Apache License 2.0.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11889451
DOI: http://dx.doi.org/10.1093/bioadv/vbaf035


