Erroneous conversion of gene names into dates and other data types has frustrated computational biologists for years. We hypothesized that such errors in supplementary files might diminish after a 2016 report highlighted the extent of the problem. To assess this, we scanned supplementary files published in PubMed Central from 2014 to 2020. Overall, gene name errors continued to accumulate unabated after 2016. Improved scanning software that we developed identified gene name errors in 30.9% (3,436/11,117) of articles with supplementary Excel gene lists, a figure significantly higher than previously estimated. The higher figure arises because gene names are converted not just to dates and floating-point numbers, but also to the internal date format (five-digit numbers). These findings further reinforce that spreadsheets are ill-suited to use with large genomic data.
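
The three kinds of conversion named above (dates, floating-point numbers, and five-digit date numbers) can be illustrated with a small check over a column of gene symbols. This is a minimal sketch under stated assumptions, not the scanning software described in the abstract: the regular expressions, the function names `classify` and `scan_column`, and the sample values are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical patterns (assumptions, not the authors' rules). Each matches a
# value that looks like a converted gene name rather than a gene symbol.
MONTH = r"(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*"
DATE_LIKE = re.compile(
    rf"^\d{{1,2}}[-/]{MONTH}$"           # e.g. 1-Mar
    rf"|^{MONTH}[-/]\d{{1,2}}$"          # e.g. Sep-2
    r"|^\d{4}[-/]\d{1,2}[-/]\d{1,2}$",   # e.g. 2020-03-01
    re.IGNORECASE,
)
SCI_FLOAT = re.compile(r"^\d+(\.\d+)?e[+-]?\d+$", re.IGNORECASE)  # e.g. 2.31E+13
FIVE_DIGIT = re.compile(r"^\d{5}$")      # e.g. 43891


def classify(value):
    """Return 'date', 'float', or 'serial' for a suspicious value, else None."""
    text = str(value).strip()
    if DATE_LIKE.match(text):
        return "date"
    if SCI_FLOAT.match(text):
        return "float"
    if FIVE_DIGIT.match(text):
        return "serial"
    return None


def scan_column(values):
    """Collect (value, kind) pairs for every suspicious entry in a column."""
    hits = []
    for value in values:
        kind = classify(value)
        if kind is not None:
            hits.append((value, kind))
    return hits


if __name__ == "__main__":
    column = ["TP53", "1-Mar", "2-Sep", "2.31E+13", "43891", "BRCA1", "SEPT2"]
    for value, kind in scan_column(column):
        print(f"{value!r} looks like a converted gene name ({kind})")
```

A check this simple would also flag columns that legitimately hold numeric identifiers, so in practice it would need to be limited to columns known to contain gene symbols.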


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8357140
DOI: http://dx.doi.org/10.1371/journal.pcbi.1008984


