Chimera artifacts in nanopore direct RNA sequencing (dRNA-seq) can significantly distort transcriptome analyses, yet their detection and removal remain challenging due to limitations in existing basecalling models. We present DeepChopper, a genomic language model that precisely identifies and removes adapter sequences from base-called dRNA-seq long reads at single-base resolution, operating independently of raw signal or alignment information to effectively eliminate chimeric read artifacts. By removing these artifacts, DeepChopper substantially improves the accuracy of critical downstream analyses, such as transcript annotation and gene fusion detection, thereby enhancing the reliability and utility of nanopore dRNA-seq for transcriptomics research.
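The abstract describes identifying adapter bases at single-base resolution and then removing them so that a chimeric read falls apart into its constituent parts. The sketch below illustrates only that post-processing step. It assumes a model has already assigned each base a binary label (1 = adapter, 0 = transcript), which is an assumption about the output format. The function name `split_at_adapters` and the `min_segment_len` cutoff are hypothetical and are not DeepChopper's actual API or defaults. The real tool may, for example, smooth noisy predictions before cutting.

```python
from typing import List, Optional, Sequence


def split_at_adapters(
    read: str,
    labels: Sequence[int],
    min_segment_len: int = 20,  # hypothetical cutoff, not a DeepChopper default
) -> List[str]:
    """Drop bases labeled as adapter (1) and return the remaining
    non-adapter segments, each of which would become a separate read.

    Assumes one binary label per base, as a hypothetical per-base
    classifier might produce.
    """
    if len(read) != len(labels):
        raise ValueError("read and labels must have the same length")

    segments: List[str] = []
    start: Optional[int] = None  # start index of the current non-adapter run
    for i, label in enumerate(labels):
        if label == 0 and start is None:
            start = i  # a non-adapter run begins here
        elif label == 1 and start is not None:
            segments.append(read[start:i])  # an adapter base closes the run
            start = None
    if start is not None:
        segments.append(read[start:])  # the last run extends to the read end

    # Discard fragments too short to be informative after splitting
    return [s for s in segments if len(s) >= min_segment_len]


if __name__ == "__main__":
    # A toy "chimera": two transcript fragments joined by an internal adapter
    read = "ACGTACGT" + "TTTTT" + "GGCCGGCC"
    labels = [0] * 8 + [1] * 5 + [0] * 8
    print(split_at_adapters(read, labels, min_segment_len=4))
```

With `min_segment_len=4`, the toy read above splits into `["ACGTACGT", "GGCCGGCC"]`. Both halves are kept as independent reads rather than being aligned as one artificial fusion. Dropping very short leftovers is a design choice meant to avoid emitting fragments too short to map reliably.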

Source: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11526916 (PMC)
DOI: http://dx.doi.org/10.1101/2024.10.23.619929
