Accurately basecalling sequence backbones in the presence of nucleotide modifications remains a substantial challenge in nanopore sequencing bioinformatics. State-of-the-art basecallers have been extensively shown to be less compatible with modification-induced sequencing signals, yet precise basecalling is a prerequisite for virtually all downstream analyses. Here, we report that basecallers exposed to diverse modifications during training gain the ability to generalize to novel modifications. Using synthesized oligos as a model system, we precisely basecall various out-of-sample RNA modifications. From a representation learning perspective, we attribute this generalizability to the expansion of the basecaller's representation space by diverse training modifications. Taken together, we propose increasing training data diversity as a paradigm for building modification-tolerant nanopore sequencing basecallers.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11735843
DOI: http://dx.doi.org/10.1038/s41467-025-55974-z

Publication Analysis

Top Keywords

nanopore sequencing: 12
training data: 8
data diversity: 8
diverse training: 8
training modifications: 8
modifications: 5
training: 4
diversity enhances: 4
enhances basecalling: 4
basecalling novel: 4

Similar Publications

In July 2023, panicle and leaf blight-like symptoms were observed on the rice variety PVL03 in research field plots in Louisiana (Rayne, LA 70578, USA; 30.21330° N, 92.37309° W).

Aerolysin Nanopore Electrochemistry.

Acc Chem Res

January 2025

Molecular Sensing and Imaging Center, School of Chemistry and Chemical Engineering, Nanjing University, Nanjing 210023, China.

Conspectus: Ions are crucial signaling components for living organisms. In cells, their transportation across pore-forming membrane proteins is vital for regulating physiological functions, such as generating ionic current signals in response to target molecule recognition. This ion transport is affected by confined interactions and local environments within the protein pore.

Background: To better understand factors associated with virologic response, we retrospectively characterized the HIV proviruses of 7 people with HIV who received long-acting cabotegravir/rilpivirine (CAB/RPV-LA) and were selected according to the following criteria: virologic control achieved despite a history of viral replication on 1 or both corresponding antiretroviral classes (n = 6) and virologic failure (VF) after CAB/RPV-LA initiation (n = 1).

Methods: Last available blood samples before the initiation of CAB/RPV-LA were analyzed retrospectively. Near full-length HIV DNA genome haplotypes were inferred from Nanopore sequencing by the in vivo Genome Diversity Analyzer to search for archived drug resistance mutations (DRMs) and evaluate the frequency and intactness of proviruses harboring DRMs.

Genome assembly of the grassland caterpillar Gynaephora qinghaiensis.

Sci Data

January 2025

State Key Laboratory of Rice Biology, Ministry of Agricultural and Rural Affairs Key Laboratory of Molecular Biology of Crop Pathogens and Insects, Institute of Insect Sciences, Zhejiang University, Hangzhou, 310058, China.

The grassland caterpillars are the most damaging insect pests to the alpine meadow of the Qinghai-Tibetan Plateau in China. In this study, we present a genome assembly of one grassland caterpillar Gynaephora qinghaiensis by using Oxford Nanopore long-read and BGI short-read sequencing. The genome assembly of 861.

Nanopore sequencing to detect A-to-I editing sites.

Methods Enzymol

January 2025

School of Chemistry, Chemical Engineering and Biotechnology, Nanyang Technological University, Singapore, Singapore.

Adenosine-to-inosine (A-to-I) RNA editing, mediated by the ADAR family of enzymes, is pervasive in metazoans and functions as an important mechanism to diversify the proteome and control gene expression. Over the years, there have been multiple efforts to comprehensively map the editing landscape in different organisms and in different disease states. As inosine (I) is recognized largely as guanosine (G) by cellular machineries including the reverse transcriptase, editing sites can be detected as A-to-G changes during sequencing of complementary DNA (cDNA).