NanoReviser: An Error-Correction Tool for Nanopore Sequencing Based on a Deep Learning Algorithm.

Front Genet

State Key Laboratory for Turbulence and Complex Systems, Department of Biomedical Engineering, College of Engineering, Peking University, Beijing, China.

Published: August 2020

Nanopore sequencing is regarded as one of the most promising third-generation sequencing (TGS) technologies. Since 2014, Oxford Nanopore Technologies (ONT) has developed a series of devices based on nanopore sequencing to produce very long reads, with an expected impact on genomics. However, nanopore sequencing reads are susceptible to a fairly high error rate owing to the difficulty of identifying DNA bases from the complex electrical signals. Although several basecalling tools have been developed for nanopore sequencing over the past years, it is still challenging to correct the sequences after applying the basecalling procedure. In this study, we developed an open-source DNA basecalling reviser, NanoReviser, based on a deep learning algorithm to correct the basecalling errors introduced by the basecallers currently provided by default. In our module, we re-segmented the raw electrical signals based on the basecalled sequences provided by the default basecallers. By employing convolutional neural networks (CNNs) and bidirectional long short-term memory (Bi-LSTM) networks, we took advantage of the information from both the raw electrical signals and the basecalled sequences. Our results showed that NanoReviser, as a post-basecalling reviser, significantly improved the basecalling quality. After being trained on standard ONT sequencing reads from public E. coli and human NA12878 datasets, NanoReviser reduced the sequencing error rate by over 5% for both the E. coli dataset and the human dataset. The performance of NanoReviser was found to be better than that of all current basecalling tools. Furthermore, we analyzed the modified bases of the E. coli dataset and added the methylation information to train our module. With the methylation annotation, NanoReviser reduced the error rate by 7% for the E. coli dataset and specifically reduced the error rate by over 10% for the regions of the sequence rich in methylated bases.
To the best of our knowledge, NanoReviser is the first post-processing tool after basecalling to accurately correct the nanopore sequences without the time-consuming procedure of building the consensus sequence. The NanoReviser package is freely available at https://github.com/pkubioinformatics/NanoReviser.
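
As a point of reference for the error-rate figures quoted in the abstract, the sketch below shows one common way to quantify basecalling error: the edit distance between a read and its reference, divided by the reference length. The abstract does not state how the authors computed their error rate, so this convention is an assumption. The function names `edit_distance` and `error_rate` are illustrative and are not part of the NanoReviser package, and the toy sequences are invented for the example.

```python
def edit_distance(read: str, reference: str) -> int:
    """Levenshtein distance between a basecalled read and its reference."""
    # previous[j] holds the distance between the read prefix processed so far
    # and the first j bases of the reference.
    previous = list(range(len(reference) + 1))
    for i, read_base in enumerate(read, start=1):
        current = [i]
        for j, ref_base in enumerate(reference, start=1):
            cost = 0 if read_base == ref_base else 1
            current.append(min(
                previous[j] + 1,         # extra base in the read (insertion error)
                current[j - 1] + 1,      # reference base missing from the read (deletion error)
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def error_rate(read: str, reference: str) -> float:
    """Edit distance normalised by reference length (assumed convention)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(read, reference) / len(reference)


if __name__ == "__main__":
    # Invented toy sequences, not data from the paper.
    reference = "ACGTACGTAC"
    basecalled = "ACCTACGTC"  # one substitution and one deletion
    revised = "ACGTACGTC"     # substitution corrected, deletion remains
    print("before revision:", error_rate(basecalled, reference))
    print("after revision: ", error_rate(revised, reference))
```

Under this convention, a post-basecalling reviser lowers the error rate whenever its output lies closer to the reference than the original basecalled read does.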

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7434944
DOI: http://dx.doi.org/10.3389/fgene.2020.00900

Similar Publications

Carriage of antimicrobial resistance genes in Escherichia coli of bovine origin.

Pol J Vet Sci

December 2024

Department of Animal Nutrition and Husbandry, University of Veterinary Medicine and Pharmacy in Košice, Komenského 73, Košice, 04181, Slovakia.

The present study aimed to search for the presence of the plasmid-mediated antimicrobial resistance genes in 106 Escherichia coli (E. coli) isolates from a total of 240 fresh fecal samples collected from 12 private cattle farms in Bingol province of East Turkey from November 2021 to January 2022. In those colistin-resistant E.

Inert splint-driven oligonucleotide assembly.

Synth Biol (Oxf)

December 2024

Claret Bioscience LLC, 100 Enterprise Way, Suite A102, Scotts Valley, CA 95066, United States.

In this study, we introduce a new method for oligonucleotide fragment assembly. Unlike polymerase chain assembly and ligase chain assembly, which rely on short, highly purified oligonucleotides, our method, named Splynthesis, uses a one-tube, splint-driven assembly reaction. Splynthesis connects standard-desalted "contig" oligos (∼150 nt in length) via shorter "splint" oligos harboring 5' and 3' blocking modifications to prevent off-target ligation and amplification events.

DNA methylation is an essential epigenetic mechanism for the regulation of gene expression, through which many physiological processes (X-chromosome inactivation, genetic imprinting, chromatin structure and miRNA regulation, genome defense, silencing of transposable elements) and pathological processes (cancer and repetitive sequence-associated diseases) are regulated. Nanopore sequencing has emerged as a novel technique that can analyze long strands of DNA (long-read sequencing) without chemically treating the DNA. Notably, nanopore sequencing can also extract the epigenetic status of the nucleotides (including both 5-methylcytosine and 5-hydroxymethylcytosine), and a large variety of bioinformatic tools have been developed to improve its detection performance.

As molecular research on hemp (Cannabis sativa L.) continues to advance, there is a growing need for more diverse genome data and more accurate genome assemblies. In this study, we report the three-way assembly data of a cannabidiol (CBD)-rich cannabis variety, the 'Pink Pepper' cultivar, using three sequencing technologies: PacBio Single Molecule Real-Time (SMRT) technology, Illumina sequencing technology, and Oxford Nanopore Technology (ONT).

Dissemination mechanisms of unique antibiotic resistance genes from flowback water to soil revealed by combined Illumina and Nanopore sequencing.

Water Res

December 2024

Key Laboratory of Three Gorges Reservoir Region's Eco-environment, Ministry of Education, Chongqing University, Chongqing 400045, PR China; State Key Laboratory of Coal Mine Disaster Dynamics and Control, Chongqing University, Chongqing 400044, PR China.

As a byproduct of shale gas extraction, flowback water (FW) is produced in large quantities globally. Due to the unique interactions between pollutants and microorganisms, FW always harbors multiple antibiotic resistance genes (ARGs), as confirmed in our previous findings, potentially serving as a point source for ARGs released into the environment. However, whether ARGs in FW can disseminate or integrate into the environmental resistome remains unclear.
