Bi-level error correction for PacBio long reads.

IEEE/ACM Trans Comput Biol Bioinform

Published: December 2017

The latest sequencing technologies, such as the Pacific Biosciences (PacBio) and Oxford Nanopore machines, can generate long reads thousands of nucleic bases in length, much longer than the reads of a few hundred bases generated by Illumina machines. However, these long reads are prone to much higher error rates, for example 15%, which makes downstream analysis and applications very difficult. Error correction is a process that improves the quality of sequencing data. Hybrid correction strategies have recently been proposed that use Illumina reads, which have low error rates, to fix sequencing errors in the noisy long reads, with good performance. In this paper, we propose a new method named Bicolor, a bi-level framework of hybrid error correction for further improving the quality of PacBio long reads. At the first level, our method uses a de Bruijn graph-based error correction idea to search for paths between pairs of solid k-mers, iterating with an increasing k-mer length. At the second level, we combine the results produced under different parameters at the first level. In particular, a multiple sequence alignment algorithm is used to align the similar long reads, followed by a voting algorithm that determines the final base at each position of the reads. We compare Bicolor with three state-of-the-art methods on three real data sets. The results demonstrate that Bicolor achieves the highest identity ratio on all three data sets. Bicolor also achieves a higher alignment ratio and a higher number of aligned reads than the current methods on two data sets. On the third data set, our method is closely competitive with the current methods in terms of number of aligned reads and genome coverage. The C++ source code of our algorithm is freely available at https://github.com/yuansliu/Bicolor.
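The two ideas named in the abstract, solid k-mers and per-position voting over aligned reads, can be illustrated with a minimal Python sketch. This is not Bicolor's implementation (which is in C++ and is not described in detail here): the function names `solid_kmers` and `majority_vote`, the `min_count` threshold, and the `-` gap symbol are all assumptions made for illustration, and the sketch assumes the sequences passed to the voting step have already been aligned to equal length by some multiple sequence alignment tool.

```python
from collections import Counter


def solid_kmers(short_reads, k, min_count=2):
    """Return the k-mers that occur at least min_count times in short_reads.

    min_count is a hypothetical threshold; the abstract does not state
    how Bicolor decides that a k-mer is solid.
    """
    counts = Counter(
        read[i:i + k]
        for read in short_reads
        for i in range(len(read) - k + 1)
    )
    return {kmer for kmer, n in counts.items() if n >= min_count}


def majority_vote(aligned_reads, gap="-"):
    """Pick the most frequent symbol in each column of an alignment.

    aligned_reads must be equal-length strings (already aligned).
    Columns whose winning symbol is the gap are dropped from the result.
    """
    if len({len(r) for r in aligned_reads}) != 1:
        raise ValueError("need one or more aligned sequences of equal length")
    consensus = []
    for column in zip(*aligned_reads):
        base, _ = Counter(column).most_common(1)[0]
        if base != gap:
            consensus.append(base)
    return "".join(consensus)


if __name__ == "__main__":
    print(solid_kmers(["ACGT", "ACGA"], k=3))         # {'ACG'}
    print(majority_vote(["ACG-T", "ACGAT", "ATG-T"]))  # ACGT
```

In the toy run, only `ACG` appears in both short reads, so it is the only solid 3-mer, and the vote removes the single inserted `A` and the single `C`→`T` substitution because each is outvoted two to one.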


Source
http://dx.doi.org/10.1109/TCBB.2017.2780832

Publication Analysis

Top Keywords

long reads: 24
error correction: 16
reads: 11
pacbio long: 8
reads length: 8
error rates: 8
data sets: 8
bicolor achieves: 8
number aligned: 8
aligned reads: 8

Similar Publications

Background: Mungbean is one of the most socio-economically important leguminous food crops of Asia and a rich source of dietary protein and micronutrients. Understanding its genetic makeup is crucial for genetic improvement and cultivar development.

Methods: In this study, we combined single-tube long-fragment reads (stLFR) sequencing technology with high-throughput chromosome conformation capture (Hi-C) technique to obtain a chromosome-level assembly of cultivar 'KUML4'.


There is a need for rigorous and scientifically-based testing standards for existing and new enteric methane mitigation technologies, including antimethanogenic feed additives (AMFA). The current review provides guidelines for conducting and analyzing data from experiments with ruminants intended to test the antimethanogenic and production effects of feed additives. Recommendations include study design and statistical analysis of the data, dietary effects, associative effect of AMFA with other mitigation strategies, appropriate methods for measuring methane emissions, production and physiological responses to AMFA, and their effects on animal health and product quality.


Despite advancements in antiretroviral therapy (ART) that reduce the viral load to undetectable levels and improve CD4 T-cell counts, viral eradication has not been achieved due to HIV-1 persistence in resting CD4 T-cells. We, therefore, characterized the gene, which is essential for HIV-1 replication and pathogenesis, from 20 virologically controlled aging individuals with HIV (HIV) on long-term ART and improved CD4 T-cell counts, with a particular focus on older individuals. Peripheral blood mononuclear cell genomic DNA from HIV was used to amplify the gene by polymerase chain reaction, followed by nucleotide sequencing and analysis.


Background: Impairments in theory of mind (ToM) are highly prevalent among individuals with schizophrenia, resulting in substantial functional deficits. However, research on impairments in individuals with schizotypy has yielded inconsistent findings, with some studies finding ToM deficits in overall schizotypy, other studies finding ToM deficits in only specific schizotypy dimensions, and yet other studies finding no ToM deficits at all. One potential key factor that may account for this discrepancy is the use of schizotypy measures that do not adequately measure specific schizotypy dimensions.


Inter-chromosomal transcription hubs shape the 3D genome architecture of African trypanosomes.

Nat Commun

December 2024

Division of Experimental Parasitology, Faculty of Veterinary Medicine, Ludwig-Maximilians-Universität München, 82152, Planegg-Martinsried, Germany.

The eukaryotic nucleus exhibits a highly organized 3D genome architecture, with RNA transcription and processing confined to specific nuclear structures. While intra-chromosomal interactions, such as promoter-enhancer dynamics, are well-studied, the role of inter-chromosomal interactions remains poorly understood. Investigating these interactions in mammalian cells is challenging due to large genome sizes and the need for deep sequencing.

