CAREx: context-aware read extension of paired-end sequencing data.

BMC Bioinformatics

Department of Computer Science, Johannes Gutenberg University Mainz, Mainz, Germany.

Published: May 2024

Background: Commonly used next-generation sequencing machines typically produce large numbers of short reads, each a few hundred base pairs long. However, many downstream applications would benefit from longer reads.

Results: We present CAREx, an algorithm for generating pseudo-long reads from paired-end short-read Illumina data. It repeatedly computes multiple sequence alignments to extend a read until its partner is found. Our performance evaluation on both simulated and real data shows that CAREx connects significantly more read pairs (up to for simulated data) and produces more error-free pseudo-long reads than previous approaches. When used prior to assembly, it can achieve superior de novo assembly results. Furthermore, the GPU-accelerated version of CAREx has the fastest execution times of all tested tools.
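
The extend-until-the-partner-is-found idea can be illustrated with a toy sketch. This is not CAREx's implementation: it assumes error-free reads that are already oriented on read1's strand, uses exact suffix-prefix overlaps, and replaces the multiple sequence alignment with a simple column-wise majority vote over the reads that hang past the current end. The function names (`revcomp`, `extend_once`, `extend_pair`) and the parameters `min_overlap` and `max_len` are hypothetical.

```python
from collections import Counter
from typing import List, Optional

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string over the alphabet A/C/G/T."""
    return "".join(_COMPLEMENT[b] for b in reversed(seq))


def extend_once(contig: str, reads: List[str], min_overlap: int) -> str:
    """One extension round (hypothetical helper): place every read whose
    prefix exactly matches a suffix of `contig` (overlap >= min_overlap),
    then take a column-wise majority vote over the overhanging parts.
    This vote is a simplified stand-in for a real MSA consensus."""
    overhangs = []
    for read in reads:
        # Try the longest admissible overlap first; keep at least one
        # base of overhang so the read can contribute to the extension.
        for ov in range(min(len(read) - 1, len(contig)), min_overlap - 1, -1):
            if contig.endswith(read[:ov]):
                overhangs.append(read[ov:])
                break
    if not overhangs:
        return ""
    width = max(len(o) for o in overhangs)
    consensus = []
    for col in range(width):
        # Only reads long enough to reach this column vote on it.
        votes = Counter(o[col] for o in overhangs if col < len(o))
        consensus.append(votes.most_common(1)[0][0])
    return "".join(consensus)


def extend_pair(read1: str, read2: str, reads: List[str],
                min_overlap: int = 20, max_len: int = 1000) -> Optional[str]:
    """Grow read1 to the right until the reverse complement of its partner
    read2 appears; return the pseudo-long read, or None on failure."""
    target = revcomp(read2)  # the partner, written on read1's strand
    pool = list(reads) + [target]  # the partner itself is also evidence
    contig = read1
    while len(contig) <= max_len:
        hit = contig.find(target)
        if hit != -1:
            # The fragment ends where the partner (on read1's strand) ends.
            return contig[: hit + len(target)]
        ext = extend_once(contig, pool, min_overlap)
        if not ext:
            return None  # no read overlaps the current end: give up
        contig += ext
    return None  # length budget exceeded without meeting the partner


if __name__ == "__main__":
    fragment = "ACGTACCTGATTGCAGGCTAACTTGGCAATCGTTAG"
    read1 = fragment[0:12]
    read2 = revcomp(fragment[24:36])  # partner read from the opposite strand
    others = [fragment[6:18], fragment[12:24], fragment[18:30]]
    print(extend_pair(read1, read2, others, min_overlap=6) == fragment)
```

In the toy example each middle read overlaps the growing sequence by six bases, so four rounds rebuild the full 36-base fragment, and the loop stops as soon as the partner's sequence is contained in it. Real data would additionally require handling sequencing errors, both read orientations, and repeats, which this sketch deliberately ignores.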

Conclusion: CAREx is a new MSA-based algorithm and software for producing pseudo-long reads from paired-end short-read data. It outperforms other state-of-the-art programs in terms of (i) percentage of connected read pairs, (ii) reduction of error rates in filled gaps, (iii) runtime, and (iv) downstream analysis using de novo assembly. CAREx is open-source software written in C++ (CPU version) and CUDA/C++ (GPU version). It is licensed under GPLv3 and can be downloaded at (https://github.com/fkallen/CAREx).


Source: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11088031 (PMC)
DOI: http://dx.doi.org/10.1186/s12859-024-05802-w
