Background: Next-generation sequencing pipelines often perform error correction as a preprocessing step to obtain cleaned input data. State-of-the-art error correction programs reliably detect and correct the majority of sequencing errors. However, they also introduce new errors by making false-positive corrections. These correction mistakes can have a negative impact on downstream analysis, such as k-mer statistics, de-novo assembly, and variant calling. This motivates the need for more precise error correction tools.
Results: We present CARE 2.0, a context-aware read error correction tool based on multiple sequence alignment targeting Illumina datasets. In addition to a number of newly introduced optimizations, its most significant change is the replacement of CARE 1.0's hand-crafted correction conditions with a novel classifier based on random decision forests trained on Illumina data. This results in up to two orders of magnitude fewer false-positive corrections compared to other state-of-the-art error correction software. At the same time, CARE 2.0 achieves a number of true-positive corrections comparable to that of its competitors. On a simulated full human dataset with 914M reads, CARE 2.0 generates only 1.2M false positives (FPs) and 801.4M true positives (TPs) at a highly competitive runtime, while the best corrections achieved by other state-of-the-art tools contain at least 3.9M FPs and at most 814.5M TPs. Better de-novo assembly and improved k-mer analysis show the applicability of CARE 2.0 to real-world data.
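The counts reported above can be turned into precision values with a few lines of arithmetic. The sketch below is only an illustration: the function name `precision` and the variable names are assumptions, and combining "at most 814.5M TPs" with "at least 3.9M FPs" yields an upper bound on the precision of the other tools, not the result of any single tool.

```python
def precision(tp: float, fp: float) -> float:
    """Fraction of applied corrections that were right: TP / (TP + FP)."""
    return tp / (tp + fp)


# Counts in millions, taken from the abstract (simulated human dataset, 914M reads).
care_tp, care_fp = 801.4, 1.2

# Other tools: at most 814.5M TPs and at least 3.9M FPs.
# Pairing the two gives an upper bound on their precision (an assumption of
# this sketch: the two extremes need not come from the same tool).
other_tp_max, other_fp_min = 814.5, 3.9

care_precision = precision(care_tp, care_fp)
other_precision_bound = precision(other_tp_max, other_fp_min)

# How many times more false positives the best competitor produces, at minimum.
fp_ratio = other_fp_min / care_fp

print(f"CARE 2.0 precision:           {care_precision:.4f}")
print(f"Other tools, precision bound: {other_precision_bound:.4f}")
print(f"Minimum FP ratio (other/CARE): {fp_ratio:.2f}")
```

Under these assumptions, CARE 2.0 reaches a precision of about 0.9985 on this dataset, the other tools stay at or below about 0.9952, and they produce at least 3.25 times as many false positives.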
Conclusion: False-positive corrections can negatively influence downstream analysis. The precision of CARE 2.0 greatly reduces the number of those corrections compared to other state-of-the-art programs including BFC, Karect, Musket, Bcool, SGA, and Lighter. The resulting higher-quality datasets improve k-mer analysis and de-novo assembly on real-world data, which demonstrates the applicability of machine learning techniques to sequencing read error correction. CARE 2.0 is written in C++/CUDA for Linux systems and can be run on the CPU as well as on CUDA-enabled GPUs. It is available at https://github.com/fkallen/CARE .
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9195321 | PMC |
| http://dx.doi.org/10.1186/s12859-022-04754-3 | DOI Listing |