CARE 2.0: reducing false-positive sequencing error corrections using machine learning.

Felix Kallenborn, Julian Cascitti, Bertil Schmidt

BMC Bioinformatics

Department of Computer Science, Johannes Gutenberg University Mainz, Mainz, Germany.

Published: June 2022

Background: Next-generation sequencing pipelines often perform error correction as a preprocessing step to obtain cleaned input data. State-of-the-art error correction programs are able to reliably detect and correct the majority of sequencing errors. However, they also introduce new errors by making false-positive corrections. These correction mistakes can have a negative impact on downstream analysis, such as k-mer statistics, de-novo assembly, and variant calling. This motivates the need for more precise error correction tools.

Results: We present CARE 2.0, a context-aware read error correction tool based on multiple sequence alignment that targets Illumina datasets. In addition to a number of newly introduced optimizations, its most significant change is the replacement of CARE 1.0's hand-crafted correction conditions with a novel classifier based on random decision forests trained on Illumina data. This results in up to two orders of magnitude fewer false-positive corrections compared to other state-of-the-art error correction software. At the same time, CARE 2.0 achieves high numbers of true-positive corrections comparable to those of its competitors. On a simulated full human dataset with 914M reads, CARE 2.0 generates only 1.2M false positives (FPs) and 801.4M true positives (TPs) at a highly competitive runtime, while the best corrections achieved by other state-of-the-art tools contain at least 3.9M FPs and at most 814.5M TPs. Better de-novo assembly and improved k-mer analysis show the applicability of CARE 2.0 to real-world data.
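To put the reported counts in perspective, the following minimal sketch derives precision, TP / (TP + FP), from the figures quoted for the simulated human dataset. The function and variable names are illustrative and not part of CARE. Because the abstract only gives bounds for the other tools (at least 3.9M FPs, at most 814.5M TPs, possibly from different tools), the second value is an upper bound on their precision, not a measurement for any single tool.

```python
def precision(tp: float, fp: float) -> float:
    """Fraction of applied corrections that were correct: TP / (TP + FP)."""
    return tp / (tp + fp)


# Counts in millions, as quoted in the abstract for the 914M-read dataset.
care_tp, care_fp = 801.4, 1.2

# Bounds quoted for the other tools: at most 814.5M TPs, at least 3.9M FPs.
# Precision rises with TP and falls with FP, so combining the two bounds
# gives an upper bound on what any of those tools could have achieved.
other_tp_max, other_fp_min = 814.5, 3.9

care_precision = precision(care_tp, care_fp)
other_precision_bound = precision(other_tp_max, other_fp_min)

# Lower bound on how many times more FPs the other tools produce here.
fp_ratio = other_fp_min / care_fp

print(f"CARE 2.0 precision:           {care_precision:.4f}")         # 0.9985
print(f"Other tools, precision bound: {other_precision_bound:.4f}")  # 0.9952
print(f"FP ratio (lower bound):       {fp_ratio:.2f}x")              # 3.25x
```

On this particular dataset the quoted numbers imply at least a 3.25-fold reduction in false positives. The "up to two orders of magnitude" figure presumably refers to other comparisons reported in the full paper.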

Conclusion: False-positive corrections can negatively influence downstream analysis. The precision of CARE 2.0 greatly reduces the number of those corrections compared to other state-of-the-art programs including BFC, Karect, Musket, Bcool, SGA, and Lighter. The resulting higher-quality datasets improve k-mer analysis and de-novo assembly on real-world data, which demonstrates the applicability of machine learning techniques to sequencing read error correction. CARE 2.0 is written in C++/CUDA for Linux systems and can be run on the CPU as well as on CUDA-enabled GPUs. It is available at https://github.com/fkallen/CARE.

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9195321
DOI: http://dx.doi.org/10.1186/s12859-022-04754-3


