The genome polishing tool POLCA makes fast and accurate corrections in genome assemblies.

PLoS Comput Biol

Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America.

Published: June 2020

The introduction of third-generation DNA sequencing technologies in recent years has allowed scientists to generate dramatically longer sequence reads, which, when used in whole-genome sequencing projects, have yielded better repeat resolution and far more contiguous genome assemblies. While the promise of better contiguity has held true, the relatively high error rate of long reads, averaging 8-15%, has made it challenging to generate a highly accurate final sequence. Current long-read sequencing technologies also tend to make systematic errors, particularly in homopolymer regions, which present additional challenges. A cost-effective strategy for generating highly contiguous assemblies with a very low overall error rate is to combine long reads with low-cost short-read data, which currently have an error rate below 0.5%. This hybrid strategy can be pursued either by incorporating the short-read data into the early phase of assembly, during the read correction step, or by using short reads to "polish" the consensus built from long reads. In this report, we present the assembly polishing tool POLCA (POLishing by Calling Alternatives) and compare its performance with two other popular polishing programs, Pilon and Racon. We show that on simulated data POLCA is more accurate than Pilon and comparable in accuracy to Racon. On real data, all three programs show similar performance, but POLCA is consistently much faster than either of the other polishing programs.
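The idea of polishing a long-read consensus with short reads by "calling alternatives" can be illustrated with a toy sketch. The abstract does not describe POLCA's internals, so the code below is not POLCA's implementation. It assumes the short reads have already been aligned to the draft and summarized as per-position base counts, and the function `polish` and its thresholds `min_alt` and `max_ref` are hypothetical. A draft base is replaced only when an alternative base is well supported by short reads and the draft base itself is nearly absent from them.

```python
from collections import Counter
from typing import List


def polish(draft: str, pileup: List[Counter], min_alt: int = 2, max_ref: int = 1) -> str:
    """Toy substitution-only polisher (hypothetical; not POLCA's actual code).

    pileup[i] holds the bases seen in aligned short reads at draft position i.
    min_alt and max_ref are illustrative thresholds, not published parameters.
    """
    if len(pileup) != len(draft):
        raise ValueError("need exactly one Counter per draft position")
    out = []
    for ref_base, counts in zip(draft, pileup):
        # Collect (count, base) pairs for every base that disagrees with the draft.
        alts = [(n, b) for b, n in counts.items() if b != ref_base]
        if alts:
            # Pick the best-supported alternative (ties broken by base letter).
            n_alt, alt_base = max(alts)
            # Replace only if the alternative is well supported and the draft base is not.
            if n_alt >= min_alt and counts[ref_base] <= max_ref:
                out.append(alt_base)
                continue
        out.append(ref_base)  # otherwise keep the draft base unchanged
    return "".join(out)


# Example: position 2 (draft 'G') is corrected to 'T'; position 4 lacks support.
draft = "ACGTA"
pileup = [Counter("AAAA"), Counter("CCCC"), Counter("TTTG"), Counter("TTTT"), Counter("AAAC")]
print(polish(draft, pileup))  # ACTTA
```

Because it handles only substitutions, this sketch cannot fix the homopolymer insertion and deletion errors typical of long reads; a practical polisher must also call indels from the short-read alignments.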


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7347232
DOI: http://dx.doi.org/10.1371/journal.pcbi.1007981

Similar Publications

Harnessing spatiotemporal transformation in magnetic domains for nonvolatile physical reservoir computing.

Sci Adv

January 2025

Institute of Materials Research and Engineering (IMRE), Agency for Science Technology and Research (A*STAR), 2 Fusionopolis Way, #08-03 Innovis, Singapore 138634, Republic of Singapore.

Combining physics with computational models is increasingly recognized for enhancing the performance and energy efficiency in neural networks. Physical reservoir computing uses material dynamics of physical substrates for temporal data processing. Despite the ease of training, building an efficient reservoir remains challenging.

The false evidence rate: An approach to frequentist error rate control conditioning on the observed value.

Proc Natl Acad Sci U S A

January 2025

Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, Oxford OX3 7BN, United Kingdom.

A P value is conventionally interpreted either as (a) the probability by chance of obtaining results more extreme than those observed or (b) a tool for declaring significance at a prespecified level. Both approaches carry difficulties: (b) does not allow users to make inferences based on the data in hand and is not rigorously followed by researchers in practice, while (a) is not meaningful as an error rate. Although P values retain an important role, these shortcomings are likely to have contributed significantly to the scientific reproducibility crisis.

Unlabelled: Piperacillin-tazobactam (TZP) is a commonly used broad-spectrum agent. OXA-1 β-lactamases drive global Enterobacterales TZP resistance and raise MICs to the clinical breakpoints (8/4-16/4 µg/mL), making susceptibility testing challenging. Two TZP disks are used globally.

Machine Learning Algorithm-Based Prediction of Diabetes Among Female Population Using PIMA Dataset.

Healthcare (Basel)

December 2024

Department of Computer Science, School of Arts, Humanities and Social Sciences, University of Roehampton, London SW15 5PH, UK.

Diabetes is a metabolic disorder characterized by increased blood sugar levels. Early detection of diabetes could help individuals to manage and delay the progression of this disorder effectively. Machine learning (ML) methods are important in forecasting the progression and diagnosis of different medical problems with better accuracy.

GFR Estimation and Correlation for Oncology Patients by Two Methods, Gates Method and Dual Time Point Plasma Sampling Method.

Indian J Nucl Med

November 2024

Department of Nuclear Medicine and Molecular Imaging, Homi Bhabha Cancer Hospital and Mahamana Pandit Madan Mohan Malaviya Cancer Centre, Tata Memorial Centre, Homi Bhabha National Institute (HBNI), Varanasi, India.

Background: With the increasing number of oncology cases and a parallel surge in chemotherapeutic drugs for treatment, treating physicians conduct nephrotoxicity evaluations to provide personalized dosing strategies. Of the various tests available, measurement of the glomerular filtration rate (GFR) with a gamma camera using the Gates method has gained importance, as it is a good index of overall kidney function. In addition, there is an older alternative method for GFR estimation: plasma sampling.