Merfin: improved variant filtering, assembly evaluation and polishing via k-mer validation.

Nat Methods

Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.

Published: June 2022

Variant calling has been widely used for genotyping and for improving the consensus accuracy of long-read assemblies. Variant calls are commonly hard-filtered with user-defined cutoffs. However, it is impossible to define a single set of optimal cutoffs, as the calls depend heavily on the quality of the reads, the variant caller of choice and the quality of the unpolished assembly. Here, we introduce Merfin, a k-mer-based variant-filtering algorithm for improved accuracy in genotyping and genome assembly polishing. Merfin evaluates each variant based on the expected k-mer multiplicity in the reads, independently of the quality of the read alignment and of the variant caller's internal score. Merfin increased the precision of genotyped calls in several benchmarks, improved consensus accuracy and reduced frameshift errors when applied to human and nonhuman assemblies built from Pacific Biosciences HiFi and continuous long reads or Oxford Nanopore reads, including the first complete human genome. Moreover, we introduce assembly quality and completeness metrics that account for the expected genomic copy numbers.
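The idea of judging a variant by the k-mers it creates, rather than by alignment or caller scores, can be illustrated with a toy sketch. This is not Merfin's implementation: the function names, the `min_count` threshold and the "fewer missing k-mers" rule are assumptions made for illustration, and the sketch ignores reverse complements and the expected copy-number model described in the abstract.

```python
from collections import Counter

def kmers(seq, k):
    """Yield every k-mer of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def count_read_kmers(reads, k):
    """Count how often each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts

def missing_kmers(seq, read_counts, k, min_count=1):
    """Number of k-mers in seq seen fewer than min_count times in the reads."""
    return sum(1 for km in kmers(seq, k) if read_counts[km] < min_count)

def apply_variant(ref, pos, ref_allele, alt_allele):
    """Return ref with ref_allele at 0-based pos replaced by alt_allele."""
    assert ref[pos:pos + len(ref_allele)] == ref_allele
    return ref[:pos] + alt_allele + ref[pos + len(ref_allele):]

def window(seq, pos, allele_len, k):
    """Slice covering every k-mer that overlaps the allele at pos."""
    start = max(0, pos - (k - 1))
    end = min(len(seq), pos + allele_len + (k - 1))
    return seq[start:end]

def keep_variant(ref, pos, ref_allele, alt_allele, read_counts, k):
    """Keep a variant only if it reduces the k-mers unsupported by the reads.

    Hypothetical rule for illustration; the caller's own score is never used.
    """
    alt_seq = apply_variant(ref, pos, ref_allele, alt_allele)
    before = missing_kmers(window(ref, pos, len(ref_allele), k), read_counts, k)
    after = missing_kmers(window(alt_seq, pos, len(alt_allele), k), read_counts, k)
    return after < before

# Made-up example: a short "true" sequence, two error-free reads drawn from it,
# and a draft assembly carrying one substitution error (G -> T at position 10).
K = 5
truth = "ACGTTGCAAGGCTTACGATCG"
reads = [truth[0:14], truth[7:21]]
assembly = truth[:10] + "T" + truth[11:]
read_counts = count_read_kmers(reads, K)

fixes_error = keep_variant(assembly, 10, "T", "G", read_counts, K)      # True
breaks_sequence = keep_variant(assembly, 3, "T", "A", read_counts, K)   # False
```

In this toy case the call that restores the true base removes all five unsupported k-mers and is kept, while the spurious call introduces unsupported k-mers and is rejected.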

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9745813
DOI: http://dx.doi.org/10.1038/s41592-022-01445-y
