Motivation: Variant calling workflows that use a single reference sequence are the de facto standard elementary genomic analysis routine for resequencing projects. Various ways to enhance the reference with pangenomic information have been proposed, but scalability combined with seamless integration into existing workflows remains a challenge.

Results: We present PanVC with founder sequences, a scalable and accurate variant calling workflow based on a multiple alignment of reference sequences. Scalability is achieved by removing duplicate parts up to a limit, which yields a founder multiple alignment that is then indexed using a hybrid scheme that exploits general-purpose read aligners. Our implemented workflow uses GATK or BCFtools for variant calling, but the individual steps of the workflow (e.g. the vcf2multialign tool and founder reconstruction) can be of independent interest as a basis for creating novel pangenome analysis workflows beyond variant calling.
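To make the idea of collapsing duplicate parts of a multiple alignment concrete, the following is a minimal toy sketch. It is not PanVC's founder reconstruction algorithm. It assumes a fixed block length (an illustrative simplification), and the function name `toy_founders` and its parameters are hypothetical. The alignment is cut into column blocks, duplicate row segments within each block are dropped, and the remaining segments are concatenated into a smaller set of rows that still contains every segment seen in the input.

```python
from typing import List


def toy_founders(msa: List[str], block_len: int) -> List[str]:
    """Collapse aligned rows into fewer 'founder' rows (toy illustration only).

    Assumption: blocks have a fixed length; this is a simplification chosen
    for the example, not the segmentation used by PanVC.
    """
    if block_len <= 0:
        raise ValueError("block_len must be positive")
    if not msa or not msa[0]:
        return []
    width = len(msa[0])
    if any(len(row) != width for row in msa):
        raise ValueError("all rows of a multiple alignment must have equal length")

    blocks = []
    for start in range(0, width, block_len):
        # dict.fromkeys drops duplicate segments while keeping first-seen order.
        distinct = list(dict.fromkeys(row[start:start + block_len] for row in msa))
        blocks.append(distinct)

    # The block with the most distinct segments determines how many rows we need.
    n_founders = max(len(b) for b in blocks)

    founders = []
    for i in range(n_founders):
        # Blocks with fewer distinct segments reuse their first segment as padding.
        founders.append("".join(b[i] if i < len(b) else b[0] for b in blocks))
    return founders


if __name__ == "__main__":
    alignment = ["ACGTAC", "ACGTTC", "ACCTAC", "ACGTAC"]
    print(toy_founders(alignment, block_len=3))
```

In this toy input, four aligned rows collapse to two rows, and every block-level segment of every input row still occurs in one of them. Note that an output row may combine segments from different input rows.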

Availability and Implementation: Our open access tools and instructions on how to reproduce our experiments are available at https://github.com/algbio/panvc-founders.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8665761
DOI: http://dx.doi.org/10.1093/bioinformatics/btab516

Similar Publications

Protocol for mitochondrial variant enrichment from single-cell RNA sequencing using MAESTER.

STAR Protoc

January 2025

Division of Hematology, Brigham and Women's Hospital, Boston, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Medicine, Harvard Medical School, Boston, MA, USA; Ludwig Center at Harvard, Harvard Medical School, Boston, MA, USA. Electronic address:

Single-cell RNA sequencing (scRNA-seq) enables detailed characterization of cell states but often lacks insights into tissue clonal structures. Here, we present a protocol to probe cell states and clonal information simultaneously by enriching mitochondrial DNA (mtDNA) variants from 3'-barcoded full-length cDNA. We describe steps for input library preparation, mtDNA enrichment, PCR product cleanup, and paired-end sequencing.

Background: Pacific Biosciences (PacBio) circular consensus sequencing (CCS), also known as high fidelity (HiFi) technology, has revolutionized modern genomics by producing long (10 + kb) and highly accurate reads. This is achieved by sequencing circularized DNA molecules multiple times and combining them into a consensus sequence. Currently, the accuracy and quality value estimation provided by HiFi technology are more than sufficient for applications such as genome assembly and germline variant calling.

Genomic and phenotypic correlates of mosaic loss of chromosome Y in blood.

Am J Hum Genet

January 2025

Division of Biostatistics, Data Science Institute, Medical College of Wisconsin, Milwaukee, WI, USA; Cancer Center, Medical College of Wisconsin, Milwaukee, WI, USA. Electronic address:

Mosaic loss of Y (mLOY) is the most common somatic chromosomal alteration detected in human blood. The presence of mLOY is associated with altered blood cell counts and increased risk of Alzheimer disease, solid tumors, and other age-related diseases. We sought to gain a better understanding of genetic drivers and associated phenotypes of mLOY through analyses of whole-genome sequencing (WGS) of a large set of genetically diverse males from the Trans-Omics for Precision Medicine (TOPMed) program.

Primary ciliary dyskinesia (PCD, OMIM 244400) is a rare genetic disorder that affects motile cilia and is characterised by impaired mucociliary clearance of the airway epithelium, which results in chronic upper and lower airway infections. While short-read next-generation sequencing technology has been used for the genetic testing of PCD, its effectiveness is limited in identifying variants in the gene because of the nearly identical pseudogene. As we confirmed that the gene was not expressed in airway cells, we obtained nasal mucosa biopsy specimens for total RNA sequencing (RNA-seq) with library enrichment using exome oligos. Among the 34 nasal samples from patients suspected of having PCD, three aberrant splicing patterns in were identified in two samples.

Clair3-RNA: A deep learning-based small variant caller for long-read RNA sequencing data.

bioRxiv

January 2025

Department of Computer Science, School of Computing and Data Science, University of Hong Kong, Hong Kong, China.

Variant calling using long-read RNA sequencing (lrRNA-seq) can be applied to diverse tasks, such as capturing full-length isoforms and gene expression profiling. It poses challenges, however, due to higher error rates than DNA data, the complexities of transcript diversity, RNA editing events, etc. In this paper, we propose Clair3-RNA, the first deep learning-based variant caller tailored for lrRNA-seq data.
