Whole genome sequencing has revolutionized infectious disease surveillance for tracking and monitoring the spread and evolution of pathogens. However, using a linear reference genome for genomic analyses may introduce biases, especially when studies are conducted on highly variable bacterial genomes of the same species. Pangenome graphs provide an efficient model for representing and analyzing multiple genomes and their variants as a graph structure that includes all types of variations. In this study, we present a practical bioinformatics pipeline that employs the PanGenome Graph Builder and the Variation Graph toolkit to build pangenomes from assembled genomes, align whole genome sequencing data and call variants against a graph reference. The pangenome graph enables the identification of structural variants, rearrangements, and small variants (e.g., single nucleotide polymorphisms and insertions/deletions) simultaneously. We demonstrate that using a pangenome graph, instead of a single linear reference genome, improves mapping rates and variant calling for both simulated and real datasets of the pathogen . Overall, pangenome graphs offer a promising approach for comparative genomics and comprehensive genetic variation analysis in infectious disease. Moreover, this innovative pipeline, leveraging pangenome graphs, can bridge variant analysis, genome assembly, population genetics, and evolutionary biology, expanding the reach of genomic understanding and applications.
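As a rough illustration of why a graph reference can improve mapping, the toy sketch below encodes a reference and one alternate allele as a small sequence graph. It then checks whether short reads match the linear reference exactly, and whether they match any walk through the graph. This is not the PanGenome Graph Builder or Variation Graph toolkit pipeline described above; the sequences, the names `NODES`, `EDGES`, `enumerate_walks`, and `maps_exactly`, and the exact-match criterion are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical toy sequence graph: node id -> sequence, plus directed edges.
# Nodes 2 and 3 form a "bubble": the two alleles of a single-nucleotide variant.
NODES: Dict[int, str] = {1: "ACGT", 2: "A", 3: "G", 4: "TTCA"}
EDGES: Dict[int, List[int]] = {1: [2, 3], 2: [4], 3: [4], 4: []}

# The linear reference is just one walk through the graph: 1 -> 2 -> 4.
LINEAR_REFERENCE = NODES[1] + NODES[2] + NODES[4]


def enumerate_walks(start: int) -> List[str]:
    """Spell every walk from `start` to a sink (only viable for tiny acyclic graphs)."""
    if not EDGES[start]:
        return [NODES[start]]
    return [
        NODES[start] + rest
        for nxt in EDGES[start]
        for rest in enumerate_walks(nxt)
    ]


def maps_exactly(read: str, haplotypes: List[str]) -> bool:
    """Return True if the read is an exact substring of any haplotype."""
    return any(read in hap for hap in haplotypes)


# One read carries the reference allele, two carry the alternate allele.
reads = ["CGTAT", "CGTGT", "GTGTTC"]
linear_hits = sum(maps_exactly(r, [LINEAR_REFERENCE]) for r in reads)
graph_hits = sum(maps_exactly(r, enumerate_walks(1)) for r in reads)
print(f"linear: {linear_hits}/{len(reads)}, graph: {graph_hits}/{len(reads)}")
# prints "linear: 1/3, graph: 3/3"
```

Reads carrying the alternate allele fail against the single linear sequence but match a walk through the bubble. Exhaustive walk enumeration is used here only because the graph is tiny; it would not scale to real pangenomes.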
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10448961 | PMC |
http://dx.doi.org/10.3389/fgene.2023.1225248 | DOI Listing |
Cattle have been selectively bred for coat color, spotting, and depigmentation patterns. The assumed autosomal dominant inherited genetic variants underlying the characteristic white head of Fleckvieh, Simmental, and Hereford cattle have not been identified yet, although the contribution of structural variation upstream of the gene has been proposed. Here, we construct a graph pangenome from 24 haplotype assemblies representing seven taurine cattle breeds to identify and characterize the white head-associated locus for the first time based on long-read sequencing data and pangenome analyses.
Mol Biol Evol
December 2024
Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China.
Pangenomes can facilitate a deeper understanding of genome complexity. Using de novo phased long-read assemblies of eight representative goat breeds, we constructed a graph-based pangenome of goats (Capra hircus) and discovered 113 Mb of novel autosomal sequence. Combining this multi-assembly pangenome with low-coverage PacBio HiFi sequences, we constructed a long-read structural variation (SV) database containing 59,325 SV deletions, 84,910 SV insertions, and 24,954 other complex SV alleles.
NAR Genom Bioinform
December 2024
Biomathematics and Statistics Scotland, The James Hutton Institute, Peter Guthrie Tait Road, EH9 3FD, Edinburgh, United Kingdom.
This paper presents a new data structure, GIN-TONIC (Graph INdexing Through Optimal Near Interval Compaction), designed to index arbitrary string-labelled directed graphs representing, for instance, pangenomes or transcriptomes. GIN-TONIC provides several capabilities not offered by other graph-indexing methods based on the FM-Index. It is non-hierarchical, handling a graph as a monolithic object; it indexes at nucleotide resolution all possible walks in the graph without the need to explicitly store them; it supports exact substring queries in polynomial time and space for all possible walk roots in the graph, even if there are exponentially many walks corresponding to such roots.
PLoS Comput Biol
December 2024
Department of Computer Science, University of Milano-Bicocca, Milan, Italy.
Pangenomes are becoming a powerful framework to perform many bioinformatics analyses taking into account the genetic variability of a population, thus reducing the bias introduced by a single reference genome. With the wider diffusion of pangenomes, integrating genetic variability with transcriptome diversity is becoming a natural extension that demands specific methods for its exploration. In this work, we extend the notion of spliced pangenomes to that of annotated spliced pangenomes; this allows us to introduce a formal definition of Alternative Splicing (AS) events on a graph structure.
Plant Genome
December 2024
School of Biological Sciences, The University of Western Australia, Perth, Western Australia, Australia.