Summary: Pangenomes are replacing single reference genomes as the definitive representation of DNA sequence within a species or clade. Pangenome analysis predominantly leverages graph-based methods that require computationally intensive multiple genome alignments, do not scale to highly complex eukaryotic genomes, limit their scope to identifying structural variants (SVs), or incur bias by relying on a reference genome. Here, we present PanKmer, a toolkit designed for reference-free analysis of pangenome datasets consisting of dozens to thousands of individual genomes. PanKmer decomposes a set of input genomes into a table of observed k-mers and their presence-absence values in each genome. These are stored in an efficient k-mer index data format that encodes SNPs, INDELs, and SVs. It also includes functions for downstream analysis of the k-mer index, such as calculating sequence similarity statistics between individuals at whole-genome or local scales. For example, k-mers can be "anchored" in any individual genome to quantify sequence variability or conservation at a specific locus. This facilitates workflows with various biological applications, e.g. identifying cases of hybridization between plant species. PanKmer provides researchers with a valuable and convenient means to explore the full scope of genetic variation in a population, without reference bias.
Availability And Implementation: PanKmer is implemented as a Python package with components written in Rust, released under a BSD license. The source code is available from the Python Package Index (PyPI) at https://pypi.org/project/pankmer/ as well as GitLab at https://gitlab.com/salk-tm/pankmer. Full documentation is available at https://salk-tm.gitlab.io/pankmer/.
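To make the k-mer presence-absence idea from the Summary concrete, here is a minimal pure-Python sketch. It is not the PanKmer API or its index format: the function names (`canonical`, `build_index`, `jaccard`, `anchor`), the bitmask encoding, and the toy sequences are all assumptions chosen for illustration. It builds a table of canonical k-mers with one presence bit per genome, computes a whole-genome Jaccard similarity, and "anchors" the k-mers of one genome to report per-position conservation.

```python
# Illustrative sketch only; names and data layout are hypothetical, not PanKmer's.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def build_index(genomes, k):
    """Map each canonical k-mer to a bitmask; bit i is set if genome i contains it."""
    index = {}
    for i, seq in enumerate(genomes):
        for start in range(len(seq) - k + 1):
            kmer = seq[start:start + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
                key = canonical(kmer)
                index[key] = index.get(key, 0) | (1 << i)
    return index


def jaccard(index, i, j):
    """Jaccard similarity between the k-mer sets of genome i and genome j."""
    shared = sum(1 for m in index.values() if ((m >> i) & 1) and ((m >> j) & 1))
    union = sum(1 for m in index.values() if ((m >> i) & 1) or ((m >> j) & 1))
    return shared / union if union else 0.0


def anchor(index, seq, k, n_genomes):
    """For each k-mer position in seq, the fraction of genomes containing that k-mer."""
    fractions = []
    for start in range(len(seq) - k + 1):
        mask = index.get(canonical(seq[start:start + k]), 0)
        fractions.append(bin(mask).count("1") / n_genomes)
    return fractions


# Toy input: two short "genomes" that differ at the final base.
genomes = ["AAACCC", "AAACCA"]
index = build_index(genomes, k=3)
similarity = jaccard(index, 0, 1)
conservation = anchor(index, genomes[0], k=3, n_genomes=len(genomes))
```

In this toy case the two sequences share three of five distinct canonical 3-mers, and the conservation profile drops only at the last position of the first sequence, where the single differing base falls. A real index over thousands of eukaryotic genomes would need a far more compact representation than a Python dictionary.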
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10603592 | PMC |
http://dx.doi.org/10.1093/bioinformatics/btad621 | DOI Listing |
J Vet Sci
December 2024
College of Veterinary Medicine, Chungbuk National University, Cheongju 28644, Korea.
Importance: This study is essential for comprehending the zoonotic transmission, antimicrobial resistance, and genetic diversity of enteropathogenic Escherichia coli (EPEC).
Objective: To improve our understanding of EPEC, this study focused on analyzing and comparing the genomic characteristics of EPEC isolates from humans and companion animals in Korea.
Methods: The whole genomes of 26 EPEC isolates from patients with diarrhea and 20 EPEC isolates from companion animals in Korea were sequenced using the Illumina HiSeq X (Illumina, USA) and Oxford Nanopore MinION (Oxford Nanopore Technologies, UK) platforms.
Sci Rep
January 2025
Department of Microbiology, University of Dhaka, Dhaka, 1000, Bangladesh.
Enterobacter asburiae (E. asburiae) is a Gram-negative, rod-shaped bacterium of emerging significance as an opportunistic pathogen with high virulence and drug resistance. In this study, we present a detailed analysis of the whole genome sequence of a multidrug-resistant (MDR) E.
mSphere
December 2024
Department of Bioengineering, University of California, San Diego, La Jolla, California, USA.
Unlabelled: Thousands of complete genome sequences for strains of a species that are now available enable the advancement of pangenome analytics to a new level of sophistication. We collected 2,377 publicly available complete genomes of for detailed pangenome analysis. The core genome and accessory genomes consisted of 2,398 and 5,182 genes, respectively.
mBio
December 2024
Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA.
Unlabelled: ) is a clinically significant pathogen and a highly genetically diverse species due to its large accessory genome. The functional consequence of this diversity remains unknown mainly because, to date, functional genomic studies in have been primarily performed on reference strains. Given the growing public health threat of infections, understanding the functional genomic differences among clinical isolates can provide more insight into how its genetic diversity influences gene essentiality, clinically relevant phenotypes, and importantly, potential drug targets.
Imeta
December 2024
Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, Laboratory of Animal Fat Deposition & Muscle Development, College of Animal Science and Technology Northwest A&F University Yangling Shaanxi China.
The development of a comprehensive pig graph pangenome assembly encompassing 27 genomes represents the most extensive collection of pig genomic data to date. Analysis of this pangenome reveals the critical role of structural variations in driving adaptation and defining breed-specific traits. Notably, the study identifies as a key candidate gene governing intramuscular fat deposition and meat quality in pigs.