Panacus: fast and exact pangenome growth and core size estimation.

bioRxiv

Department for Endocrinology and Diabetology, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Germany.

Published: June 2024

Motivation: Using a single linear reference genome limits exploration of the full genomic diversity of a species. The release of a draft human pangenome underscores the increasing relevance of pangenomics in overcoming this limitation. Pangenomes are commonly represented as graphs, which can encode billions of base pairs of sequence. Presently, there is a lack of scalable software able to perform key tasks on pangenomes, such as quantifying universally shared sequence across genomes (the core genome) and measuring the extent of genomic variability as a function of sample size (pangenome growth).
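The two quantities named above can be made concrete with a small Python sketch. It is an illustration only, not Panacus's implementation: the function names and the toy genomes are hypothetical, and "items" stands for whatever is being counted (nodes, edges, or base pairs). It assumes the usual combinatorial definition, where growth at sample size m is the expected size of the union of m genomes drawn without replacement, and the core at m is the expected size of their intersection.

```python
from math import comb


def coverage_histogram(genomes):
    """hist[c] = number of distinct items present in exactly c genomes."""
    counts = {}
    for items in genomes.values():
        for item in set(items):
            counts[item] = counts.get(item, 0) + 1
    hist = [0] * (len(genomes) + 1)
    for c in counts.values():
        hist[c] += 1
    return hist


def expected_growth(hist):
    """Expected union size for m = 1..n, averaged over all subsets of m genomes."""
    n = len(hist) - 1
    growth = []
    for m in range(1, n + 1):
        total = comb(n, m)
        # An item with coverage c is missed only when all m genomes are
        # drawn from the n - c genomes that lack it.
        seen = sum(h * (total - comb(n - c, m)) for c, h in enumerate(hist))
        growth.append(seen / total)
    return growth


def expected_core(hist):
    """Expected intersection size for m = 1..n, averaged over all subsets."""
    n = len(hist) - 1
    core = []
    for m in range(1, n + 1):
        total = comb(n, m)
        # An item is shared by all m genomes only when every one of them
        # is drawn from the c genomes that contain it.
        shared = sum(h * comb(c, m) for c, h in enumerate(hist))
        core.append(shared / total)
    return core


# Hypothetical toy input: three genomes described by the items they contain.
toy_genomes = {
    "g1": {"a", "b", "c"},
    "g2": {"a", "b", "d"},
    "g3": {"a", "e"},
}
toy_hist = coverage_histogram(toy_genomes)
toy_growth = expected_growth(toy_hist)
toy_core = expected_core(toy_hist)
```

Because both curves depend only on the coverage histogram, they can be computed exactly without enumerating genome orderings, which is one reason a histogram-based formulation scales to large graphs.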

Results: We introduce Panacus (pangenome-abacus), a tool designed to rapidly perform these tasks and visualize the results in interactive plots. Panacus can process GFA files, the accepted standard for pangenome graphs, and is able to analyze a human pangenome graph with 110 million nodes in less than one hour.
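To show what reading a GFA file involves, here is a minimal Python sketch that extracts per-node path coverage from GFA 1 text. It is not how Panacus parses graphs; the function name and the toy graph are hypothetical. It makes several simplifying assumptions: only `S` (segment) and `P` (path) lines are read, each path counts as one genome, segment sequences are given inline rather than as `*` with a length tag, and `W` (walk) lines are ignored.

```python
def node_coverage_from_gfa(gfa_text):
    """Return ({node: length in bp}, {node: number of paths visiting it})."""
    lengths = {}
    path_nodes = {}
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        if fields[0] == "S":
            # S <name> <sequence>
            lengths[fields[1]] = len(fields[2])
        elif fields[0] == "P":
            # P <path name> <steps like 1+,2-,3+> <overlaps>
            # Drop the trailing orientation sign and count each node once per path.
            path_nodes[fields[1]] = {step[:-1] for step in fields[2].split(",")}
    coverage = {node: 0 for node in lengths}
    for nodes in path_nodes.values():
        for node in nodes:
            coverage[node] = coverage.get(node, 0) + 1
    return lengths, coverage


# Hypothetical toy graph: four segments and three paths.
toy_records = [
    ("H", "VN:Z:1.0"),
    ("S", "1", "ACGT"),
    ("S", "2", "G"),
    ("S", "3", "TT"),
    ("S", "4", "C"),
    ("P", "sampleA", "1+,2+,4+", "*"),
    ("P", "sampleB", "1+,3+,4+", "*"),
    ("P", "sampleC", "1+,4-", "*"),
]
toy_gfa = "\n".join("\t".join(record) for record in toy_records)

toy_lengths, toy_coverage = node_coverage_from_gfa(toy_gfa)
n_paths = 3
# Base pairs on nodes visited by every path (the core, counted in bp).
toy_core_bp = sum(
    toy_lengths[node] for node, c in toy_coverage.items() if c == n_paths
)
```

Counting in base pairs rather than nodes weights each node by its sequence length, so a long shared segment contributes more to the core than a single-base variant node.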

Availability: Panacus is implemented in Rust and is published as Open Source software under the MIT license. The source code and documentation are available at https://github.com/marschall-lab/panacus. Panacus can be installed via Bioconda at https://bioconda.github.io/recipes/panacus/README.html.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11195249
DOI: http://dx.doi.org/10.1101/2024.06.11.598418

Similar Publications

Fusobacterium nucleatum is a bacterium associated with colorectal cancer (CRC) tumorigenesis, progression, and metastasis. Fap2 is a fusobacteria-specific outer membrane galactose-binding lectin that mediates adherence to and invasion of CRC tumors. Advances in omics analyses provide an opportunity to profile and identify microbial genomic features that correlate with the cancer-associated bacterial virulence factor Fap2.

With the increasing availability of high-quality genome assemblies, pangenome graphs emerged as a new paradigm in the genomics field for identifying, encoding, and presenting genomic variation at both population and species levels. However, it remains challenging to truly dissect and interpret pangenome graphs via biologically informative visualization. To facilitate better exploration and understanding of pangenome graphs towards novel biological insights, here we present a web-based interactive Visualization and interpretation framework for linear-Reference-projected Pangenome Graphs (VRPG).

Aligning genomes into common coordinates is central to pangenome analysis and construction, but it is also computationally expensive. Multi-sequence maximal unique matches (multi-MUMs) are guideposts for core genome alignments, helping to frame and solve the multiple alignment problem. We introduce Mumemto, a tool that computes multi-MUMs and other match types across large pangenomes.

GDBr: genomic signature interpretation tool for DNA double-strand break repair mechanisms.

Nucleic Acids Res

January 2025

Department of Convergent Bioscience and Informatics, College of Bioscience and Biotechnology, Chungnam National University, 99, Daehak-ro, Yuseong-gu, Daejeon 34134, Republic of Korea.

Large genetic variants can be generated via homologous recombination (HR), such as polymerase theta-mediated end joining (TMEJ) or single-strand annealing (SSA). Given that these HR-based mechanisms leave specific genomic signatures, we developed GDBr, a genomic signature interpretation tool for DNA double-strand break repair mechanisms using high-quality genome assemblies. We applied GDBr to a draft human pangenome reference.

Non-typhoidal Salmonella is a major contributor to diarrheal diseases, with over 2600 serovars identified across diverse environments. In Mexico, serovars Newport and Anatum have shown a marked increase, especially in foodborne disease, posing a public health problem. We conducted a cross-sectional study from 2021 to 2023 using active epidemiological surveillance to assess contamination in ground beef and pork at butcher shops nationwide.
