CoreDetector: a flexible and efficient program for core-genome alignment of evolutionary diverse genomes.

Bioinformatics

The Biometry Hub, School of Agriculture, Food and Wine, University of Adelaide, Urrbrae, South Australia 5064, Australia.

Published: November 2023

Motivation: Whole-genome alignment of eukaryote species remains an important method for determining sequence and structural variation, and can also be used to ascertain the representative non-redundant core-genome sequence of a population. Many whole-genome alignment tools were first developed for the more mature analysis of prokaryote species, and few current tools can process the larger genomes of eukaryotes or the genomes of more divergent species. In addition, these tools become computationally prohibitive on larger genomes because of the significant compute resources they require.

Results: We present CoreDetector, an easy-to-use, general-purpose program that can align the core-genome sequences for a range of genome sizes and divergence levels. To illustrate the flexibility of CoreDetector, we aligned a large set of closely related fungal pathogen and hexaploid wheat cultivar genomes, as well as more divergent fly and rodent species genomes. In all tested cases, CoreDetector exhibited improved flexibility and efficiency, and competitive accuracy, compared with existing multiple genome alignment tools.
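
As a conceptual illustration only, the idea of a core genome can be sketched as the set of reference regions that every genome in a population aligns to. The Python sketch below is not CoreDetector's algorithm or interface; the interval representation, the function names, and the toy coordinates are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Half-open interval [start, end) in reference coordinates (assumed convention).
Interval = Tuple[int, int]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Intersect two sorted, disjoint interval lists with a two-pointer sweep."""
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def core_intervals(aligned: Dict[str, List[Interval]]) -> List[Interval]:
    """Return the reference regions covered by alignments from every genome."""
    core = None
    for intervals in aligned.values():
        merged = merge(intervals)
        core = merged if core is None else intersect(core, merged)
    return core or []


# Hypothetical toy data: reference regions each genome aligns to.
aligned = {
    "genome_A": [(0, 100), (150, 300)],
    "genome_B": [(50, 200)],
    "genome_C": [(0, 80), (90, 400)],
}
core = core_intervals(aligned)
core_length = sum(end - start for start, end in core)
print(core, core_length)
```

In this toy case only the regions shared by all three genomes survive, so the core shrinks as more, or more divergent, genomes are added. A real tool also has to produce the pairwise alignments themselves, which is where most of the compute cost lies.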

Availability and Implementation: CoreDetector was developed in Java, a cross-platform and easily deployable language. A packaged pipeline is readily executable in a bash terminal without requiring external Perl or Python environments. Installation, example data, and usage instructions for CoreDetector are freely available from https://github.com/mfruzan/CoreDetector.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10663985
DOI: http://dx.doi.org/10.1093/bioinformatics/btad628
