Motivation: Assigning new sequences to known protein families and subfamilies is a prerequisite for many functional, comparative and evolutionary genomics analyses. Such assignment is commonly achieved by looking for the closest sequence in a reference database, using a method such as BLAST. However, ignoring the gene phylogeny can be misleading because a query sequence does not necessarily belong to the same subfamily as its closest sequence. For example, a hemoglobin which branched out prior to the hemoglobin alpha/beta duplication could be closest to a hemoglobin alpha or beta sequence, whereas it is neither. Phylogeny-driven tools have emerged to overcome this problem, but they rely on gene trees, whose inference is computationally expensive.
Results: Here, we first show that in multiple animal and plant datasets, 18-62% of assignments by closest sequence are incorrect, typically pointing to an over-specific subfamily. Then, we introduce OMAmer, a novel alignment-free protein subfamily assignment method, which limits over-specific subfamily assignments and is suited to phylogenomic databases with thousands of genomes. OMAmer is based on an innovative method using evolutionarily informed k-mers for alignment-free mapping to ancestral protein subfamilies. We show that OMAmer, whilst able to reject non-homologous family-level assignments, provides better and quicker subfamily-level assignments than approaches relying on the closest sequence, whether inferred exactly by Smith-Waterman or by the fast heuristic DIAMOND.
Availability and Implementation: OMAmer is available from the Python Package Index (as omamer), with the source code and a precomputed database available at https://github.com/DessimozLab/omamer.
Supplementary Information: Supplementary data are available at Bioinformatics online.
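The idea of mapping queries to ancestral subfamilies with evolutionarily informed k-mers can be illustrated with a small toy sketch. This is not OMAmer's implementation or API: the hierarchy, the sequences, the function names (`build_index`, `assign`, `closest_subfamily`), the k-mer length and the support threshold are all hypothetical, and a k-mer Jaccard similarity stands in for a closest-sequence search such as BLAST. Each k-mer is stored at the most specific subfamily that contains every reference carrying it, and a query descends into a subfamily only if enough of its k-mers support that step.

```python
from collections import defaultdict

# Hypothetical toy hierarchy: subfamily -> parent (None marks the family root).
PARENT = {"hemoglobin": None, "hb_alpha": "hemoglobin", "hb_beta": "hemoglobin"}
ROOT = "hemoglobin"
CHILDREN = defaultdict(list)
for child, parent in PARENT.items():
    if parent is not None:
        CHILDREN[parent].append(child)

# Made-up reference sequences: a shared core plus a subfamily-specific tail.
REFERENCES = [("hb_alpha", "ACDEFGHKLMNP"), ("hb_beta", "ACDEFGHQRSTV")]


def kmers(seq, k=3):
    """Return the set of overlapping k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def ancestors(node):
    """Return the path from a node up to the root, node included."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def lca(a, b):
    """Return the lowest common ancestor of two subfamilies."""
    seen = set(ancestors(a))
    return next(node for node in ancestors(b) if node in seen)


def build_index(references, k=3):
    """Map each k-mer to the most specific subfamily covering all its carriers."""
    index = {}
    for label, seq in references:
        for km in kmers(seq, k):
            index[km] = lca(index[km], label) if km in index else label
    return index


def assign(query, index, k=3, min_support=0.3):
    """Descend from the root while a child is backed by enough query k-mers."""
    q = kmers(query, k)
    support = defaultdict(int)
    for km in q:
        if km in index:
            # A k-mer supports its own subfamily and every ancestor of it.
            for node in ancestors(index[km]):
                support[node] += 1
    if not q or support[ROOT] / len(q) < min_support:
        return None  # too little evidence even at family level: reject
    node = ROOT
    while True:
        best = max(CHILDREN.get(node, []), key=lambda c: support[c], default=None)
        if best is None or support[best] / len(q) < min_support:
            return node
        node = best


def closest_subfamily(query, references, k=3):
    """Stand-in for a closest-sequence search: best k-mer Jaccard similarity."""
    q = kmers(query, k)
    label, _ = max(references, key=lambda r: len(q & kmers(r[1], k)) / len(q | kmers(r[1], k)))
    return label


INDEX = build_index(REFERENCES)
ANCESTRAL_LIKE = "ACDEFGHKWWYW"  # shared core plus one alpha-specific k-mer
```

With these toy inputs, `closest_subfamily(ANCESTRAL_LIKE, REFERENCES)` returns `"hb_alpha"`, the over-specific answer, whereas `assign(ANCESTRAL_LIKE, INDEX)` stops at `"hemoglobin"` because only one of the ten query k-mers is alpha-specific. A query made entirely of unseen k-mers is rejected with `None`.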
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8479680 | PMC |
| http://dx.doi.org/10.1093/bioinformatics/btab219 | DOI Listing |