Top-down clustering for protein subfamily identification.

Evol Bioinform Online

Department of Computer Science, KU Leuven, Belgium.

Published: May 2013

We propose a novel method for protein subfamily identification, that is, finding subgroups of functionally closely related sequences within a protein family. In line with phylogenomic analysis, the method first builds a hierarchical tree from a multiple alignment of the protein sequences and then uses a post-pruning procedure to extract clusters from the tree. Unlike existing methods, it constructs the hierarchical tree top-down rather than bottom-up, and it associates particular mutations with each division into subclusters. The motivating hypothesis is that this approach may yield a better tree topology, and consequently more accurate subfamily identification, while also indicating functionally important sites and allowing new proteins to be classified easily. A thorough experimental evaluation confirms the hypothesis. The novel method yields more accurate clusters and a better tree topology than the state-of-the-art method SCI-PHY, identifies known functional sites, and identifies mutations that alone suffice to classify new sequences with an accuracy approaching that of hidden Markov models.
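
The pipeline summarized in the abstract (top-down splitting of a multiple alignment, one mutation per split, post-pruning into clusters, then classifying new sequences by those mutations) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the split score (summed pairwise Hamming distance), the mean-distance pruning rule with its `max_mean_dist` threshold, all function names, and the toy alignment are hypothetical choices made only for illustration.

```python
# Hypothetical sketch: the split score and pruning rule are assumptions,
# not the published method.
from __future__ import annotations
from dataclasses import dataclass
from itertools import combinations
from typing import Optional

def hamming(a: str, b: str) -> int:
    """Number of aligned columns at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def spread(seqs: list[str]) -> int:
    """Sum of pairwise Hamming distances; 0 means the group is homogeneous."""
    return sum(hamming(a, b) for a, b in combinations(seqs, 2))

def mean_distance(names: list[str], msa: dict[str, str]) -> float:
    """Average pairwise Hamming distance among the named sequences."""
    pairs = list(combinations(names, 2))
    if not pairs:
        return 0.0
    return sum(hamming(msa[a], msa[b]) for a, b in pairs) / len(pairs)

@dataclass
class Node:
    members: list[str]                      # sequence names in this subtree
    test: Optional[tuple[int, str]] = None  # (column, residue) "mutation" for the split
    yes: Optional[Node] = None              # members with that residue at that column
    no: Optional[Node] = None               # all other members

def best_split(names: list[str], msa: dict[str, str]) -> Optional[tuple[int, str]]:
    """Return the (column, residue) test that most lowers the summed spread, if any."""
    seqs = [msa[n] for n in names]
    best, best_score = None, spread(seqs)
    for col in range(len(seqs[0])):
        for res in sorted({s[col] for s in seqs}):  # sorted so ties resolve deterministically
            yes = [s for s in seqs if s[col] == res]
            no = [s for s in seqs if s[col] != res]
            if not yes or not no:
                continue  # a test must divide the group in two
            score = spread(yes) + spread(no)
            if score < best_score:
                best, best_score = (col, res), score
    return best

def build_tree(names: list[str], msa: dict[str, str]) -> Node:
    """Grow the tree top-down, splitting on one (column, residue) test per node."""
    node = Node(members=list(names))
    test = best_split(names, msa) if len(names) > 1 else None
    if test is not None:
        col, res = test
        node.test = test
        node.yes = build_tree([n for n in names if msa[n][col] == res], msa)
        node.no = build_tree([n for n in names if msa[n][col] != res], msa)
    return node

def prune(node: Node, msa: dict[str, str], max_mean_dist: float) -> None:
    """Stand-in post-pruning: collapse any subtree whose members are already tight."""
    if node.test is None:
        return
    if mean_distance(node.members, msa) <= max_mean_dist:
        node.test, node.yes, node.no = None, None, None
    else:
        prune(node.yes, msa, max_mean_dist)
        prune(node.no, msa, max_mean_dist)

def leaves(node: Node) -> list[list[str]]:
    """Clusters are the leaves that remain after pruning."""
    if node.test is None:
        return [node.members]
    return leaves(node.yes) + leaves(node.no)

def classify(seq: str, node: Node) -> list[str]:
    """Assign a new sequence (aligned to the same columns) using only the split mutations."""
    while node.test is not None:
        col, res = node.test
        node = node.yes if seq[col] == res else node.no
    return node.members

if __name__ == "__main__":
    # Invented toy alignment: two subfamilies that differ at columns 1, 2 and 5.
    msa = {"A1": "MKVLAG", "A2": "MKVLSG", "A3": "MKILAG",
           "B1": "MRDLAW", "B2": "MRDLSW", "B3": "MRDIAW"}
    root = build_tree(list(msa), msa)
    print(root.test)                 # (1, 'K')
    prune(root, msa, max_mean_dist=1.5)
    print(leaves(root))              # [['A1', 'A2', 'A3'], ['B1', 'B2', 'B3']]
    print(classify("MKVLAW", root))  # ['A1', 'A2', 'A3']
```

Because any split into two non-empty groups lowers the summed spread, the greedy growth keeps dividing until every leaf is homogeneous; that is why a separate pruning step is needed to recover subfamily-sized clusters. Since each split is a single (column, residue) test, classifying a new sequence needs only a few residue lookups along one root-to-leaf path, which mirrors the abstract's point that the mutations alone can act as a classifier.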


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3653887
DOI: http://dx.doi.org/10.4137/EBO.S11609
