Background: Proteins are composed of one or several building blocks, known as domains. Such domains can be classified into families according to their evolutionary origin. Whereas sequencing technologies have advanced immensely in recent years, there are no matching computational methodologies for large-scale determination of protein domains and their boundaries. We provide and rigorously evaluate a novel set of domain families that is automatically generated from sequence data. Our domain family identification process, called EVEREST (EVolutionary Ensembles of REcurrent SegmenTs), begins by constructing a library of protein segments that emerge in an all-vs-all pairwise sequence comparison. It then clusters these segments into putative domain families. The best putative families are selected using machine learning techniques, and a statistical model is created for each of the chosen families. This procedure is then iterated: the statistical models are used to scan all protein sequences, to recreate a library of segments, and to cluster them again.
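The iterative structure described above (segment library, clustering, selection, modelling, rescanning) can be illustrated with a deliberately simplified sketch. Everything in it is an assumption made for illustration and not the EVEREST implementation: shared k-mers stand in for pairwise sequence comparison, identical-string grouping stands in for clustering, a minimum-protein-count filter stands in for the machine learning selection, and exact string matching stands in for scanning with a statistical model. All function and parameter names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def shared_segments(seqs, k):
    """Stand-in for all-vs-all comparison: k-mers shared by some pair of proteins."""
    segments = set()
    for (a, sa), (b, sb) in combinations(seqs.items(), 2):
        kmers_a = {sa[i:i + k] for i in range(len(sa) - k + 1)}
        kmers_b = {sb[j:j + k] for j in range(len(sb) - k + 1)}
        common = kmers_a & kmers_b
        for name, s in ((a, sa), (b, sb)):
            for i in range(len(s) - k + 1):
                if s[i:i + k] in common:
                    segments.add((name, i, i + k))  # half-open [start, end)
    return segments


def cluster(segments, seqs):
    """Stand-in for clustering: group segments that have identical sequence."""
    families = defaultdict(set)
    for name, start, end in segments:
        families[seqs[name][start:end]].add((name, start, end))
    return families


def select(families, min_proteins):
    """Stand-in for the learned selection step: keep families seen in enough proteins."""
    return {key: members for key, members in families.items()
            if len({name for name, _, _ in members}) >= min_proteins}


def scan(models, seqs):
    """Stand-in for model scanning: exact matches of each family string in every protein."""
    segments = set()
    for name, s in seqs.items():
        for model in models:
            start = s.find(model)
            while start != -1:
                segments.add((name, start, start + len(model)))
                start = s.find(model, start + 1)
    return segments


def everest_like(seqs, k=4, min_proteins=2, iterations=2):
    """Toy loop: build segments, cluster, select, then rescan and repeat."""
    segments = shared_segments(seqs, k)
    families = {}
    for _ in range(iterations):
        families = select(cluster(segments, seqs), min_proteins)
        segments = scan(families.keys(), seqs)
    return families
```

Calling `everest_like({"p1": "AAACDEFGHHH", "p2": "KKKCDEFLLL", "p3": "MMMCDEFNNN", "p4": "QQQQQQQ"})` groups the `CDEF` segment found in the first three toy sequences into a single family. A real pipeline would replace each stand-in with alignment-based segment detection, a proper clustering method, a trained classifier, and profile models.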

Results: Processing the Swiss-Prot section of the UniProt Knowledgebase, release 7.2, EVEREST defines 20,230 domains, covering 85% of the amino acids of the Swiss-Prot database. EVEREST annotates 11,852 proteins (6% of the database) that are not annotated by Pfam A. In addition, in 43,086 proteins (20% of the database), EVEREST annotates a part of the protein that is not annotated by Pfam A. Performance tests show that EVEREST recovers 56% of Pfam A families and 63% of SCOP families with high accuracy, and suggests previously unknown domain families with at least 51% fidelity. EVEREST domains are often a combination of domains as defined by Pfam or SCOP and are frequently sub-domains of such domains.
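An amino-acid coverage figure such as the 85% reported above could be computed as the fraction of residues that fall inside at least one annotated domain. The abstract does not state how coverage was calculated, so the following is a minimal sketch under that assumption; the function names and the half-open interval convention are hypothetical.

```python
def covered_residues(intervals):
    """Count residues covered by at least one half-open [start, end) interval (start >= 0)."""
    total, current_end = 0, 0
    for start, end in sorted(intervals):
        start = max(start, current_end)  # skip the part already counted
        if end > start:
            total += end - start
            current_end = end
    return total


def coverage(lengths, annotations):
    """Fraction of all residues covered by a domain annotation.

    lengths: protein name -> sequence length
    annotations: protein name -> list of (start, end) domain intervals
    """
    covered = sum(covered_residues(annotations.get(name, [])) for name in lengths)
    return covered / sum(lengths.values())
```

Overlapping domains are counted once, so a protein of length 20 annotated with `(0, 5)`, `(3, 8)` and `(10, 12)` contributes 10 covered residues.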

Conclusion: The EVEREST process and its output domain families provide an exhaustive and validated view of the protein domain world that is automatically generated from sequence data. The EVEREST library of domain families, accessible for browsing and download at [1], complements the view provided by other existing libraries. Furthermore, because the EVEREST process is automatic, it is scalable, and we will run it on larger databases in the future. The EVEREST source files are available for download from the EVEREST web site.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1533870
DOI: http://dx.doi.org/10.1186/1471-2105-7-277
