Exploring the extremes of sequence/structure space with ensemble fold recognition in the program Phyre.

Proteins

Structural Bioinformatics Group, Division of Molecular Biosciences, Imperial College London, London SW7 2AY, United Kingdom.

Published: February 2008

Structural and functional annotation of the large and growing database of genomic sequences is a major problem in modern biology. Protein structure prediction by detecting remote homology to known structures is a well-established and successful annotation technique. However, the broad spectrum of evolutionary change that accompanies the divergence of close homologues to become remote homologues cannot easily be captured with a single algorithm. Recent advances to tackle this problem have involved the use of multiple predictive algorithms available on the Internet. Here we demonstrate how such ensembles of predictors can be designed in-house under controlled conditions and permit significant improvements in recognition by using a concept taken from protein loop energetics and applying it to the general problem of 3D clustering. We have developed a stringent test that simulates the situation where a protein sequence of interest is submitted to multiple different algorithms and not one of these algorithms can make a confident (95%) correct assignment. A method of meta-server prediction (Phyre) that exploits the benefits of a controlled environment for the component methods was implemented. At 95% precision or higher, Phyre identified 64.0% of all correct homologous query-template relationships, and 84.0% of the individual test query proteins could be accurately annotated. In comparison to the improvement that the single best fold recognition algorithm (according to training) has over PSI-Blast, this represents a 29.6% increase in the number of correct homologous query-template relationships, and a 46.2% increase in the number of accurately annotated queries. It has been well recognised in fold prediction, other bioinformatics applications, and in many other areas, that ensemble predictions generally are superior in accuracy to any of the component individual methods. 
However, there is a paucity of information as to why ensemble methods are superior, and indeed this has never been systematically addressed in fold recognition. Here we show that the source of ensemble power stems from noise reduction in filtering out false positive matches. The results indicate greater coverage of sequence space and improved model quality, which can consequently lead to a reduction in the experimental workload of structural genomics initiatives.
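The headline figures in the abstract are coverage values reported at 95% precision or higher. As a minimal illustrative sketch of that kind of measurement (not the authors' benchmark code), the Python below ranks scored query-template hits and reports the largest fraction of correct relationships recoverable while precision stays at or above a chosen level. The function name `recall_at_precision`, the `(score, is_correct)` input layout, and the toy scores are assumptions made for illustration only.

```python
from typing import List, Tuple


def recall_at_precision(hits: List[Tuple[float, bool]],
                        min_precision: float = 0.95) -> float:
    """Best recall achievable at a score cutoff whose precision is >= min_precision.

    hits: hypothetical (score, is_correct) pairs, one per query-template match,
    where a higher score means a more confident prediction.
    """
    total_correct = sum(1 for _, ok in hits if ok)
    if total_correct == 0:
        return 0.0

    # Most confident predictions first.
    ranked = sorted(hits, key=lambda h: h[0], reverse=True)

    best_recall = 0.0
    true_pos = 0
    for n, (score, ok) in enumerate(ranked, start=1):
        if ok:
            true_pos += 1
        # Only cut where the score changes, so tied hits are kept together.
        is_boundary = n == len(ranked) or ranked[n][0] != score
        if is_boundary and true_pos / n >= min_precision:
            best_recall = max(best_recall, true_pos / total_correct)
    return best_recall


# Toy data, invented for illustration: three correct hits and one false positive.
toy_hits = [(0.9, True), (0.8, True), (0.7, False), (0.6, True)]
print(recall_at_precision(toy_hits))        # cutoff must stop before the false positive
print(recall_at_precision(toy_hits, 0.75))  # a looser precision target admits all hits
```

On this toy input, the strict setting recovers two of the three correct hits because the third lies below a false positive in the ranking. This is the sense in which filtering out false positive matches translates directly into higher coverage at a fixed precision.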


Source
http://dx.doi.org/10.1002/prot.21688

