Leveraging conformal prediction to annotate enzyme function space with limited false positives.

PLoS Comput Biol

School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia, United States of America.

Published: May 2024

Machine learning (ML) is increasingly being used to guide biological discovery in biomedicine, such as prioritizing promising small molecules in drug discovery. In those applications, ML models are used to predict the properties of biological systems, and researchers use these predictions to prioritize candidates as new biological hypotheses for downstream experimental validation. However, when applied to unseen situations, these models can be overconfident and produce a large number of false positives. One solution to this issue is to quantify the model's prediction uncertainty and provide a set of hypotheses with a controlled false discovery rate (FDR) pre-specified by researchers. We propose CPEC, an ML framework for FDR-controlled biological discovery. We demonstrate its effectiveness using enzyme function annotation as a case study, simulating the discovery process of identifying the functions of less-characterized enzymes. CPEC integrates a deep learning model with a statistical tool known as conformal prediction, providing accurate and FDR-controlled function predictions for a given protein enzyme. Conformal prediction provides rigorous statistical guarantees for the predictive model and ensures that the expected FDR will not exceed a user-specified level with high probability. Evaluation experiments show that CPEC achieves reliable FDR control, better or comparable prediction performance at a lower FDR than existing methods, and accurate predictions for enzymes under-represented in the training data. We expect CPEC to be a useful tool for biological discovery applications where a high yield rate in validation experiments is desired but the experimental budget is limited.
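
The idea of calibrating a prediction threshold so that the expected FDR stays below a user-specified level can be illustrated with a minimal sketch. This is not CPEC's implementation: it is a simplified threshold search in the spirit of conformal risk control, and the function names (`false_discovery_proportion`, `calibrate_threshold`, `predict_set`), the score grid, and the toy calibration data are all assumptions made for illustration. It assumes a model that outputs one score in [0, 1] per candidate function label, plus a held-out calibration set with known labels. Because the false discovery proportion is not guaranteed to decrease monotonically as the threshold rises, the scan stops at the first violation, and the sketch should be read as illustrative rather than as carrying a formal guarantee.

```python
import numpy as np

def false_discovery_proportion(scores, labels, lam):
    """Per-protein fraction of predicted labels that are wrong at threshold lam."""
    pred = scores >= lam                      # predicted label set per protein
    n_pred = pred.sum(axis=1)
    n_false = (pred & ~labels).sum(axis=1)    # predicted but not a true label
    # An empty prediction set makes no false discoveries, so its FDP is 0.
    return n_false / np.maximum(n_pred, 1)

def calibrate_threshold(cal_scores, cal_labels, alpha, grid=None):
    """Most permissive threshold whose corrected calibration FDR is <= alpha.

    Returns np.inf (predict nothing) if even the strictest threshold fails.
    """
    cal_scores = np.asarray(cal_scores, dtype=float)
    cal_labels = np.asarray(cal_labels, dtype=bool)
    n = cal_scores.shape[0]
    if grid is None:
        grid = np.linspace(0.0, 1.0, 101)     # hypothetical candidate thresholds
    chosen = np.inf
    # Scan from the strictest threshold downward; stop at the first violation.
    for lam in grid[::-1]:
        risk = false_discovery_proportion(cal_scores, cal_labels, lam).mean()
        # Finite-sample correction: treat the unseen test point as worst case (FDP = 1).
        if (n * risk + 1.0) / (n + 1) <= alpha:
            chosen = lam
        else:
            break
    return chosen

def predict_set(scores, lam):
    """Indices of the function labels whose score clears the calibrated threshold."""
    return np.flatnonzero(np.asarray(scores) >= lam)

# Toy calibration data (made up for illustration): 4 proteins, 3 candidate labels.
cal_scores = np.array([[0.90, 0.200, 0.1],
                       [0.80, 0.705, 0.1],
                       [0.30, 0.950, 0.6],
                       [0.85, 0.100, 0.4]])
cal_labels = np.array([[1, 0, 0],
                       [1, 0, 0],
                       [0, 1, 0],
                       [1, 0, 0]], dtype=bool)

lam = calibrate_threshold(cal_scores, cal_labels, alpha=0.25)
print(lam, predict_set(np.array([0.9, 0.72, 0.1]), lam))
```

With only four calibration proteins, the correction term 1/(n+1) already equals 0.2, so no threshold can satisfy a target FDR below that value and the sketch falls back to predicting nothing. In practice a much larger calibration set would be needed for small FDR targets.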

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11164347
DOI: http://dx.doi.org/10.1371/journal.pcbi.1012135
