Two sequence- and two structure-based ML models have learned different aspects of protein biochemistry.

Anastasiya V Kulikova, Daniel J Diaz, Tianlong Chen, T Jeffrey Cole, Andrew D Ellington, Claus O Wilke

bioRxiv

Department of Integrative Biology, University of Texas at Austin, Austin, Texas, USA.

Published: July 2023

Deep learning models are increasingly used to predict mutational effects or allowed mutations in proteins. The models commonly used for these purposes include large language models (LLMs) and 3D convolutional neural networks (CNNs). These two model types have very different architectures and are commonly trained on different representations of proteins: LLMs use the transformer architecture and are trained purely on protein sequences, whereas 3D CNNs are trained on voxelized representations of local protein structure. While comparable overall prediction accuracies have been reported for both types of models, it is not known to what extent these models make comparable specific predictions and/or generalize protein biochemistry in similar ways. Here, we perform a systematic comparison of two LLMs and two structure-based models (CNNs) and show that the different model types have distinct strengths and weaknesses. The overall prediction accuracies are largely uncorrelated between the sequence- and structure-based models. The two structure-based models are better at predicting buried aliphatic and hydrophobic residues, whereas the two LLMs are better at predicting solvent-exposed polar and charged amino acids. Finally, we find that a combined model that takes the individual model predictions as input can leverage these individual model strengths, resulting in significantly improved overall prediction accuracy.
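
The abstract does not say how the combined model is built, so the following is only a minimal sketch of the general idea (stacking: a meta-model trained on the base models' outputs), not the authors' method. Everything in it is a labeled assumption: the four base models are simulated stand-ins (hypothetical names struct_1, struct_2, seq_1, seq_2), the split of the 20 amino acids into a "structure-friendly" and a "sequence-friendly" group is made up for illustration, and the meta-model is a plain multinomial logistic regression fit by gradient descent with NumPy.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Hypothetical split for this toy simulation only: residue types the simulated
# structure-like models predict well (mostly hydrophobic) vs. residue types the
# simulated sequence-like models predict well (mostly polar/charged).
STRUCT_GOOD = [AMINO_ACIDS.index(a) for a in "ACFGILMPVW"]
SEQ_GOOD = [AMINO_ACIDS.index(a) for a in "DEHKNQRSTY"]
N_CLASSES = len(AMINO_ACIDS)


def simulate_model(labels, good_classes, rng):
    """Simulated per-site probability vectors (n_sites x 20).

    Sites whose true residue is in `good_classes` get a sharp peak on the true
    residue; all other sites get a diffuse, uninformative distribution.
    """
    probs = rng.dirichlet(np.ones(N_CLASSES), size=len(labels))
    in_domain = np.isin(labels, good_classes)
    onehot = np.eye(N_CLASSES)[labels]
    probs[in_domain] = 0.3 * probs[in_domain] + 0.7 * onehot[in_domain]
    return probs


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def fit_stacker(X, y, lr=0.5, steps=500):
    """Multinomial logistic regression fit by full-batch gradient ascent."""
    W = np.zeros((X.shape[1], N_CLASSES))
    Y = np.eye(N_CLASSES)[y]
    for _ in range(steps):
        P = softmax(X @ W)
        W += lr * X.T @ (Y - P) / len(y)  # gradient of the mean log-likelihood
    return W


def run_demo(n_train=2000, n_test=1000, seed=0):
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, N_CLASSES, size=n_train + n_test)
    domains = {"struct_1": STRUCT_GOOD, "struct_2": STRUCT_GOOD,
               "seq_1": SEQ_GOOD, "seq_2": SEQ_GOOD}
    outputs = {name: simulate_model(labels, good, rng)
               for name, good in domains.items()}

    y_train, y_test = labels[:n_train], labels[n_train:]
    # Accuracy of each simulated base model on the held-out sites.
    acc = {name: float(np.mean(p[n_train:].argmax(axis=1) == y_test))
           for name, p in outputs.items()}

    # Stacking: concatenate the four 20-dim outputs into one 80-dim feature
    # vector per site and train the meta-model on the training sites only.
    X = np.hstack(list(outputs.values()))
    W = fit_stacker(X[:n_train], y_train)
    acc["combined"] = float(np.mean((X[n_train:] @ W).argmax(axis=1) == y_test))
    return acc


if __name__ == "__main__":
    for name, a in run_demo().items():
        print(f"{name:>9}: {a:.3f}")
```

Under these toy assumptions each simulated base model is right on only about half of the held-out sites, because it is informative only for its own residue group, while the stacked model learns to rely on whichever models are confident at a given site and scores far higher. Real model outputs are not this cleanly separated, so the size of this toy gain says nothing about the improvement reported in the paper.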

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10055221
DOI: http://dx.doi.org/10.1101/2023.03.20.533508

Similar Publications

A Generalized Bayesian Stochastic Block Model for Microbiome Community Detection.

Stat Med

February 2025

Department of Mathematical Sciences, The University of Texas at Dallas, Richardson, Texas.

Kevin C Lutz, Michael L Neugent, Tejasv Bedi, Nicole J De Nisco, Qiwei Li

Advances in next-generation sequencing technology have enabled the high-throughput profiling of metagenomes and accelerated microbiome studies. Recently, there has been a rise in quantitative studies that aim to decipher the microbiome co-occurrence network and its underlying community structure based on metagenomic sequence data. Uncovering the complex microbiome community structure is essential to understanding the role of the microbiome in disease progression and susceptibility.

Validation of a Questionnaire Assessing Pregnant Women's Perspectives on Addressing the Psychological Challenges of Childbirth.

Nurs Rep

December 2024

Department of Microbiology, Parasitology and Virology, Faculty of Midwives and Nursing, "Carol Davila" University of Medicine and Pharmacy, 020021 Bucharest, Romania.

Mihaela Corina Radu, Mihai Sebastian Armean, Razvan Daniel Chivu, Justin Aurelian, Melania Elena Pop-Tudose

Introduction: Pregnant women's experiences and concerns regarding childbirth are complex, necessitating a multidimensional and personalized approach in maternal care. This study explores the psychological and emotional factors influencing pregnant women's decisions regarding their mode of delivery. The results will provide valuable insights for the development of educational and counseling strategies designed to support pregnant women in making informed and conscious decisions about their childbirth.

Multilayered screening for multi-targeted anti-Alzheimer's and anti-Parkinson's agents through structure-based pharmacophore modelling, MCDM, docking, molecular dynamics and DFT: a case study of HDAC4 inhibitors.

In Silico Pharmacol

January 2025

Laboratory of Drug Discovery and Ecotoxicology, Department of Pharmacy, Guru Ghasidas Vishwavidyalaya, Bilaspur, 495009 India.

Nikita Chhabra, Balaji Wamanrao Matore, Nisha Lakra, Purusottam Banjare, Anjali Murmu

Abstract: Alzheimer's disease (AD) and Parkinson's disease (PD) are neurological conditions that primarily affect the elderly; they have distinctive traits but also share some similarities in symptoms and progression. The multifactorial nature of AD and PD encourages exploring multi-target therapy for these conditions as an alternative to the conventional "one drug, one target" strategy. This study searches for potential HDAC4 inhibitors through multiple screening approaches.

Deep learning methods for proteome-scale interaction prediction.

Curr Opin Struct Biol

January 2025

Department of Biological Sciences, Seoul National University, Seoul 08826, Republic of Korea.

Min Su Yoon, Byunghyun Bae, Kunhee Kim, Hahnbeom Park, Minkyung Baek

Proteome-scale interaction prediction is essential for understanding protein functions and disease mechanisms. Traditional experimental methods are often limited by scale and complexity, driving the need for computational approaches. Deep learning has emerged as a powerful tool, enabling high-throughput, accurate predictions of protein interactions.

On topological characterizations and computational analysis of benzenoid networks for drug discovery and development.

J Mol Graph Model

January 2025

Department of Mathematics & Actuarial Science, B. S. Abdur Rahman Crescent Institute of Science and Technology, Chennai, Tamil Nadu, 600048, India.

Pradeepa A, Arathi P

Topological indices are numerical invariants that provide key insights into the structural properties of molecular graphs and are crucial in predicting physicochemical and biological activities. This paper applies established computational methods to analyze benzenoid networks and their application to polycyclic aromatic hydrocarbons (PAHs), using degree-based topological indices computed via M-polynomial and NM-polynomial approaches. By examining tessellations, including linear chain, hexagonal, rhomboidal, and triangular configurations alongside their line graphs, this work highlights the influence of molecular topology on biological activity.