PeptideBERT: A Language Model Based on Transformers for Peptide Property Prediction.

J Phys Chem Lett

Department of Biomedical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States.

Published: November 2023

Recent advances in language models have provided the protein modeling community with a powerful tool that uses transformers to represent protein sequences as text. This breakthrough enables sequence-to-property prediction for peptides without relying on explicit structural data. Inspired by recent progress in the field of large language models, we present PeptideBERT, a protein language model specifically tailored for predicting essential peptide properties such as hemolysis, solubility, and nonfouling. PeptideBERT utilizes the ProtBERT pretrained transformer model with 12 attention heads and 12 hidden layers. Through fine-tuning the pretrained model for the three downstream tasks, our model is state of the art (SOTA) in predicting hemolysis, which is crucial for determining a peptide's potential to induce red blood cell lysis, as well as in predicting nonfouling properties. Leveraging primarily shorter sequences and a data set with negative samples predominantly associated with insoluble peptides, our model showcases remarkable performance.
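
As an illustration of the approach described in the abstract, the following is a minimal sketch of fine-tuning a ProtBERT checkpoint for a binary peptide property (here, hemolysis) using the HuggingFace Transformers library. The checkpoint name, classification head, training configuration, example peptide, and label are illustrative assumptions and are not taken from the PeptideBERT paper itself.

import torch
from transformers import BertTokenizer, BertForSequenceClassification

# ProtBERT expects uppercase amino acids separated by single spaces.
# Checkpoint assumed for illustration; the exact pretrained weights used
# by the authors may differ.
tokenizer = BertTokenizer.from_pretrained("Rostlab/prot_bert", do_lower_case=False)
model = BertForSequenceClassification.from_pretrained(
    "Rostlab/prot_bert",
    num_labels=2,  # e.g., hemolytic vs. non-hemolytic (assumed binary setup)
)

# Hypothetical peptide and label, used only for illustration.
peptide = "GLFDIVKKVVGALGSL"
inputs = tokenizer(" ".join(peptide), return_tensors="pt", truncation=True)
labels = torch.tensor([1])

# One fine-tuning step: the forward pass returns the classification loss.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
loss = model(**inputs, labels=labels).loss
loss.backward()
optimizer.step()

# Inference: predicted class probabilities for the peptide.
model.eval()
with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits, dim=-1)
print(probs)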


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10683064
DOI: http://dx.doi.org/10.1021/acs.jpclett.3c02398

Publication Analysis

Top Keywords

language model (8)
language models (8)
model (6)
peptidebert language (4)
model based (4)
based transformers (4)
transformers peptide (4)
peptide property (4)
property prediction (4)
prediction advances (4)

Similar Publications

Accurate classification of logos is a challenging task in image recognition due to variations in logo size, orientation, and background complexity. Deep learning models, such as VGG16, have demonstrated promising results in handling such tasks. However, their performance depends heavily on optimal hyperparameter settings, and tuning these is both labor-intensive and time-consuming.


With breakthroughs in Natural Language Processing and Artificial Intelligence (AI), the usage of Large Language Models (LLMs) in academic research has increased tremendously. Models such as Generative Pre-trained Transformer (GPT) are used by researchers in literature review, abstract screening, and manuscript drafting. However, these models also present the attendant challenge of providing ethically questionable scientific information.


Evaluating large language models for criterion-based grading from agreement to consistency.

NPJ Sci Learn

December 2024

Department of Psychology, Jeffrey Cheah School of Medicine and Health Sciences, Monash University Malaysia, Bandar Sunway, 475000, Malaysia.

This study evaluates the ability of large language models (LLMs) to deliver criterion-based grading and examines the impact of prompt engineering with detailed criteria on grading. Using well-established human benchmarks and quantitative analyses, we found that even free LLMs achieve criterion-based grading with a detailed understanding of the criteria, underscoring the importance of domain-specific understanding over model complexity. These findings highlight the potential of LLMs to deliver scalable educational feedback.


Deep neural networks drive the success of natural language processing. A fundamental property of language is its compositional structure, allowing humans to systematically produce forms for new meanings. For humans, languages with more compositional and transparent structures are typically easier to learn than those with opaque and irregular structures.


Resilience is central to young children's healthy and happy development. The Child and Youth Resilience Measure (CYRM-R) has been widely used in several countries. However, its construct validity among young children in rural South Africa has not been examined.

