PLM_Sol: predicting protein solubility by benchmarking multiple protein language models with the updated Escherichia coli protein solubility dataset.

Brief Bioinform

Key Laboratory of Organ Regeneration and Reconstruction, State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, 1 Beichen West Road, Chaoyang District, Beijing 100101, China.

Published: July 2024

Protein solubility plays a crucial role in various biotechnological, industrial, and biomedical applications. As sequencing and gene synthesis costs fall, high-throughput experimental screening coupled with tailored bioinformatic prediction is increasingly used to develop novel functional enzymes of interest (EOI). High protein solubility rates are essential in this process, yet accurate prediction of solubility remains a challenging task. As deep learning technology continues to evolve, attention-based protein language models (PLMs) can extract more of the intrinsic information in protein sequences. Leveraging these models, along with the increasing availability of protein solubility data inferred from structural databases such as the Protein Data Bank, holds great potential to enhance the prediction of protein solubility. In this study, we curated an Updated Escherichia coli protein Solubility DataSet (UESolDS) and employed a combination of multiple PLMs and classification layers to predict protein solubility. The resulting best-performing model, named Protein Language Model-based protein Solubility prediction model (PLM_Sol), demonstrated significant improvements over previously reported models, achieving a 6.4% increase in accuracy, a 9.0% increase in F1_score, and an 11.1% increase in Matthews correlation coefficient score on the independent test set. Moreover, additional evaluation using our in-house synthesized protein resource as test data, encompassing diverse types of enzymes, also showed good performance of PLM_Sol. Overall, PLM_Sol exhibited consistent and promising performance across both the independent test set and the experimental set, making it well suited for facilitating large-scale EOI studies. PLM_Sol is available as a standalone program and as an easy-to-use model at https://zenodo.org/doi/10.5281/zenodo.10675340.
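The three metrics named in the abstract (accuracy, F1_score, and Matthews correlation coefficient) can all be derived from a binary confusion matrix. The following is a minimal sketch in Python, assuming labels are coded 1 for soluble and 0 for insoluble; the function name `binary_metrics` and the toy labels are illustrative assumptions and are not taken from the PLM_Sol code or data.

```python
import math


def binary_metrics(y_true, y_pred):
    """Compute accuracy, F1 and MCC for binary labels (1 = soluble, 0 = insoluble).

    Illustrative helper only; not part of the PLM_Sol release.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0

    # F1 is the harmonic mean of precision and recall: 2TP / (2TP + FP + FN).
    f1_den = 2 * tp + fp + fn
    f1 = 2 * tp / f1_den if f1_den else 0.0

    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    # By convention it is reported as 0 when the denominator is 0.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0

    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}


if __name__ == "__main__":
    # Toy labels, made up for illustration.
    truth = [1, 1, 0, 0]
    predicted = [1, 0, 0, 0]
    print(binary_metrics(truth, predicted))
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, so it stays informative when soluble and insoluble classes are imbalanced.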


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11343611
DOI: http://dx.doi.org/10.1093/bib/bbae404


