PhosBoost: Improved phosphorylation prediction recall using gradient boosting and protein language models.

Plant Direct

Agricultural Research Service, Crop Improvement and Genetics Research Unit, U.S. Department of Agriculture, Albany, CA, United States.

Published: December 2023

Protein phosphorylation is a dynamic and reversible post-translational modification that regulates a variety of essential biological processes. Its regulatory role in cellular signaling pathways, protein-protein interactions, and enzymatic activities has motivated extensive research into its functional implications. Experimental protein phosphorylation data in plants remain limited to a few species, so a scalable and accurate prediction method is needed. Here, we present PhosBoost, a machine-learning approach that leverages protein language models and gradient-boosting trees to predict protein phosphorylation from experimentally derived data. Using training data obtained from qPTMplants, a comprehensive plant phosphorylation database, we compared the performance of PhosBoost with that of two existing protein phosphorylation prediction methods, PhosphoLingo and DeepPhos. For serine and threonine prediction, PhosBoost achieved higher recall than PhosphoLingo and DeepPhos (.78, .56, and .14, respectively) while maintaining a competitive area under the precision-recall curve (.54, .56, and .42, respectively). PhosphoLingo and DeepPhos failed to predict any tyrosine phosphorylation sites, whereas PhosBoost achieved a recall of .6. Despite the precision-recall tradeoff, PhosBoost offers improved performance when recall is prioritized and consistently provides more confident probability scores. A sequence-based pairwise alignment step improved prediction results for all classifiers by effectively increasing the number of inferred positive phosphosites. We provide evidence that PhosBoost models are transferable across species and scalable for genome-wide protein phosphorylation prediction. PhosBoost is freely and publicly available on GitHub.
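The abstract compares classifiers by recall and by area under the precision-recall curve (AUPRC). As an illustration only, the sketch below shows how these two metrics can be computed from per-site probability scores. It is not PhosBoost's evaluation code: the scores, labels, and decision thresholds are made up, and AUPRC is estimated here as step-wise average precision, which is one common estimator; the abstract does not state which estimator or threshold the authors used.

```python
"""Precision, recall and average precision for binary phosphosite calls.

Illustrative sketch only: `scores` are hypothetical per-site probabilities
and `labels` are 1 (phosphosite) or 0 (non-phosphosite).
"""
from typing import Sequence, Tuple


def precision_recall_at(scores: Sequence[float], labels: Sequence[int],
                        threshold: float) -> Tuple[float, float]:
    """Precision and recall when sites with score >= threshold are called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Step-wise area under the precision-recall curve (average precision).

    Sites are ranked by descending score; the precision at the rank of each
    true positive is summed and divided by the number of positives.
    Ties keep their input order (sorted() is stable).
    """
    n_pos = sum(labels)
    if n_pos == 0:
        return 0.0
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            total += tp / rank
    return total / n_pos


if __name__ == "__main__":
    # Hypothetical predictions for eight candidate S/T/Y sites.
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
    labels = [1, 0, 1, 1, 0, 1, 0, 0]
    for t in (0.5, 0.25):
        p, r = precision_recall_at(scores, labels, t)
        print(f"threshold={t}: precision={p:.3f}, recall={r:.3f}")
    # threshold=0.5: precision=0.750, recall=0.750
    # threshold=0.25: precision=0.667, recall=1.000
    print(f"average precision={average_precision(scores, labels):.4f}")
    # average precision=0.7708
```

Lowering the threshold from 0.5 to 0.25 raises recall from .75 to 1.0 while precision drops from .75 to about .67. This is the kind of precision-recall tradeoff the abstract refers to when it says PhosBoost is preferable when recall is prioritized. The threshold-free average precision summarizes the whole ranking in a single number.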

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10732782
DOI: http://dx.doi.org/10.1002/pld3.554

Similar Publications

Excess Ub-K48 Induces Neuronal Apoptosis in Alzheimer's Disease.

J Integr Neurosci

December 2024

Department of Human Anatomy, School of Basic Medical Sciences, Wannan Medical College, 241002 Wuhu, Anhui, China.

Background: K48-linked ubiquitin chain (Ub-K48) is a crucial ubiquitin chain implicated in protein degradation within the ubiquitin-proteasome system. However, the precise function and molecular mechanism underlying the role of Ub-K48 in the pathogenesis of Alzheimer's disease (AD) and neuronal cell abnormalities remain unclear. The objective of this study was to examine the function of K48 ubiquitination in the etiology of AD, and its associated mechanism of neuronal apoptosis.

Introduction: Adrenergic activation of protein kinase A (PKA) in cardiac muscle targets the sarcolemma, sarcoplasmic reticulum, and contractile apparatus to increase contractile force and heart rate. In the thin filaments of the contractile apparatus, Ser22 and Ser23 of cardiac troponin I (cTnI), located in the cardiac-specific N-terminal peptide (NcTnI: residues 1 to 32), are the targets of PKA phosphorylation. Phosphorylation causes a 2- to 3-fold decrease in the affinity of cTn for Ca²⁺, associated with a higher rate of Ca²⁺ dissociation from cTnC, leading to a faster relaxation rate of the cardiac muscle (lusitropy).

Background: Transmembrane emp24 trafficking protein 3 (TMED3) is associated with the development of several tumors; however, whether TMED3 regulates the progression of prostate cancer remains unclear.

Materials And Methods: Short hairpin RNA (shRNA) was used to repress TMED3 in prostate cancer cells (DU145 cells) and in a prostate cancer mouse model to determine its function in prostate cancer in vitro and in vivo.

Results: In the present study, we found that TMED3 was highly expressed in prostate cancer cells.

Background: CLP36, also known as PDZ and LIM domain 1 (PDLIM1), is a ubiquitously expressed α-actinin-binding cytoskeletal protein involved in carcinogenesis. The current study aims to explore its involvement in lymphoma.

Methods: Accordingly, the CLP36 expression pattern in lymphoma and its association with overall survival were predicted. qPCR was then applied to measure CLP36 expression in lymphoma cells and to determine the knockdown efficiency.

Immunofluorescence for Detection of TOR Kinase Activity In Situ in Photosynthetic Organisms.

Bio Protoc

December 2024

Instituto de Investigaciones en Biodiversidad y Biotecnología (INBIOTEC) and FIBA, Vieytes 3103, Mar del Plata, Argentina.

The target of rapamycin (TOR) is a central hub kinase that promotes growth and development in all eukaryotic cells. TOR induces protein synthesis through the phosphorylation of S6 kinase (S6K), which in turn phosphorylates ribosomal protein S6 (RPS6), further increasing this anabolic process. Therefore, S6K and RPS6 phosphorylation are generally used as readouts of TOR activity.