Seq2Phase: language model-based accurate prediction of client proteins in liquid-liquid phase separation.

Bioinform Adv

Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Bunkyo-ku, Tokyo 113-0032, Japan.

Published: December 2023

AI Article Synopsis

  • Liquid-liquid phase separation (LLPS) lets cells form compartments without membranes and underlies structures such as nucleoli and P-bodies, but predicting which proteins take part in LLPS remains a challenge.
  • The study introduces Seq2Phase, a deep-learning-based model that predicts LLPS client proteins from their amino-acid sequences and reaches high accuracy in human, mouse, yeast, and plant.
  • The findings suggest that many LLPS client proteins are yet to be discovered and that Seq2Phase can improve our understanding of the molecular mechanisms of LLPS and its roles in disease. The software code is publicly available.

Article Abstract

Motivation: Liquid-liquid phase separation (LLPS) enables compartmentalization in cells without biological membranes. LLPS plays essential roles in membraneless organelles such as nucleoli and P-bodies, helps regulate cellular physiology, and is linked to amyloid formation. Two types of proteins, scaffolds and clients, are involved in LLPS. However, computational methods for predicting LLPS client proteins from amino-acid sequences remain underdeveloped.

Results: Here, we present Seq2Phase, an accurate predictor of LLPS client proteins. Information-rich features are extracted from amino-acid sequences by a Transformer, a deep-learning architecture, and fed into a supervised machine-learning classifier. The predicted client proteins included known LLPS regulators and were enriched for localization to membraneless organelles, which supports the validity of the predictions. Feature analysis revealed that scaffolds and clients have different sequence properties and that textbook knowledge of LLPS-related proteins is biased and incomplete. Seq2Phase achieved high accuracy in human, mouse, yeast, and plant, showing that the method is not overfitted to a specific species and is broadly applicable. We predict that hundreds to thousands of LLPS client proteins remain undiscovered in each species and that Seq2Phase will advance our understanding of the still-enigmatic molecular and physiological bases of LLPS as well as its roles in disease.
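
The two-stage design described in the Results (sequence features first, supervised learning second) can be illustrated with a minimal Python sketch. This is not the Seq2Phase implementation. As stated assumptions: the Transformer embedding is replaced by a simple amino-acid composition vector so the example runs with the standard library only, the classifier is a plain logistic regression (the abstract does not name the classifier), the function names `embed`, `train_logistic`, and `predict_proba` are hypothetical, and the toy sequences are invented for illustration and are not real proteins.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def embed(sequence):
    """Stand-in feature extractor (ASSUMPTION: replaces the Transformer).

    Returns the fraction of each of the 20 standard amino acids, i.e. a
    fixed-length vector for a variable-length sequence.
    """
    counts = [0.0] * len(AMINO_ACIDS)
    for residue in sequence:
        idx = AMINO_ACIDS.find(residue)
        if idx >= 0:  # ignore non-standard residue codes
            counts[idx] += 1.0
    total = sum(counts) or 1.0  # avoid division by zero for empty input
    return [c / total for c in counts]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(features, labels, lr=0.5, epochs=2000):
    """Fit a logistic regression with batch gradient descent."""
    n_samples, n_dim = len(features), len(features[0])
    weights, bias = [0.0] * n_dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_dim, 0.0
        for x, y in zip(features, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of the log-loss w.r.t. the logit
            for j in range(n_dim):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n_samples
    return weights, bias


def predict_proba(sequence, weights, bias):
    """Predicted probability that a sequence belongs to the positive class."""
    x = embed(sequence)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Toy training data, invented for illustration (NOT real proteins).
toy_clients = ["GGSYGGRGGSYG", "RGGRGGSYSGGQ", "SYGQSGGRGGYG"]
toy_non_clients = ["LLVVAAILLMVA", "AVLIMLVAALIV", "VVLLAAIMLVLA"]

train_x = [embed(s) for s in toy_clients + toy_non_clients]
train_y = [1] * len(toy_clients) + [0] * len(toy_non_clients)
weights, bias = train_logistic(train_x, train_y)

for seq in ["GRGGSYGGSGGR", "ALVLIAVLMLAV"]:
    print(seq, round(predict_proba(seq, weights, bias), 3))
```

In a realistic setting, `embed` would be replaced by pooled embeddings from a pretrained protein language model and the classifier by whatever supervised learner the released code uses; the interface (sequence in, fixed-length vector out, probability from the classifier) would stay the same.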

Availability and Implementation: The Python source code underlying this article is available at https://github.com/IwasakiLab/Seq2Phase.

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10777356 (PMC)
http://dx.doi.org/10.1093/bioadv/vbad189 (DOI)
