Publications by authors named "B Rost"

Large language models for proteins, namely protein Language Models (pLMs), have begun to provide an important alternative way of capturing, in computers, the information encoded in a protein sequence. Arguably, pLMs have substantially advanced our understanding of aspects of the language of life as written in proteins, and through this understanding, they are becoming an increasingly powerful means of advancing protein prediction, e.g.

Protein language models (pLMs) capture some aspects of the grammar of the language of life as written in protein sequences. The so-called pLM embeddings implicitly contain this information. Therefore, embeddings can serve as the exclusive input into downstream supervised methods for protein prediction.
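
To make "embeddings as the exclusive input" concrete, here is a minimal sketch, not the authors' method. It assumes per-residue embeddings have already been computed by some pLM, averages them into one fixed-length vector per protein, and trains a nearest-centroid classifier on those vectors alone. The toy numbers, the three-dimensional vectors, and the labels "membrane" and "soluble" are hypothetical placeholders chosen only for illustration.

```python
from statistics import fmean

def mean_pool(vectors):
    """Average a list of equal-length vectors into a single vector."""
    return [fmean(dim) for dim in zip(*vectors)]

def fit_centroids(pooled, labels):
    """Supervised step: one centroid per class, built from pooled embeddings only."""
    by_label = {}
    for vec, lab in zip(pooled, labels):
        by_label.setdefault(lab, []).append(vec)
    return {lab: mean_pool(vecs) for lab, vecs in by_label.items()}

def predict(centroids, vector):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda lab: sq_dist(centroids[lab], vector))

# Hypothetical stand-ins for pLM output: one list of per-residue vectors per protein.
train = [
    ([[0.9, 0.1, 0.0], [1.1, -0.1, 0.2]], "membrane"),
    ([[1.0, 0.0, 0.1]], "membrane"),
    ([[-1.0, 0.5, 0.3], [-0.8, 0.7, 0.1], [-1.2, 0.3, 0.2]], "soluble"),
]
pooled = [mean_pool(emb) for emb, _ in train]
centroids = fit_centroids(pooled, [lab for _, lab in train])

# A new protein is classified from its embedding alone; no alignment or other features.
query = mean_pool([[0.8, 0.2, 0.1], [1.0, 0.0, 0.0]])
print(predict(centroids, query))
```

Mean pooling is only one way to obtain a per-protein representation; per-residue tasks would feed each residue's vector to the downstream model directly.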

Adapting language models to protein sequences spawned the development of powerful protein language models (pLMs). Concurrently, AlphaFold2 broke through in protein structure prediction. Now we can systematically and comprehensively explore the dual nature of proteins that act and exist as three-dimensional (3D) machines and evolve as linear strings of one-dimensional (1D) sequences.

Motivation: Exhaustive experimental annotation of the effect of all known protein variants remains daunting and expensive, stressing the need for scalable effect predictions. We introduce VespaG, a blazingly fast missense amino acid variant effect predictor, leveraging protein language model (pLM) embeddings as input to a minimal deep learning model.

Results: To overcome the sparsity of experimental training data, we created a dataset of 39 million single amino acid variants from the human proteome, using the multiple sequence alignment-based effect predictor GEMME as a pseudo standard-of-truth.
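
The idea of training on a predictor's output rather than on experimental labels can be sketched as follows. This is an illustration under stated assumptions, not VespaG itself: `teacher_score` is a hypothetical stand-in for the slower alignment-based predictor, the "embeddings" are random two-dimensional toy vectors, and the student is a plain linear model fitted by stochastic gradient descent rather than a deep network.

```python
import random

def teacher_score(embedding):
    """Hypothetical stand-in for the teacher predictor (the role GEMME plays above)."""
    return 0.5 * embedding[0] - 0.25 * embedding[1]

def train_student(examples, dim, epochs=200, lr=0.1):
    """Fit a linear student y = w.x + b to teacher scores by SGD on squared error."""
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in examples:
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

rng = random.Random(0)
# Toy 2-dimensional inputs; real pLM embeddings are far higher-dimensional.
inputs = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(50)]
# Pseudo-labels come from the teacher, so no experimental measurements are needed.
examples = [(x, teacher_score(x)) for x in inputs]

w, b = train_student(examples, dim=2)
print([round(v, 2) for v in w], round(b, 2))
```

Once trained, the student needs only an embedding at prediction time, which is where the speed advantage over an alignment-based teacher would come from.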

Article Synopsis
  • The study introduces a new method called SAGES, which combines gene expression data with structural features of proteins to better understand protein evolution and function.
  • Using SAGES and machine learning, researchers analyzed tissue samples from healthy individuals and breast cancer patients, focusing on gene expression and protein profiles.
  • Key findings include the detection of intrinsically disordered regions in breast cancer proteins and potential links between drug responses and cancer signatures, indicating SAGES' broad applicability for studying biological processes.