Protein language models trained on the masked language modeling objective learn to predict the identity of hidden amino acid residues within a sequence using the remaining observable sequence as context. They do so by embedding the residues into a high-dimensional space that encapsulates the relevant contextual cues. These embedding vectors serve as an informative, context-sensitive representation that not only aids the training objective but can also be used by downstream models for other tasks. We propose a scheme that uses the embeddings of an unmasked sequence to estimate the corresponding masked probability vectors for all positions in a single forward pass through the language model. This One Fell Swoop (OFS) approach allows us to efficiently estimate the pseudo-perplexity of the sequence, a measure of the model's uncertainty in its predictions that can also serve as a fitness estimate. We find that ESM2 OFS pseudo-perplexity performs nearly as well as the true pseudo-perplexity at fitness estimation and, more notably, defines a new state of the art on the ProteinGym Indels benchmark. The strong performance of this fitness measure prompted us to investigate whether it could be used to detect the elevated stability reported in reconstructed ancestral sequences. We find that the measure ranks ancestral reconstructions as more fit than extant sequences. Finally, we show that the computational efficiency of the technique allows the use of Monte Carlo methods that can rapidly explore functional sequence space.
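As a rough illustration of the quantity discussed above, pseudo-perplexity is conventionally defined as the exponential of the mean negative log-probability that the model assigns to the true residue at each position. The sketch below assumes the per-position probability vectors are already available (whether from masking each position in turn or from a single-pass estimate); the function name `pseudo_perplexity`, the 20-letter alphabet ordering in `AMINO_ACIDS`, and the input layout are illustrative assumptions, not the authors' implementation.

```python
import math

# Assumed column ordering of each probability vector (hypothetical; a real
# model's vocabulary order and extra tokens would differ).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def pseudo_perplexity(sequence, prob_rows):
    """Return exp(-(1/L) * sum_i log p_i(x_i)) for a sequence of length L.

    sequence:  string of residues drawn from AMINO_ACIDS.
    prob_rows: one probability vector per position, each of length 20,
               giving the model's probabilities for that position.
    """
    if len(sequence) != len(prob_rows):
        raise ValueError("need exactly one probability vector per residue")
    if not sequence:
        raise ValueError("sequence must be non-empty")

    log_prob_sum = 0.0
    for residue, row in zip(sequence, prob_rows):
        # Probability assigned to the residue actually present at this position.
        p_true = row[AMINO_ACIDS.index(residue)]
        log_prob_sum += math.log(p_true)

    return math.exp(-log_prob_sum / len(sequence))


# Usage: a model that is maximally uncertain (uniform over 20 residues).
uniform_rows = [[1.0 / 20] * 20 for _ in "ACD"]
print(pseudo_perplexity("ACD", uniform_rows))
```

Under this definition a uniform distribution over the 20 residues gives a pseudo-perplexity of about 20, while a model that assigns probability 1 to every true residue gives 1, so lower values correspond to sequences the model finds more plausible.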

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11261985

Similar Publications

Exploitation of metal-organic framework (MOF) materials as active electrodes for energy storage or conversion is reasonably challenging owing to their poor robustness against various acidic/basic conditions and conventionally low electric conductivity. Keeping this in perspective, herein, a 3D ultramicroporous triazolate Fe-MOF (abbreviated as Fe-MET) is judiciously employed using cheap and commercially available starting materials. Fe-MET possesses ultra-stability against various chemical environments (pH-1 to pH-14 with varied organic solvents) and is highly electrically conductive (σ = 0.

Don't let perfect be the enemy of better: In defense of unparameterized megastudies.

Behav Brain Sci

February 2024

Department of Psychology & Neuroscience, Boston College, Chestnut Hill, MA, https://l3atbc.org.

The target article argues researchers should be more ambitious, designing studies that systematically and comprehensively explore the space of possible experiments in one fell swoop. We argue that while "systematic" is rarely achievable, "comprehensive" is often enough. Critically, the recent popularization of massive online experiments shows that comprehensive studies are achievable for most cognitive and behavioral research questions.

We describe the case of a 43-year-old woman with hereditary hemochromatosis and no prior cardiac issues who presented to our hospital with a severe fever (>40 to 41 °C). Initial assessments, including transthoracic echocardiography, showed no typical signs of infective endocarditis. A contrast-enhanced CT scan revealed a hypodense area in the right subscapular muscle, alongside pleural thickening.
