Computational biology and bioinformatics provide vast data gold mines of protein sequences, ideal for Language Models (LMs) taken from Natural Language Processing (NLP). These LMs reach for new prediction frontiers at low inference cost. Here, we trained two auto-regressive models (Transformer-XL, XLNet) and four auto-encoder models (BERT, Albert, Electra, T5) on data from UniRef and BFD containing up to 393 billion amino acids. The protein LMs (pLMs) were trained on the Summit supercomputer using 5616 GPUs and on a TPU Pod with up to 1024 cores. Dimensionality reduction revealed that the raw pLM embeddings, learned from unlabeled data, captured some biophysical features of protein sequences. We validated the advantage of using the embeddings as the exclusive input for several subsequent tasks: (1) per-residue (per-token) prediction of protein secondary structure (3-state accuracy Q3 = 81%-87%); and (2) per-protein (pooling) predictions of protein sub-cellular location (ten-state accuracy Q10 = 81%) and of membrane versus water-soluble proteins (2-state accuracy Q2 = 91%). For secondary structure, the most informative embeddings (ProtT5) for the first time outperformed the state of the art without using multiple sequence alignments (MSAs) or evolutionary information, thereby bypassing expensive database searches. Taken together, the results implied that pLMs learned some of the grammar of the language of life. All our models are available through https://github.com/agemagician/ProtTrans.
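The two kinds of downstream task named above differ in how the embeddings are consumed: per-residue tasks keep one vector per amino acid, while per-protein tasks first pool those vectors into a single fixed-size vector. The following is a minimal sketch of that distinction and of the Q-accuracy metric, not the authors' implementation. It assumes mean pooling (the abstract says only "pooling"), and the function names `mean_pool` and `q_accuracy` as well as all toy values are hypothetical.

```python
from typing import List, Sequence


def mean_pool(residue_embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Average per-residue vectors into one fixed-size per-protein vector.

    Assumption: mean pooling; the abstract does not say which pooling is used.
    """
    if not residue_embeddings:
        raise ValueError("need at least one residue embedding")
    dim = len(residue_embeddings[0])
    if any(len(vec) != dim for vec in residue_embeddings):
        raise ValueError("all residue embeddings must share one dimension")
    n = len(residue_embeddings)
    return [sum(vec[i] for vec in residue_embeddings) / n for i in range(dim)]


def q_accuracy(predicted: str, observed: str) -> float:
    """Fraction of positions where the predicted state equals the observed one.

    With three secondary-structure states this is Q3; with ten location
    classes scored per protein it corresponds to Q10, and so on.
    """
    if len(predicted) != len(observed) or not observed:
        raise ValueError("state strings must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Toy data (hypothetical): a 4-residue protein with 2-dimensional embeddings.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
protein_vector = mean_pool(toy_embeddings)

# Toy 3-state strings (H = helix, E = strand, C = other), one letter per residue.
q3 = q_accuracy("HHEC", "HHCC")
```

Pooling makes the per-protein vector independent of sequence length, so a single classifier can handle proteins of any size; the per-residue task skips this step and classifies each residue vector on its own.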

Source: http://dx.doi.org/10.1109/TPAMI.2021.3095381
