Advances in deep learning have significantly aided protein engineering in addressing challenges in industrial production, healthcare, and environmental sustainability. This review frames frequently researched problems in protein understanding and engineering from the perspective of deep learning. It provides a thorough discussion of representation methods for protein sequences and structures, along with general encoding pipelines that support both pre-training and supervised learning tasks.
Deep learning-based methods for generating functional proteins address the growing need for novel biocatalysts, allowing for precise tailoring of functionalities to meet specific requirements. This advancement leads to the development of highly efficient and specialized proteins with diverse applications across scientific, technological, and biomedical fields. This study establishes a pipeline for protein sequence generation with a conditional protein diffusion model, namely CPDiffusion, to create diverse sequences of proteins with enhanced functions.
Increasing the binding affinity of an antibody to its target antigen is a crucial task in antibody therapeutics development. This paper presents a pretrainable geometric graph neural network, GearBind, and explores its potential in in silico affinity maturation. Leveraging multi-relational graph construction, multi-level geometric message passing and contrastive pretraining on mass-scale, unlabeled protein structural data, GearBind outperforms previous state-of-the-art approaches on SKEMPI and an independent test set.
Fine-tuning pretrained protein language models (PLMs) has emerged as a prominent strategy for enhancing downstream prediction tasks, often outperforming traditional supervised learning approaches. Parameter-efficient fine-tuning, a powerful and widely applied technique in natural language processing, could potentially enhance the performance of PLMs. However, transferring it directly to life science tasks is nontrivial because the training strategies and data forms differ.
Prokaryotic Argonaute (pAgo) proteins, a class of DNA/RNA-guided programmable endonucleases, have been extensively utilized in nucleic acid-based biosensors. The specific binding and cleavage of nucleic acids by pAgo proteins, which are crucial processes for their applications, depend on the presence of Mn²⁺ bound in the pockets, as verified through X-ray crystallography. However, how dissociated Mn²⁺ in the solvent affects the catalytic cycle, and its underlying regulatory role in this structure-function relationship, remains poorly understood.
Phosphorylation of proteins plays an important regulatory role at almost all levels of cellular organization. Molecular dynamics (MD) simulation is a promising tool for revealing, at the atomistic level, how phosphorylation regulates many key biological processes. The accuracy of MD simulation depends on the precision of the force field, but current force fields for phospho-amino acids show notable inconsistencies with experimental data.
Ancestral metabolism has remained controversial due to a lack of evidence beyond sequence-based reconstructions. Although prebiotic chemists have provided hints that metabolism might originate from non-enzymatic protometabolic pathways, gaps between ancestral reconstruction and prebiotic processes mean there is much that is still unknown. Here, we apply proteome-wide 3D structure predictions and comparisons to investigate the ancestral metabolism of ancient bacteria and archaea, to provide information beyond sequence as a bridge to the prebiotic processes.
Phosphorylation plays a key role in plant biology, such as the accumulation of plant cells to form the observed proteome. Statistical analysis found that many phosphorylation sites are located in disordered regions. However, current force fields are mainly trained for structural proteins, which might not have the capacity to perfectly capture the dynamic conformation of the phosphorylated proteins.
Background: Prokaryotic Argonaute (pAgo) proteins are well-known oligonucleotide-directed endonucleases, which contain a conserved PIWI domain required for endonuclease activity. The PIWI-RE family, which is distantly related to pAgos and defined as PIWI with conserved R and E residues, has been suggested to exhibit divergent activities. The distinctive biochemical properties and physiological functions of PIWI-RE family members need to be elucidated to explore their applications in gene editing.
Hepatitis C virus (HCV) is a notorious member of the Flaviviridae family of enveloped, positive-strand RNA viruses. Non-structural protein 5A (NS5A) plays a key role in HCV replication and assembly. NS5A is a multi-domain protein that includes an N-terminal amphipathic membrane-anchoring alpha helix, a highly structured domain 1, and two intrinsically disordered domains, 2 and 3.