BK polyomavirus (BKPyV) is a double-stranded DNA virus causing nephropathy, hemorrhagic cystitis, and urothelial cancer in transplant patients. The BKPyV-encoded capsid protein Vp1 and large T-antigen (LTag) are key targets of neutralizing antibodies and cytotoxic T-cells, respectively. Our single-center data suggested that variability in Vp1 and LTag may contribute to failing BKPyV-specific immune control and impact vaccine design.
Motivation: Language models are routinely used for text classification and generative tasks. Recently, the same architectures were applied to protein sequences, unlocking powerful new approaches in the bioinformatics field. Protein language models (pLMs) generate high-dimensional embeddings on a per-residue level and encode a "semantic meaning" of each individual amino acid in the context of the full protein sequence.
The prediction of protein-ligand complexes (PLC), using both experimental and predicted structures, is an active and important area of research, underscored by the inclusion of the Protein-Ligand Interaction category in the latest round of the Critical Assessment of Protein Structure Prediction experiment CASP15. The prediction task in CASP15 consisted of predicting both the three-dimensional structure of the receptor protein as well as the position and conformation of the ligand. This paper addresses the challenges and proposed solutions for devising automated benchmarking techniques for PLC prediction.
CASP15 introduced a new category, ligand prediction, where participants were provided with a protein or nucleic acid sequence, SMILES line notation, and stoichiometry for ligands and tasked with generating computational models for the three-dimensional structure of the corresponding protein-ligand complex. These models were subsequently compared with experimental structures determined by X-ray crystallography or cryo-EM. To assess these predictions, two novel scores were developed.
We are now entering a new era in protein sequence and structure annotation, with hundreds of millions of predicted protein structures made available through the AlphaFold database. These models cover nearly all proteins that are known, including those challenging to annotate for function or putative biological role using standard homology-based approaches. In this study, we examine the extent to which the AlphaFold database has structurally illuminated this 'dark matter' of the natural protein universe at high predicted accuracy.
Developments in computational omics technologies have provided new means to access the hidden diversity of natural products, unearthing new potential for drug discovery. In parallel, artificial intelligence approaches such as machine learning have led to exciting developments in the computational drug design field, facilitating biological activity prediction and de novo drug design for molecular targets of interest. Here, we describe current and future synergies between these developments to effectively identify drug candidates from the plethora of molecules produced by nature.
Prediction categories in the Critical Assessment of Structure Prediction (CASP) experiments change with the need to address specific problems in structure modeling. In CASP15, four new prediction categories were introduced: RNA structure, ligand-protein complexes, accuracy of oligomeric structures and their interfaces, and ensembles of alternative conformations. This paper lists technical specifications for these categories and describes their integration in the CASP data management system.
Recent breakthroughs in protein structure prediction demarcate the start of a new era in structural bioinformatics. Combined with various advances in experimental structure determination and the uninterrupted pace at which new structures are published, this promises an age in which protein structure information is as prevalent and ubiquitous as sequence. Machine learning in protein bioinformatics has been dominated by sequence-based methods, but this is now changing to make use of the deluge of rich structural information as input.
Strigolactones (SLs) are rhizosphere signalling molecules and phytohormones. The biosynthetic pathway of SLs in tomato has been partially elucidated, but the structural diversity in tomato SLs predicts that additional biosynthetic steps are required. Here, root RNA-seq data and co-expression analysis were used for SL biosynthetic gene discovery.
Sesquiterpene synthases (STSs) catalyze the formation of a large class of plant volatiles called sesquiterpenes. While thousands of putative STS sequences from diverse plant species are available, only a small number of them have been functionally characterized. Sequence identity-based screening for desired enzymes, often used in biotechnological applications, is difficult to apply here as STS sequence similarity is strongly affected by species.
Motivation: As the number of experimentally solved protein structures rises, it becomes increasingly appealing to use structural information for predictive tasks involving proteins. Due to the large variation in protein sizes, folds and topologies, an attractive approach is to embed protein structures into fixed-length vectors, which can be used in machine learning algorithms aimed at predicting and understanding functional and physical properties. Many existing embedding approaches are alignment based, which is both time-consuming and ineffective for distantly related proteins.
Comput Struct Biotechnol J
April 2020
The vast number of protein structures currently available opens exciting opportunities for machine learning on proteins, aimed at predicting and understanding functional properties. In particular, in combination with homology modelling, it is now possible to not only use sequence features as input for machine learning, but also structure features. However, in order to do so, robust multiple structure alignments are imperative.
Plants exhibit a vast array of sesquiterpenes, C15 hydrocarbons which often function as herbivore-repellents or pollinator-attractants. These in turn are produced by a diverse range of sesquiterpene synthases. A comprehensive analysis of these enzymes in terms of product specificity has been hampered by the lack of a centralized resource of sufficient functionally annotated sequence data.