The accurate identification of protein-ligand binding sites is of critical importance in understanding and modulating protein function. Accordingly, ligand binding site prediction has remained a research focus for over three decades, with over 50 methods developed and a change of paradigm from geometry-based approaches to machine learning. In this work, we collate 13 ligand binding site predictors, spanning 30 years, focusing on the latest machine learning-based methods such as VN-EGNN, IF-SitePred, GrASP, PUResNet, and DeepPocket, and compare them to the established P2Rank, PRANK and fpocket and to earlier methods like PocketFinder, Ligsite and Surfnet.
Protein evolution is constrained by structure and function, creating patterns in residue conservation that are routinely exploited to predict structure and other features. Similar constraints should affect variation across individuals, but it is only with the growth of human population sequencing that this has been tested at scale. Now, human population constraint has established applications in pathogenicity prediction, but it has not yet been explored for structural inference.
Fragment screening is used to identify binding sites and leads in drug discovery, but it is often unclear which binding sites are functionally important. Here, data from 37 experiments, and 1309 protein structures binding to 1601 ligands were analysed. A method to group ligands by binding sites is introduced and sites clustered according to profiles of relative solvent accessibility.
Eukaryotic genes are interrupted by introns that are removed from transcribed RNAs by splicing. Patterns of splicing complexity differ between species, but it is unclear how these differences arise. We used inter-species association mapping with Saccharomycotina species to correlate splicing signal phenotypes with the presence or absence of splicing factors.
Protein kinases are major regulators of cellular processes, but the roles of most kinases remain unresolved. Dictyostelid social amoebas have been useful in identifying functions for 30% of their kinases in cell migration, cytokinesis, vesicle trafficking, gene regulation and other processes, but their upstream regulators and downstream effectors are mostly unknown. Comparative genomics can help to distinguish between genes involved in deeply conserved core processes and those involved in species-specific innovations, while co-expression of genes as evident from comparative transcriptomics can provide cues to the protein complement of regulatory networks.
Alternative splicing of messenger RNAs is associated with the evolution of developmentally complex eukaryotes. Splicing is mediated by the spliceosome, and docking of the pre-mRNA 5' splice site into the spliceosome active site depends upon pairing with the conserved ACAGA sequence of U6 snRNA. In some species, including humans, the central adenosine of the ACAGA box is modified by methylation, but the role of this m6A modification is poorly understood.
A prerequisite to exploiting soil microbes for sustainable crop production is the identification of the plant genes shaping microbiota composition in the rhizosphere, the interface between roots and soil. Here, we use metagenomics information as an external quantitative phenotype to map the host genetic determinants of the rhizosphere microbiota in wild and domesticated genotypes of barley, the fourth most cultivated cereal globally. We identify a small number of loci with a major effect on the composition of rhizosphere communities.
The nutrient-rich tubers of the greater yam, Dioscorea alata L., provide food and income security for millions of people around the world. Despite its global importance, however, greater yam remains an orphan crop.
PLoS Comput Biol
March 2022
SARS-CoV-2 Spike (Spike) binds to human angiotensin-converting enzyme 2 (ACE2) and the strength of this interaction could influence parameters relating to virulence. To explore whether population variants in ACE2 influence Spike binding and hence infection, we selected 10 ACE2 variants based on affinity predictions and prevalence in gnomAD and measured their affinities and kinetics for Spike receptor binding domain through surface plasmon resonance (SPR) at 37°C. We discovered variants that reduce and enhance binding, including three ACE2 variants that strongly inhibited (p.
The interaction between the SARS-CoV-2 virus Spike protein receptor binding domain (RBD) and the ACE2 cell surface protein is required for viral infection of cells. Mutations in the RBD are present in SARS-CoV-2 variants of concern that have emerged independently worldwide. For example, the B.
Ankyrin protein repeats bind to a wide range of substrates and are one of the most common protein motifs in nature. Here, we collate a high-quality alignment of 7,407 ankyrin repeats and examine, for the first time, the distribution of human population variants from large-scale sequencing of healthy individuals across this family. Population variants are not randomly distributed across the genome but are constrained by gene essentiality and function.
Genes involved in disease resistance are some of the fastest evolving and most diverse components of genomes. Large numbers of nucleotide-binding, leucine-rich repeat (NLR) genes are found in plant genomes and are required for disease resistance. However, NLRs can trigger autoimmunity, disrupt beneficial microbiota or reduce fitness.
Transcription of eukaryotic genomes involves complex alternative processing of RNAs. Sequencing of full-length RNAs using long reads reveals the true complexity of processing. However, the relatively high error rates of long-read sequencing technologies can reduce the accuracy of intron identification.
In this chapter, we introduce core functionality of the Jalview interactive platform for the creation, analysis, and publication of multiple sequence alignments. A workflow is described based on Jalview's core functions: from data import to figure generation, including import of alignment reliability scores from T-Coffee and use of Jalview from the command line. The accompanying notes provide background information on the underlying methods and discuss additional options for working with Jalview to perform multiple sequence alignment, functional site analysis, and publication of alignments on the web.
Understanding genome organization and gene regulation requires insight into RNA transcription, processing and modification. We adapted nanopore direct RNA sequencing to examine RNA from a wild-type accession of the model plant Arabidopsis thaliana and a mutant defective in mRNA methylation (m6A). Here we show that m6A can be mapped in full-length mRNAs transcriptome-wide and reveal the combinatorial diversity of cap-associated transcription start sites, splicing events, poly(A) site choice and poly(A) tail length.
The Dundee Resource for Sequence Analysis and Structure Prediction (DRSASP; http://www.compbio.dundee.
Tetratricopeptide repeat (TPR) proteins belong to the class of α-solenoid proteins, in which repetitive units of α-helical hairpin motifs stack to form superhelical, often highly flexible structures. TPR domains occur in a wide variety of proteins, and perform key functional roles including protein folding, protein trafficking, cell cycle control and post-translational modification. Here, we look at the TPR domain of the enzyme O-linked GlcNAc-transferase (OGT), which catalyses O-GlcNAcylation of a broad range of substrate proteins.
Motivation: RNA-seq experiments are usually carried out in three or fewer replicates. In order to work well with so few samples, differential gene expression (DGE) tools typically assume the form of the underlying gene expression distribution. In this paper, the statistical properties of gene expression from RNA-seq are investigated in the complex eukaryote, Arabidopsis thaliana, extending and generalizing the results of previous work in the simple eukaryote Saccharomyces cerevisiae.
Summary: JABAWS 2.2 is a computational framework that simplifies the deployment of web services for Bioinformatics. In addition to the five multiple sequence alignment (MSA) algorithms in JABAWS 1.
The translation of personal genomics to precision medicine depends on the accurate interpretation of the multitude of genetic variants observed for each individual. However, even when genetic variants are predicted to modify a protein, their functional implications may be unclear. Many diseases are caused by genetic variants affecting important protein features, such as enzyme active sites or interaction interfaces.
Protein O-GlcNAcylation (O-GlcNAc) is an essential post-translational modification (PTM) in higher eukaryotes. The O-linked β-N-acetylglucosamine transferase (OGT) targets specific serines and threonines (S/T) in intracellular proteins. However, unlike phosphorylation, fewer than 25% of known O-GlcNAc sites match a clear sequence pattern.
Background: Annotation of gene models and transcripts is a fundamental step in genome sequencing projects. Often this is performed with automated prediction pipelines, which can miss complex and atypical genes or transcripts. RNA sequencing (RNA-seq) data can aid the annotation with empirical data.
Coordination of cell movement with cell differentiation is a major feat of embryonic development. The Dictyostelium stalk always forms at the organizing tip, by a mechanism that is not understood. We previously reported that cyclic diguanylate (c-di-GMP), synthesized by diguanylate cyclase A (DgcA), induces stalk formation.