Bioinformatics
November 2024
Motivation: Accurate protein function prediction is crucial for understanding biological processes and advancing biomedical research. However, the rapid growth of protein sequences far outpaces the experimental characterization of their functions, necessitating the development of automated computational methods.
Results: We present InterLabelGO+, a hybrid approach that integrates a deep learning-based method with an alignment-based method for improved protein function prediction.
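The abstract does not say how InterLabelGO+ merges its two components. As a rough illustration of what a "hybrid" score can look like, the sketch below linearly blends per-GO-term scores from a deep learning predictor and an alignment-based predictor. The linear rule, the weight `w`, and the name `combine_scores` are all assumptions for illustration only.

```python
from typing import Dict


def combine_scores(dl_scores: Dict[str, float],
                   align_scores: Dict[str, float],
                   w: float = 0.5) -> Dict[str, float]:
    """Blend two per-GO-term score maps with a fixed weight (hypothetical rule).

    A term missing from one predictor is treated as scoring 0.0 there.
    w weights the deep learning scores; (1 - w) weights the alignment scores.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    terms = set(dl_scores) | set(align_scores)
    return {
        t: w * dl_scores.get(t, 0.0) + (1.0 - w) * align_scores.get(t, 0.0)
        for t in terms
    }


if __name__ == "__main__":
    # Toy scores for illustration only, not real predictions.
    dl = {"GO:0003824": 0.9, "GO:0005515": 0.4}
    aln = {"GO:0003824": 0.7, "GO:0016787": 0.6}
    for term, score in sorted(combine_scores(dl, aln, w=0.6).items()):
        print(term, round(score, 3))
```

A real predictor would typically also propagate scores up the GO hierarchy so that a parent term never scores below its children, which this sketch omits.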
The signaling molecule cyclic di-GMP (cdG) controls the switch between bacterial motility and biofilm production, and fluctuations in cellular levels of cdG have been implicated in pathogenesis. Intracellular concentrations of cdG are controlled by the interplay of diguanylate cyclase (DGC) enzymes, which synthesize cdG to promote biofilms, and phosphodiesterase (PDE) enzymes, which hydrolyse cdG to drive motility. To track the complete regulatory logic of how the bacterium responds to changing cdG levels, we followed a timecourse of overexpression of either the diguanylate cyclase QrgB or a variant of QrgB lacking catalytic activity (QrgB*).
Visualizing and measuring molecular-scale interactions in living cells represents a major challenge, but recent advances in single-molecule super-resolution microscopy are bringing us closer to achieving this goal. Single-molecule super-resolution microscopy enables high-resolution and sensitive imaging of the positions and movement of molecules in living cells. HP1 proteins are important regulators of gene expression because they selectively recognize and bind H3K9-methylated (H3K9me) histones to form heterochromatin-associated protein complexes that silence gene expression, but several important mechanistic details of this process remain unexplored.
Despite the increasing number of 3D RNA structures in the Protein Data Bank, the majority of experimental RNA structures lack thorough functional annotations. As the significance of the functional roles played by noncoding RNAs becomes increasingly apparent, comprehensive annotation of RNA function is becoming a pressing concern. In response to this need, we have developed FURNA (Functions of RNAs), the first database for experimental RNA structures that aims to provide a comprehensive repository of high-quality functional annotations.
Sequence database searches followed by homology-based function transfer form one of the oldest and most popular approaches for predicting protein functions, such as Gene Ontology (GO) terms. These searches are also a critical component in most state-of-the-art machine learning and deep learning-based protein function predictors. Although sequence search tools are the basis of homology-based protein function prediction, previous studies have scarcely explored how to select the optimal sequence search tools and configure their parameters to achieve the best function prediction.
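To make "homology-based function transfer" concrete, the sketch below implements a common baseline: each GO term is scored by the bit-score-weighted fraction of database hits annotated with it. It assumes the hits have already been returned by some search tool and are given as (bit score, GO term set) pairs; the `Hit` type and `transfer_go_terms` name are hypothetical, and this is not presented as the scoring used in the study.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Hit:
    """One database hit: its alignment bit score and its known GO terms."""
    bitscore: float
    go_terms: FrozenSet[str] = field(default_factory=frozenset)


def transfer_go_terms(hits: List[Hit]) -> Dict[str, float]:
    """Score each GO term as the bit-score-weighted fraction of hits carrying it.

    score(t) = sum of bit scores of hits annotated with t
               / sum of bit scores of all hits
    """
    total = sum(h.bitscore for h in hits)
    if total <= 0:
        return {}  # no usable hits, so nothing to transfer
    scores: Dict[str, float] = {}
    for h in hits:
        for term in h.go_terms:
            scores[term] = scores.get(term, 0.0) + h.bitscore
    return {term: s / total for term, s in scores.items()}


if __name__ == "__main__":
    # Toy hits for illustration only.
    hits = [
        Hit(200.0, frozenset({"GO:0003677", "GO:0006355"})),
        Hit(100.0, frozenset({"GO:0003677"})),
        Hit(100.0),  # a hit with no GO annotation still dilutes the scores
    ]
    print(transfer_go_terms(hits))
```

Which hits enter this list depends entirely on the search tool and its parameters (for example an E-value cutoff or a cap on the number of hits), which is exactly the choice the abstract sets out to study.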
Many approaches for measuring three-dimensional chromosomal conformations rely upon formaldehyde crosslinking followed by proximity ligation, a family of methods exemplified by 3C and Hi-C. Here we provide an alternative crosslinking-free procedure for high-throughput identification of long-range contacts in the chromosomes of enterobacteria, making use of contact-dependent transposition of phage Mu to identify distant loci in close contact. The procedure described here will suffice to provide a comprehensive map of transposition frequencies between tens of thousands of loci in a bacterial genome, with the resolution limited by the diversity of the insertion site library used and the sequencing depth applied.
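To make the output concrete, the sketch below tallies donor-to-target transposition events into a matrix of genomic bins on a circular chromosome. It assumes each sequenced event has already been reduced to a pair of 0-based coordinates (donor locus, target locus); that input format, the bin size, and the name `contact_matrix` are illustrative assumptions, not part of the published protocol.

```python
from typing import Iterable, List, Tuple


def contact_matrix(pairs: Iterable[Tuple[int, int]],
                   genome_length: int,
                   bin_size: int) -> List[List[int]]:
    """Count transposition events per (donor bin, target bin).

    Coordinates are 0-based and wrapped modulo genome_length because the
    bacterial chromosome is circular. Rows are donor bins and columns are
    target bins, so the matrix is directional (not symmetrized).
    """
    n_bins = -(-genome_length // bin_size)  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for donor, target in pairs:
        i = (donor % genome_length) // bin_size
        j = (target % genome_length) // bin_size
        matrix[i][j] += 1
    return matrix


if __name__ == "__main__":
    # Toy events on a 10 bp "genome" with 4 bp bins (3 bins, the last one short).
    for row in contact_matrix([(0, 5), (1, 9), (5, 0), (12, 3)], 10, 4):
        print(row)
```

Bin size is where the limits named above apply: smaller bins give finer resolution but fewer events per cell, so library diversity and sequencing depth determine how small bins can usefully be. Adding the matrix to its transpose gives an undirected contact map if direction is not of interest.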
Recent research has indicated the presence of heterochromatin-like regions of extended protein occupancy and transcriptional silencing in bacterial genomes. We used an integrative approach to track chromatin structure and transcription in Escherichia coli K-12 across a wide range of nutrient conditions. In the process, we identified multiple loci that act similarly to facultative heterochromatin in eukaryotes: normally silenced, but permitting expression of genes under specific conditions.
Vibrio cholerae, the causative agent of the diarrheal disease cholera, poses an ongoing health threat due to its wide repertoire of horizontally acquired elements (HAEs) and virulence factors. New clinical isolates of the bacterium with improved fitness abilities, often associated with HAEs, frequently emerge. Appropriate control and expression of such genetic elements are critical for the bacterium to thrive in the different environmental niches it occupies.
Escherichia coli has been a vital model organism for studying chromosomal structure, thanks in part to its small, circular genome (4.6 million base pairs) and well-characterized biochemical pathways. Over the last several decades, we have made considerable progress in understanding the intricacies of the structure of the nucleoid and how that structure shapes its function.
The sequence-specific RNA-binding protein Pumilio (Pum) controls development; however, the network of mRNAs that it regulates remains incompletely characterized. In this study, we use knockdown and knockout approaches coupled with RNA-seq to measure the impact of Pum on the transcriptome of cells in culture. We also use an improved RNA coimmunoprecipitation method to identify Pum-bound mRNAs in embryos.
A pervasive question in biological research on gene regulation, chromatin structure, or genomics is where, and to what extent, a signal of interest arises genome-wide. This question is addressed using a variety of methods that rely on high-throughput sequencing data as their final output, including ChIP-seq for protein-DNA interactions, GapR-seq for measuring supercoiling, and HBD-seq or DRIP-seq for R-loop positioning. Current computational methods for calculating genome-wide enrichment of the signal of interest usually do not properly handle the count-based nature of sequencing data, often ignore the local correlation structure of such data, and apply no regularization to enrichment estimates. This can result in unrealistic estimates of the true underlying biological enrichment, unrealistically low estimates of confidence in point estimates of enrichment (or no confidence estimates at all), spurious swings in enrichment estimates between very close (<10 bp) genomic loci due to noise inherent in sequencing data, and a multiple-hypothesis testing problem when interpreting genome-wide enrichment estimates.
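As a generic illustration of three of the issues raised (count data, confidence estimates, regularization), and not the authors' method, the sketch below models ChIP and input read counts at a single locus as Poisson with Gamma priors. Gamma-Poisson conjugacy gives each rate a closed-form Gamma posterior; sampling both posteriors yields a posterior for log2 enrichment with a 95% credible interval, and the prior shape acts like a pseudocount that shrinks low-count loci toward no enrichment. The prior values, depth normalization, and the name `log2_enrichment` are assumptions. The sketch ignores the local correlation between neighboring loci, which the text also identifies as important.

```python
import math
import random
from statistics import mean
from typing import Tuple


def log2_enrichment(chip_count: int, input_count: int,
                    chip_depth: float, input_depth: float,
                    prior_shape: float = 1.0, prior_rate: float = 1e-6,
                    n_draws: int = 4000,
                    seed: int = 0) -> Tuple[float, float, float]:
    """Posterior mean and 95% credible interval of log2(ChIP rate / input rate).

    Each per-read rate has a Gamma(prior_shape, prior_rate) prior. With a
    Poisson likelihood and exposure `depth`, the posterior is
    Gamma(prior_shape + count, prior_rate + depth). random.gammavariate takes
    (shape, scale), so the scale passed is 1 / posterior rate.
    """
    rng = random.Random(seed)  # fixed seed for reproducible draws
    chip_shape = prior_shape + chip_count
    chip_scale = 1.0 / (prior_rate + chip_depth)
    input_shape = prior_shape + input_count
    input_scale = 1.0 / (prior_rate + input_depth)
    draws = sorted(
        math.log2(rng.gammavariate(chip_shape, chip_scale)
                  / rng.gammavariate(input_shape, input_scale))
        for _ in range(n_draws)
    )
    lo = draws[int(0.025 * n_draws)]
    hi = draws[int(0.975 * n_draws) - 1]
    return mean(draws), lo, hi


if __name__ == "__main__":
    # Toy counts at two loci, both libraries sequenced to one million reads.
    print(log2_enrichment(200, 50, 1e6, 1e6))  # well-supported enrichment
    print(log2_enrichment(2, 0, 1e6, 1e6))     # raw ratio would be infinite
```

In the second call the raw ratio 2/0 is undefined, while the posterior gives a finite estimate with a wide interval, which is the kind of regularized, uncertainty-aware output the paragraph argues for.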
The breakthrough in cryo-electron microscopy (cryo-EM) technology has led to an increasing number of density maps of biological macromolecules. However, constructing accurate protein complex atomic structures from cryo-EM maps remains a challenge. In this study, we extend our previously developed DEMO-EM to present DEMO-EM2, an automated method for constructing protein complex models from cryo-EM maps through an iterative assembly procedure intertwining chain- and domain-level matching and fitting for predicted chain models.
Leveraging iterative alignment search through genomic and metagenome sequence databases, we report the DeepMSA2 pipeline for uniform protein single- and multichain multiple-sequence alignment (MSA) construction. Large-scale benchmarks show that DeepMSA2 MSAs can remarkably increase the accuracy of protein tertiary and quaternary structure predictions compared with current state-of-the-art methods. An integrated pipeline with DeepMSA2 participated in the most recent CASP15 experiment and created complex structural models with considerably higher quality than the AlphaFold2-Multimer server (v.
Sequence database searches followed by homology-based function transfer form one of the oldest and most popular approaches for predicting protein functions, such as Gene Ontology (GO) terms. Although sequence search tools are the basis of homology-based protein function prediction, previous studies have scarcely explored how to select the optimal sequence search tools and configure their parameters to achieve the best function prediction. In this paper, we evaluate the effect of using different options from among popular search tools, as well as the impacts of search parameters, on protein function prediction.
Misfolded endoplasmic reticulum (ER) proteins are degraded through a process called ER-associated degradation (ERAD). Soluble, lumenal ERAD targets are recognized, retrotranslocated across the ER membrane, ubiquitinated, extracted from the membrane, and degraded by the proteasome using an ERAD pathway containing a ubiquitin ligase called Hrd1. To determine how Hrd1 mediates these processes, we developed a deep mutational scanning approach to identify residues involved in Hrd1 function, including those exclusively required for lumenal degradation.
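Deep mutational scanning data are commonly summarized by a per-variant score that compares each variant's change in abundance under selection with that of wild type. The sketch below computes such a log2 score from pre- and post-selection read counts with a pseudocount. The count layout, the pseudocount value, the variant labels, and the name `dms_scores` are hypothetical; the abstract does not say how Hrd1 variants were scored.

```python
import math
from typing import Dict, Tuple


def dms_scores(counts: Dict[str, Tuple[int, int]],
               wild_type: str = "WT",
               pseudocount: float = 0.5) -> Dict[str, float]:
    """log2 change in abundance of each variant, relative to wild type.

    counts maps a variant label to (reads before selection, reads after).
    score(v) = log2((after_v + p) / (before_v + p))
             - log2((after_wt + p) / (before_wt + p))
    Library-size normalization cancels in the wild-type subtraction, so it
    is omitted; the pseudocount p keeps zero-count variants finite.
    """
    def log_ratio(before: int, after: int) -> float:
        return math.log2((after + pseudocount) / (before + pseudocount))

    wt = log_ratio(*counts[wild_type])
    return {v: log_ratio(b, a) - wt for v, (b, a) in counts.items()}


if __name__ == "__main__":
    # Toy counts; the variant names are hypothetical.
    toy = {"WT": (100, 100), "L123P": (100, 25), "S45A": (50, 50)}
    print(dms_scores(toy, pseudocount=0.0))
```

A negative score means the variant was depleted relative to wild type under selection. Whether that reflects loss or gain of function depends on how the selection is set up, for example whether cells are selected for degrading an ERAD reporter or for failing to degrade it.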
RNAs are fundamental in living cells and perform critical functions determined by their tertiary architectures. However, accurate modeling of 3D RNA structure remains a challenging problem. We present a novel method, DRfold, to predict RNA tertiary structures by simultaneous learning of local frame rotations and geometric restraints from experimentally solved RNA structures, where the learned knowledge is converted into a hybrid energy potential to guide RNA structure assembly.
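As a toy illustration of how restraints can be turned into an energy that guides assembly, the sketch below sums weighted harmonic penalties on inter-point distances and lowers that energy with plain gradient descent. It is not DRfold: the real potential also uses learned local frame rotations and its own optimizer, neither of which is reproduced here. The harmonic form, step size, and the names `energy`, `minimize`, and `Restraint` are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Vec = List[float]
Restraint = Tuple[int, int, float, float]  # (i, j, target distance, weight)


def energy(coords: Sequence[Vec], restraints: Sequence[Restraint]) -> float:
    """Sum of weighted harmonic penalties w * (d_ij - d0)^2."""
    total = 0.0
    for i, j, d0, w in restraints:
        d = math.dist(coords[i], coords[j])
        total += w * (d - d0) ** 2
    return total


def minimize(coords: Sequence[Vec], restraints: Sequence[Restraint],
             step: float = 0.01, n_steps: int = 2000) -> List[Vec]:
    """Plain gradient descent on the restraint energy (input is not modified)."""
    coords = [list(c) for c in coords]
    for _ in range(n_steps):
        grad = [[0.0, 0.0, 0.0] for _ in coords]
        for i, j, d0, w in restraints:
            diff = [a - b for a, b in zip(coords[i], coords[j])]
            # Guard against division by zero for coincident points.
            d = math.sqrt(sum(x * x for x in diff)) or 1e-9
            # dE/dx_i = 2 w (d - d0) (x_i - x_j) / d, and the negative for x_j.
            coef = 2.0 * w * (d - d0) / d
            for k in range(3):
                grad[i][k] += coef * diff[k]
                grad[j][k] -= coef * diff[k]
        for c, g in zip(coords, grad):
            for k in range(3):
                c[k] -= step * g[k]
    return coords


if __name__ == "__main__":
    # Pull four points toward a regular tetrahedron with unit edges.
    start = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
    rs = [(i, j, 1.0, 1.0) for i in range(4) for j in range(i + 1, 4)]
    print(energy(start, rs), energy(minimize(start, rs), rs))
```

A "hybrid" potential in this style would simply add further terms (for example orientation or angle restraints) to the same energy sum, each with its own weight.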