Curr Opin Struct Biol
January 2025
The mRNA splicing machinery has been estimated to generate 100,000 known protein-coding transcripts for 20,000 human genes (Ensembl, Sept. 2024). However, this set is expanding with the massive and rapidly growing data coming from high-throughput technologies, particularly single-cell and long-read sequencing.
Motivation: Exhaustive experimental annotation of the effect of all known protein variants remains daunting and expensive, stressing the need for scalable effect predictions. We introduce VespaG, a blazingly fast missense amino acid variant effect predictor, leveraging protein language model (pLM) embeddings as input to a minimal deep learning model.
Results: To overcome the sparsity of experimental training data, we created a dataset of 39 million single amino acid variants from the human proteome applying the multiple sequence alignment-based effect predictor GEMME as a pseudo standard-of-truth.
Proteins play a central role in biological processes, and understanding their conformational variability is crucial for unraveling their functional mechanisms. Recent advancements in high-throughput technologies have enhanced our knowledge of protein structures, yet predicting their multiple conformational states and motions remains challenging. This study introduces Dimensionality Analysis for protein Conformational Exploration (DANCE) for a systematic and comprehensive description of protein families' conformational variability.
The wealth of genomic data has boosted the development of computational methods predicting the phenotypic outcomes of missense variants. The most accurate ones exploit multiple sequence alignments, which can be costly to generate. Recent efforts for democratizing protein structure prediction have overcome this bottleneck by leveraging the fast homology search of MMseqs2.
N-terminal ends of polypeptides are critical for the selective co-translational recruitment of N-terminal modification enzymes. However, it is unknown whether specific N-terminal signatures differentially regulate protein fate according to their cellular functions. In this work, we developed an in-silico approach to detect functional preferences in cellular N-terminomes, and identified in S.
Advances in DNA sequencing and machine learning are providing insights into protein sequences and structures on an enormous scale. However, the energetics driving folding are invisible in these structures and remain largely unknown. The hidden thermodynamics of folding can drive disease, shape protein evolution and guide protein engineering, and new approaches are needed to reveal these thermodynamics for every sequence and structure.
Alternative splicing of repeats in proteins provides a mechanism for rewiring and fine-tuning protein interaction networks. In this work, we developed a robust and versatile method, ASPRING, to identify alternatively spliced protein repeats from gene annotations. ASPRING leverages evolutionarily meaningful alternative splicing-aware hierarchical graphs to provide maps between protein repeat sequences and 3D structures.
Physical interactions between proteins are central to all biological processes. Yet, the current knowledge of who interacts with whom in the cell and in what manner relies on partial, noisy, and highly heterogeneous data. Thus, there is a need for methods comprehensively describing and organizing such data.
Motivation: With the recent advances in protein 3D structure prediction, protein interactions are becoming more central than ever before. Here, we address the problem of determining how proteins interact with one another. More specifically, we investigate the possibility of discriminating near-native protein complex conformations from incorrect ones by exploiting local environments around interfacial residues.
Summary: ASES is a versatile tool for assessing the impact of alternative splicing (AS), initiation and termination of transcription on protein diversity in evolution. It identifies exon and transcript orthogroups from a set of input genes/species for comparative transcriptomics analyses. It computes an evolutionary splicing graph, where the nodes are exon orthogroups, allowing for a direct evaluation of AS conservation.
Proteins ensure their biological functions by interacting with each other. Hence, characterising protein interactions is fundamental for our understanding of the cellular machinery, and for improving medicine and bioengineering. Over the past years, a large body of experimental data has been accumulated on who interacts with whom and in what manner.
The potential of deep learning has been recognized in the protein structure prediction community for some time, and became indisputable after CASP13. In CASP14, deep learning has boosted the field to unanticipated levels reaching near-experimental accuracy. This success comes from advances transferred from other machine learning areas, as well as methods specifically designed to deal with protein sequences and structures, and their abstractions.
Understanding how protein function has evolved and diversified is of great importance for human genetics and medicine. Here, we tackle the problem of describing the whole transcript variability observed in several species by generalizing the definition of splicing graph. We provide a practical solution to construct parsimonious evolutionary splicing graphs where each node is a minimal transcript building block defined across species.
In light of the recent very rapid progress in protein structure prediction, accessing the multitude of functional protein states is becoming more central than ever before. Indeed, proteins are flexible macromolecules, and they often perform their function by switching between different conformations. However, high-resolution experimental techniques such as X-ray crystallography and cryogenic electron microscopy can catch relatively few protein functional states.
Background: Coiled-coils are described as stable structural motifs, where two or more helices wind around each other. However, coiled-coils are associated with local mobility and intrinsic disorder. Intrinsically disordered regions in proteins are characterized by lack of stable secondary and tertiary structure under physiological conditions in vitro.
Solute carrier (SLC) transporters are emerging drug targets. Identifying the molecular determinants responsible for their specific and selective transport activities and describing key interactions with their ligands are crucial steps towards the design of potential new drugs. A general functional mapping across more than 400 human SLC transporters would pave the way to the rational and systematic design of molecules modulating cellular transport.
Large macromolecules, including proteins and their complexes, very often adopt multiple conformations. Some of them can be seen experimentally, for example with X-ray crystallography or cryo-electron microscopy. This structural heterogeneity is not occasional and is frequently linked with specific biological function.
The systematic and accurate description of protein mutational landscapes is a question of utmost importance in biology, bioengineering, and medicine. Recent progress has been achieved by leveraging the increasing wealth of genomic data and by modeling intersite dependencies within biological sequences. However, state-of-the-art methods remain time consuming.
The growing body of experimental and computational data describing how proteins interact with each other has emphasized the multiplicity of protein interactions and the complexity underlying protein surface usage and deformability. In this work, we propose new concepts and methods toward deciphering such complexity. We introduce the notion of interacting region to account for the multiple usage of a protein's surface residues by several partners and for the variability of protein interfaces coming from molecular flexibility.
Characterizing a protein mutational landscape is a very challenging problem in biology. Many disease-associated mutations do not seem to produce any effect on the global shape or motions of the protein. Here, we use relatively short all-atom biomolecular simulations to predict mutational outcomes and we quantitatively assess the predictions on several hundreds of mutants.
Several models estimating the strength of the interaction between proteins in a complex have been proposed. By exploring the geometry of contact distribution at protein-protein interfaces, we provide an improved model of binding energy. Local interaction signal analysis (LISA) is a radial function based on terms describing favorable and non-favorable contacts obtained by density functional theory, the support-core-rim interface residue distribution, non-interacting charged residues and secondary structures contribution.