The PRIDE database is the largest public data repository of mass spectrometry-based proteomics data and currently stores more than 40,000 data sets covering a wide range of organisms, experimental techniques, and biological conditions. During the past few years, PRIDE has seen a significant increase in the amount of submitted data-independent acquisition (DIA) proteomics data sets. This provides an excellent opportunity for large-scale data reanalysis and reuse.
A-to-I RNA editing is the most common non-transient epitranscriptome modification. It plays multiple roles in human physiology and has been linked to several disorders. Large-scale deep transcriptome sequencing has fostered the characterization of A-to-I editing at the single nucleotide level and the development of dedicated computational resources.
Malaria is a deadly disease caused by Apicomplexan parasites of the genus Plasmodium. Several Plasmodium species are known to infect humans, of which Plasmodium falciparum is the most virulent. Post-translational modifications (PTMs) of proteins coordinate cell signaling and hence regulate many biological processes in homeostasis and host infection; among PTMs, phosphorylation is the most highly studied.
Phosphorylation is the most studied post-translational modification and has multiple biological functions. In this study, we have reanalyzed publicly available mass spectrometry proteomics data sets enriched for phosphopeptides from Asian rice (Oryza sativa). In total we identified 15,565 phosphosites on serine, threonine, and tyrosine residues on rice proteins.
The availability of an increasingly large amount of public proteomics data sets presents an opportunity for performing combined analyses to generate comprehensive organism-wide protein expression maps across different organisms and biological conditions. Sus scrofa, the domestic pig, is a model organism relevant for food production and for human biomedical research. Here, we reanalyzed 14 public proteomics data sets from the PRIDE database coming from pig tissues to assess baseline (without any biological perturbation) protein abundance in 14 organs, encompassing a total of 20 healthy tissues from 128 samples.
The cancer biomarker field has been the subject of thorough investigation over the last decades. Despite this, colorectal cancer (CRC) heterogeneity makes it challenging to identify and validate effective prognostic biomarkers for classifying patients according to outcome and treatment response. Although a massive amount of proteomics data has been deposited in public data repositories, this rich source of information is vastly underused.
Phosphorylation is the most studied post-translational modification and has multiple biological functions. In this study, we have re-analysed publicly available mass spectrometry proteomics datasets enriched for phosphopeptides from Asian rice (Oryza sativa). In total we identified 15,522 phosphosites on serine, threonine and tyrosine residues on rice proteins.
The availability of proteomics datasets in the public domain, and in the PRIDE database in particular, has increased dramatically in recent years. This unprecedented large-scale availability of data provides an opportunity for combined analyses of datasets to obtain organism-wide protein abundance data in a consistent manner. We have reanalyzed 24 public proteomics datasets from healthy human individuals to assess baseline protein abundance in 31 organs.
The increasingly large amount of proteomics data in the public domain enables, among other applications, the combined analyses of datasets to create comparative protein expression maps covering different organisms and different biological conditions. Here we have reanalysed public proteomics datasets from mouse and rat tissues (14 and 9 datasets, respectively) to assess baseline protein abundance. Overall, the aggregated dataset contained 23 individual datasets, including a total of 211 samples coming from 34 different tissues across 14 organs, comprising 9 mouse strains and 3 rat strains.
The number of mass spectrometry (MS)-based proteomics datasets in the public domain keeps increasing, particularly those generated by Data Independent Acquisition (DIA) approaches such as SWATH-MS. Unlike that of Data Dependent Acquisition (DDA) datasets, the reuse of DIA datasets has been rather limited to date, despite its high potential, due to the technical challenges involved. We introduce a (re-)analysis pipeline for public SWATH-MS datasets which includes a combination of metadata annotation protocols, automated workflows for MS data analysis, statistical analysis, and the integration of the results into the Expression Atlas resource.
Phosphoproteomic methods are commonly employed to identify and quantify phosphorylation sites on proteins. In recent years, various tools have been developed that incorporate scores or statistics indicating whether a given phosphosite has been correctly identified, or that estimate the global false localization rate (FLR) within a given data set for all sites reported. These scores have generally been calibrated using synthetic datasets, and their statistical reliability on real datasets is largely unknown, potentially leading to studies reporting incorrectly localized phosphosites due to inadequate statistical control.
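To make the notion of a global FLR concrete, here is a minimal sketch, assuming each reported phosphosite carries a localization probability p that is well calibrated (precisely the assumption the study above calls into question for real data). Under that assumption, the expected number of mislocalized sites is the sum of (1 - p) over the reported sites, and the global FLR is that sum divided by the number of sites. The function names `estimated_global_flr` and `sites_at_flr`, the default threshold, and the example probabilities are hypothetical illustrations, not taken from any specific tool.

```python
from typing import Sequence


def estimated_global_flr(site_probs: Sequence[float], threshold: float = 0.75) -> float:
    """Expected global FLR among sites whose localization probability passes `threshold`.

    Assumes each probability is calibrated, so a site with probability p is
    mislocalized with chance (1 - p). Hypothetical helper for illustration.
    """
    accepted = [p for p in site_probs if p >= threshold]
    if not accepted:
        return 0.0  # no sites reported, so no false localizations
    expected_false = sum(1.0 - p for p in accepted)
    return expected_false / len(accepted)


def sites_at_flr(site_probs: Sequence[float], max_flr: float = 0.01) -> int:
    """Largest number of top-ranked sites whose estimated FLR stays <= max_flr."""
    ranked = sorted(site_probs, reverse=True)  # most confident sites first
    best = 0
    expected_false = 0.0
    for n, p in enumerate(ranked, start=1):
        expected_false += 1.0 - p
        # The running average is not guaranteed to be monotonic, so keep
        # scanning and remember the largest n that meets the target.
        if expected_false / n <= max_flr:
            best = n
    return best


if __name__ == "__main__":
    probs = [1.0, 0.999, 0.99, 0.9, 0.5]
    print(estimated_global_flr(probs))  # FLR of the four sites with p >= 0.75
    print(sites_at_flr(probs, max_flr=0.01))  # -> 3
```

If the per-site probabilities are miscalibrated, both estimates inherit that error, which is why assessing calibration on real datasets matters.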
The EMBL-EBI Expression Atlas is an added-value knowledge base that enables researchers to answer the question of where (tissue, organism part, developmental stage, cell type) and under which conditions (disease, treatment, gender, etc.) a gene or protein of interest is expressed. Expression Atlas brings together data from >4500 expression studies from >65 different species, across different conditions and tissues. It makes these data freely available in an easy-to-visualise form, after expert curation to accurately represent the intended experimental design, re-analysed via standardised pipelines that rely on open-source community-developed tools.
The PRoteomics IDEntifications (PRIDE) database (https://www.ebi.ac.
Proteins and RNA functionally and physically intersect in multiple biological processes; however, no universal method is currently available to purify protein-RNA complexes. Here, we introduce XRNAX, a method for the generic purification of protein-crosslinked RNA, and demonstrate its versatility to study the composition and dynamics of protein-RNA interactions by various transcriptomic and proteomic approaches. We show that XRNAX captures all RNA biotypes and use this to characterize the sub-proteomes that interact with coding and non-coding RNAs (ncRNAs) and to identify hundreds of protein-RNA interfaces.
Curr Protoc Bioinformatics
December 2017
Protein sequence similarity search is one of the most commonly used bioinformatics methods for identifying evolutionarily related proteins. In general, sequences that are evolutionarily related share some degree of similarity, and sequence-search algorithms use this principle to identify homologs. The requirement for a fast and sensitive sequence search method led to the development of the HMMER software, which in the latest version (v3.
The phagocyte respiratory burst is crucial for innate immunity. The transfer of electrons to oxygen is mediated by a membrane-bound heterodimer comprising gp91phox and p22phox subunits. Deficiency of either subunit leads to severe immunodeficiency.
Background: Protein domains display a range of structural diversity, with numerous additions and deletions of secondary structural elements between related domains. We have observed a small number of cases of surprising large-scale deletions of core elements of structural domains. We propose a new concept called domain atrophy, where protein domains lose a significant number of core structural elements.
Non-Watson-Crick pairs like the G·U wobble are frequent in RNA duplexes. Their geometric dissimilarity (nonisostericity) with the Watson-Crick base pairs and among themselves imparts structural variations decisive for biological functions. Based on a novel circular representation of base pairs, a simple and general metric scheme is proposed for quantifying base-pair nonisostericity in terms of residual twist and radial difference; the scheme can also anticipate the mechanistic effect of this nonisostericity.
Vibrio cholerae, an enteropathogenic Gram-negative bacterium, is one of the main causative agents of waterborne diseases such as cholera. About one-third of the organism's genome is uncharacterised, with many protein-coding genes lacking structural and functional information. These proteins form a significant fraction of the genome and are crucial to understanding the organism's complete functional makeup.