Timely and effective use of antimicrobial drugs can improve patient outcomes, as well as help safeguard against resistance development. Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) is currently routinely used in clinical diagnostics for rapid species identification. Mining additional data from said spectra in the form of antimicrobial resistance (AMR) profiles is, therefore, highly promising.
The biological process of RNA translation is fundamental to cellular life and has wide-ranging implications for human disease. Yet, accurately delineating the variation in RNA translation represents a significant challenge. Here, we develop RiboTIE, a transformer model-based approach to map global RNA translation.
The correct mapping of the proteome is an important step towards advancing our understanding of biological systems and cellular mechanisms. Methods that provide better mappings can fuel important processes such as drug discovery and disease understanding. Currently, true determination of translation initiation sites is primarily achieved by experiments.
Transcriptome and ribosome sequencing have revealed the existence of many non-canonical transcripts, mainly containing splice variants, ncRNA, sORFs and altORFs. However, identification and characterization of products that may be translated out of these remains a challenge. Addressing this, we here report on 552 non-canonical proteins and splice variants in the model organism using tandem mass spectrometry. Aided by sequencing-based prediction, we generated a custom proteome database tailored to search for non-canonical translation products of this organism.
Motivation: The adoption of current single-cell DNA methylation sequencing protocols is hindered by incomplete coverage, underscoring the need for effective imputation techniques. The task of imputing single-cell (methylation) data requires models to build an understanding of underlying biological processes.
Results: We adapt the transformer neural network architecture to operate on methylation matrices by combining axial attention with sliding window self-attention.
Bioactive peptides play key roles in a wide variety of complex processes, such as regulation of body weight, learning, aging, and innate immune response. Next to the classical bioactive peptides, emerging from larger precursor proteins by specific proteolytic processing, a new class of peptides originating from small open reading frames (sORFs) has been recognized as important biological regulators. But their intrinsic properties, specific expression pattern and location on presumed non-coding regions have hindered the full characterization of the repertoire of bioactive peptides, despite their predominant role in various pathways.
The effectiveness of deep learning methods can be largely attributed to the automated extraction of relevant features from raw data. In the field of functional genomics, this generally concerns the automatic selection of relevant nucleotide motifs from DNA sequences. To benefit from automated learning methods, new strategies are required that unveil the decision-making process of trained models.
Proteogenomics approaches often struggle with the distinction between true and false peptide-to-spectrum matches as the database size increases. However, features extracted from tandem mass spectrometry intensity predictors can enhance the peptide identification rate and can provide extra confidence for peptide-to-spectrum matching in a proteogenomics context. To that end, features from the spectral intensity pattern predictors MS²PIP and Prosit were combined with the canonical scores from MaxQuant in the Percolator postprocessing tool for protein sequence databases constructed out of ribosome profiling and nanopore RNA-Seq analyses.
Tet-enzyme-mediated 5-hydroxymethylation of cytosines in DNA plays a crucial role in mouse embryonic stem cells (ESCs). In RNA also, 5-hydroxymethylcytosine (5hmC) has recently been evidenced, but its physiological roles are still largely unknown. Here we show the contribution and function of this mark in mouse ESCs and differentiating embryoid bodies.
The emergence of small open reading frame (sORF)-encoded peptides (SEPs) is rapidly expanding the known proteome at the lower end of the size distribution. Here, we show that the mitochondrial proteome, particularly the respiratory chain, is enriched for small proteins. Using a prediction and validation pipeline for SEPs, we report the discovery of 16 endogenous nuclear-encoded, mitochondrial-localized SEPs (mito-SEPs).
Growing evidence illustrates the shortcomings in the current understanding of the full complexity of the proteome. Previously overlooked small open reading frames (sORFs) and their encoded microproteins have filled important gaps, exerting their function as biologically relevant regulators. The characterization of the full small proteome has potential applications in many fields.
Neuropeptides are a class of bioactive peptides shown to be involved in various physiological processes, including metabolism, development, and reproduction. Although neuropeptide candidates have been predicted from genomic and transcriptomic data, comprehensive characterization of neuropeptide repertoires remains a challenge owing to their small size and variable sequences. De novo prediction of neuropeptides from genome or transcriptome data is difficult and usually only efficient for those peptides that have identified orthologs in other animal species.
The increasing availability of high-throughput proteomics data provides opportunities but also poses new ethical challenges regarding data privacy and re-identifiability of participants. Moreover, the fact that proteomics represents a level between the genotype and the phenotype further exacerbates the situation, introducing dilemmas related to publicly available data, anonymization, ownership of information and incidental findings. In this paper, we try to differentiate proteomics from genomics data and cover the ethical challenges related to proteomics data sharing.
J Chromatogr B Analyt Technol Biomed Life Sci
August 2019
On average, a human cell type expresses around 10,000 different protein-coding genes synthesizing all the different molecular forms of the protein product (proteoforms) found in a cell. In a typical shotgun bottom-up proteomic approach, the proteins are enzymatically cleaved, producing several hundred thousand different peptides that are analyzed with liquid chromatography-tandem mass spectrometry (LC-MS/MS). One of the major consequences of this high sample complexity is that coelution of peptides cannot be avoided.
Mass-spectrometry-based proteomics enables the high-throughput identification and quantification of proteins, including sequence variants and post-translational modifications (PTMs) in biological samples. However, most workflows require that such variations be included in the search space used to analyze the data, and doing so remains challenging with most analysis tools. In order to facilitate the search for known sequence variants and PTMs, the Proteomics Standards Initiative (PSI) has designed and implemented the PSI extended FASTA format (PEFF).
PROTEOFORMER is a pipeline that enables the automated processing of data derived from ribosome profiling (RIBO-seq, the sequencing of ribosome-protected mRNA fragments). As such, genome-wide ribosome occupancies lead to the delineation of data-specific translation product candidates and these can improve the mass spectrometry-based identification. Since its first publication, different upgrades, new features and extensions have been added to the PROTEOFORMER pipeline.
Brain-derived peptides function as signaling molecules in the brain and regulate various physiological and behavioral processes. The low abundance and atypical fragmentation of these brain-derived peptides make detection using traditional proteomic methods challenging. In this study, we introduce and validate a new methodology for the discovery of novel peptides derived from mammalian brain.
Annotation of gene expression in prokaryotes is often corrected due to small variations of the annotated gene regions observed between different (sub)-species. It has become apparent that traditional sequence alignment algorithms, used for the curation of genomes, are not able to map the full complexity of the genomic landscape. We present DeepRibo, a novel neural network utilizing features extracted from ribosome profiling information and binding site sequence patterns that proves to be a precise tool for the delineation and annotation of expressed genes in prokaryotes.
Ribosome profiling involves sequencing of approximately 30-base-long stretches of ribosome-protected mRNA. The technique enables genome-wide mapping of RNA undergoing active translation. Numerous small open reading frames have been identified by using ribosome profiling, leading researchers to question the assumed non-functional character of sORFs and to the identification of various important sORF translation products.
Comput Methods Programs Biomed
November 2019
Background And Objective: Ribosome profiling is a recent next generation sequencing technique enabling the genome-wide study of gene expression in biomedical research at the translation level. Too often, researchers precipitously start trying to test their hypotheses after alignment of their data, without checking the quality and the general features of their mapped data. Despite the fact that these checks are essential to prevent errors and ensure valid conclusions afterwards, easy-to-use tools for visualizing the quality and overall outlook of mapped ribosome profiling data are lacking.
Deletion of chromosome 6q is a well-recognized abnormality found in poor-prognosis T-cell acute lymphoblastic leukemia (T-ALL). Using integrated genomic approaches, we identified two candidate haploinsufficient genes contiguous at 6q14, (encoding hnRNP-Q) and (that hosts snoRNAs), both involved in regulating RNA maturation and translation. Combined silencing of both genes, but not of either gene alone, accelerated leukemogenesis in a -driven mouse model, demonstrating the tumor-suppressive nature of the two-gene region.