Motivation: Genome-wide association studies (GWAS) have enabled large-scale analysis of the role of genetic variants in human disease. Despite impressive methodological advances, subsequent clinical interpretation and application remains challenging when GWAS suffer from a lack of statistical power. In recent years, however, the use of information diffusion algorithms with molecular networks has led to fruitful insights on disease genes.
The black-box nature of most artificial intelligence (AI) models encourages the development of explainability methods to engender trust in the AI decision-making process. Such methods can be broadly categorized into two main types: post hoc explanations and inherently interpretable algorithms. We aimed to analyze the possible associations between COVID-19 and the push of explainable AI (XAI) to the forefront of biomedical research.
Purpose: Non-small-cell lung cancer (NSCLC) shows a high incidence of brain metastases (BM). Early detection is crucial to improve clinical prospects. We trained and validated classifier models to identify patients with a high risk of developing BM, as they could potentially benefit from surveillance brain MRI.
Epigenetic modifications are dynamic mechanisms involved in the regulation of gene expression. Unlike the DNA sequence, epigenetic patterns vary not only between individuals, but also between different cell types within an individual. Environmental factors, somatic mutations and ageing contribute to epigenetic changes that may constitute early hallmarks or causal factors of disease.
Background & Aims: Patient-derived organoid cancer models are generated from epithelial tumor cells and reflect tumor characteristics. However, they lack the complexity of the tumor microenvironment, which is a key driver of tumorigenesis and therapy response. Here, we developed a colorectal cancer organoid model that incorporates matched epithelial cells and stromal fibroblasts.
The ARID1A subunit of SWI/SNF chromatin remodeling complexes is a potent tumor suppressor. Here, a degron is applied to detect rapid loss of chromatin accessibility at thousands of loci where ARID1A acts to generate accessible minidomains of nucleosomes. Loss of ARID1A also results in the redistribution of the coactivator EP300.
Malignant transformation depends on genetic and epigenetic events that result in a burst of deregulated gene expression and chromatin changes. To dissect the sequence of events in this process, we used a T-cell-specific lymphoma model based on the human oncogenic nucleophosmin-anaplastic lymphoma kinase (NPM-ALK) translocation. We find that transformation of T cells shifts thymic cell populations to an undifferentiated immunophenotype, which occurs only after a period of latency, accompanied by induction of the MYC-NOTCH1 axis and deregulation of key epigenetic enzymes.
mRNA cap addition occurs early during RNA Pol II-dependent transcription, facilitating pre-mRNA processing and translation. We report that the mammalian mRNA cap methyltransferase, RNMT-RAM, promotes RNA Pol II transcription independent of mRNA capping and translation. In cells, sublethal suppression of RNMT-RAM reduces RNA Pol II occupancy, net mRNA synthesis, and pre-mRNA levels.
Mutations in the gene encoding the methyl-CG binding protein MeCP2 cause several neurological disorders including Rett syndrome. The di-nucleotide methyl-CG (mCG) is the classical MeCP2 DNA recognition sequence, but additional methylated sequence targets have been reported. Here we show by in vitro and in vivo analyses that MeCP2 binding to non-CG methylated sites in brain is largely confined to the tri-nucleotide sequence mCAC.
RNA-binding proteins play a key role in shaping gene expression profiles during stress; however, little is known about the dynamic nature of these interactions and how this influences the kinetics of gene expression. To address this, we developed kinetic cross-linking and analysis of cDNAs (χCRAC), an ultraviolet cross-linking method that enabled us to quantitatively measure the dynamics of protein-RNA interactions in vivo on a minute time-scale. Here, using χCRAC we measure the global RNA-binding dynamics of the yeast transcription termination factor Nab3 in response to glucose starvation.
Background: Functional genomic and epigenomic research relies fundamentally on sequencing based methods like ChIP-seq for the detection of DNA-protein interactions. These techniques return large, high dimensional data sets with visually complex structures, such as multi-modal peaks extended over large genomic regions. Current tools for visualisation and data exploration represent and leverage these complex features only to a limited extent.
Motivation: DNA methylation is an intensely studied epigenetic mark implicated in many biological processes of direct clinical relevance. Although sequencing-based technologies are increasingly allowing high-resolution measurements of DNA methylation, statistical modelling of such data is still challenging. In particular, statistical identification of differentially methylated regions across different conditions poses unresolved challenges in accounting for spatial correlations within the statistical testing procedure.
Background: Cell-specific gene expression is controlled by epigenetic modifications and transcription factor binding. While genome-wide maps for these protein-DNA interactions have become widely available, quantitative comparison of the resulting ChIP-Seq data sets remains challenging. Current approaches to detect differentially bound or modified regions are mainly borrowed from RNA-Seq data analysis, thus focusing on total counts of fragments mapped to a region, ignoring any information encoded in the shape of the peaks.
We present a highly accurate gene-prediction system for eukaryotic genomes, called mGene. It combines in an unprecedented manner the flexibility of generalized hidden Markov models (gHMMs) with the predictive power of modern machine learning methods, such as Support Vector Machines (SVMs). Its excellent performance was proved in an objective competition based on the genome of the nematode Caenorhabditis elegans.
We describe mGene.web, a web service for the genome-wide prediction of protein coding genes from eukaryotic DNA sequences. It offers pre-trained models for the recognition of gene structures including untranslated regions in an increasing number of organisms.
Background: For splice site recognition, one has to solve two classification problems: discriminating true from decoy splice sites for both acceptor and donor sites. Gene finding systems typically rely on Markov Chains to solve these tasks.
Results: In this work we consider Support Vector Machines for splice site recognition.
The genomes of individuals from the same species vary in sequence as a result of different evolutionary processes. To examine the patterns of, and the forces shaping, sequence variation in Arabidopsis thaliana, we performed high-density array resequencing of 20 diverse strains (accessions). More than 1 million nonredundant single-nucleotide polymorphisms (SNPs) were identified at moderate false discovery rates (FDRs), and approximately 4% of the genome was identified as being highly dissimilar or deleted relative to the reference genome sequence.
We obtained tomograms of isolated mammalian excitatory synapses by cryo-electron tomography. This method allows the investigation of biological material in the frozen-hydrated state, without staining, and can therefore provide reliable structural information at the molecular level. We developed an automated procedure for the segmentation of molecular complexes present in the synaptic cleft based on thresholding and connectivity, and calculated several morphological characteristics of these complexes.