Characterizing the binding preferences of transcription factors (TFs) in different cell types and conditions is key to understanding how they orchestrate gene expression. Here, we develop TFscope, a machine learning approach that identifies sequence features explaining the binding differences observed between two ChIP-seq experiments targeting either the same TF in two conditions or two TFs with similar motifs (paralogous TFs). TFscope systematically investigates differences in the core motif, nucleotide environment and co-factor motifs, and provides the contribution of each key feature in the two experiments.
Motivations: Gene regulatory networks (GRNs) are traditionally inferred from gene expression profiles monitoring a specific condition or treatment. In the last decade, integrative strategies have successfully emerged to guide GRN inference from gene expression with complementary prior data. However, datasets used as prior information and validation gold standards are often related and limited to a subset of genes.
The elevation of atmospheric CO₂ leads to a decline in plant mineral content, which might pose a significant threat to food security in coming decades. Although few genes have been identified for the negative effect of elevated CO₂ on plant mineral composition, several studies suggest the existence of genetic factors. Here, we performed a large-scale study to explore genetic diversity of plant ionome responses to elevated CO₂, using six hundred accessions, representing geographical distributions ranging from worldwide to regional and local environments.
The elevation of CO₂ in the atmosphere increases plant biomass but decreases plant mineral content. The genetic and molecular bases of these effects remain mostly unknown, in particular in the root system, which is responsible for plant nutrient uptake. To gain knowledge about the effect of elevated CO₂ on plant growth and physiology, and to identify its regulators in the roots, we analyzed genome expression in Arabidopsis roots through a combinatorial design with contrasting levels of CO₂, nitrate, and iron.
Background: High-throughput transcriptomic datasets are often examined to discover new actors and regulators of a biological response. To this end, graphical interfaces have been developed that allow a broad range of users to conduct standard analyses of RNA-seq data, even with little programming experience. Although existing solutions usually provide adequate procedures for normalization, exploration or differential expression, more advanced features, such as gene clustering or regulatory network inference, are often missing or do not reflect current state-of-the-art methodologies.
Long regulatory elements (LREs), such as CpG islands, polydA:dT tracts or AU-rich elements, are thought to play key roles in gene regulation but, as opposed to conventional binding sites of transcription factors, few methods have been proposed to formally and automatically characterize them. We present here a computational approach named DExTER (Domain Exploration To Explain gene Regulation) dedicated to the identification of candidate LREs (cLREs) and apply it to the analysis of the genomes of P. falciparum and other eukaryotes.
Background: In eukaryotic cells, transcription factors (TFs) are thought to act in a combinatorial way, by competing and collaborating to regulate common target genes. However, several questions remain regarding the conservation of these combinations among different gene classes, regulatory regions and cell types.
Results: We propose a new approach named TFcoop to infer the TF combinations involved in the binding of a target TF in a particular cell type.
Gene expression is orchestrated by distinct regulatory regions to ensure a wide variety of cell types and functions. A challenge is to identify which regulatory regions are active, what their associated features are and how they work together in each cell type. Several approaches have tackled this problem by modeling gene expression based on epigenetic marks, with the ultimate goal of identifying driving regions and associated genomic variations that are clinically relevant, in particular in precision medicine.
Overlapping genes exist in all domains of life and are much more abundant than expected upon their first discovery in the late 1970s. Assuming that the reference gene is read in frame +0, an overlapping gene can be encoded in two reading frames in the sense strand, denoted by +1 and +2, and in three reading frames in the opposite strand, denoted by -0, -1, and -2. This motivated numerous researchers to study the constraints induced by the genetic code on the various overlapping frames, mostly based on information theory.
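The frame labels used above can be made concrete with a short sketch that splits a DNA sequence into codons for each of the six frames. The helper names (`reverse_complement`, `codons`, `six_frames`) are hypothetical, and the numbering of the antisense frames is an assumption: here -0, -1 and -2 are taken as offsets 0, 1 and 2 from the start of the reverse complement, which may differ from the alignment convention adopted in the article.

```python
# Minimal sketch: enumerate the codons read in each of the six frames.
# Assumption: antisense frames -0/-1/-2 are offsets 0/1/2 on the reverse
# complement; the article may define them relative to the +0 codons instead.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def codons(seq: str, offset: int) -> list[str]:
    """Split seq into complete triplets, starting at the given offset."""
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]


def six_frames(seq: str) -> dict[str, list[str]]:
    """Map each frame label (+0, +1, +2, -0, -1, -2) to its codon list."""
    rc = reverse_complement(seq)
    frames = {f"+{k}": codons(seq, k) for k in range(3)}
    frames.update({f"-{k}": codons(rc, k) for k in range(3)})
    return frames


if __name__ == "__main__":
    for label, triplets in six_frames("ATGGCCTAA").items():
        print(label, triplets)
```

Listing the frames this way shows why the genetic code constrains overlaps: every nucleotide of the reference gene belongs to a different codon position in each of the other five frames.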
We propose here the GETEC (Genome Evolution by Transformation, Expansion and Contraction) model of gene evolution based on substitution, insertion and deletion of genetic motifs. The GETEC model unifies two classes of evolution models: models of substitution, insertion and deletion of nucleotides as a function of time (Lèbre and Michel, 2010) and sequence length (Lèbre and Michel, 2012), and models of symmetric substitution of genetic motifs as a function of time (Benard and Michel, 2011). Evolution of genetic motifs based on substitution, insertion and deletion is modeled by a differential equation whose analytical solutions give an expression of the genetic motif occurrence probabilities as a function of time or sequence length, as well as in direct time direction (past-present) or inverse time direction (present-past).
We recently introduced a new molecular evolution model called the IDIS model for Insertion Deletion Independent of Substitution [13,14]. In the IDIS model, the three independent processes of substitution, insertion and deletion of residues have constant rates. In order to control the genome expansion during evolution, we generalize here the IDIS model by introducing an insertion rate which decreases when the sequence grows and tends to 0 for a maximum sequence length nmax.
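To illustrate how a length-dependent insertion rate can bound genome expansion, the toy sketch below integrates a deterministic length equation with forward Euler steps. Everything specific here is an assumption made for illustration and not the authors' equations: the linear form `i0 * (1 - n / nmax)` is simply one function that decreases as the sequence grows and reaches 0 at `nmax`, and the names `insertion_rate`, `expected_length`, `i0` and `d` are hypothetical.

```python
# Toy sketch (not the published IDIS equations): sequence length n(t) under
# a per-residue insertion rate that falls linearly to 0 at nmax and a
# constant per-residue deletion rate d.
# Assumed dynamics: dn/dt = (insertion_rate(n) - d) * n


def insertion_rate(n: float, i0: float, nmax: float) -> float:
    """Assumed insertion rate: i0 for a very short sequence, 0 at n = nmax."""
    return i0 * (1.0 - n / nmax)


def expected_length(n0: float, i0: float, d: float, nmax: float,
                    t_end: float, dt: float = 0.01) -> float:
    """Integrate the assumed length equation with forward Euler steps."""
    n = n0
    for _ in range(int(round(t_end / dt))):
        n += dt * (insertion_rate(n, i0, nmax) - d) * n
    return n


if __name__ == "__main__":
    # Without deletions the length saturates at nmax.
    print(expected_length(n0=100, i0=0.5, d=0.0, nmax=1000, t_end=100))
    # With deletions it settles below nmax, at nmax * (1 - d / i0).
    print(expected_length(n0=100, i0=0.5, d=0.1, nmax=1000, t_end=100))
```

Under these assumptions the length behaves like logistic growth: with a constant insertion rate greater than `d` it would grow without bound, whereas the decreasing rate caps it at or below `nmax`.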
We introduce here a gene evolution model which is an extension of the time-continuous stochastic IDIS model (Lèbre and Michel in J. Comput. Biol.
Dynamic Bayesian networks (DBNs) have received increasing attention from the computational biology community as models of gene regulatory networks. However, conventional DBNs are based on the homogeneous Markov assumption and cannot deal with inhomogeneity and nonstationarity in temporal processes. The present chapter provides a detailed discussion of how the homogeneity assumption can be relaxed.
Comput Biol Chem
December 2010
We develop here a new class of stochastic models of gene evolution based on residue Insertion-Deletion Independent from Substitution (IDIS). Indeed, in contrast to all existing evolution models, insertions and deletions are modeled here by a concept in population dynamics. Therefore, they are not only independent from each other, but also independent from the substitution process.
Background: Biological networks are highly dynamic in response to environmental and physiological cues. This variability is in contrast to conventional analyses of biological networks, which have overwhelmingly employed static graph models that stay constant over time to describe biological systems and their underlying molecular interactions.
Methods: To overcome these limitations, we propose here a new statistical modelling framework, the ARTIVA formalism (Auto Regressive TIme VArying models), and an associated inferential procedure that allows us to learn temporally varying gene-regulation networks from biological time-course expression data.
Background: Nuclear workers from French contracting companies have received higher doses than workers from Electricité de France (EDF) or Commissariat à l'Energie Atomique (CEA).
Methods: A cohort of 9,815 workers in 11 contracting companies, monitored for exposure to ionizing radiation between 1967 and 2000, was followed up for a median duration of 12.5 years.
Stat Appl Genet Mol Biol
March 2009
In this paper, we introduce a novel inference method for dynamic genetic networks which makes it possible to handle a number of time measurements n that is much smaller than the number of genes p. The approach is based on the concept of a low order conditional dependence graph that we extend here to the case of dynamic Bayesian networks. Most of our results are based on the theory of graphical models associated with directed acyclic graphs (DAGs).