Publications by authors named "Ilya Vorontsov"

We describe an effort ("Codebook") to determine the sequence specificity of 332 putative and largely uncharacterized human transcription factors (TFs), as well as 61 control TFs. Nearly 5,000 independent experiments across multiple in vitro and in vivo assays produced motifs for just over half of the putative TFs analyzed (177, or 53%), of which most are unique to a single TF. The data highlight the extensive contribution of transposable elements to TF evolution, both in cis and trans, and identify tens of thousands of conserved, base-level binding sites in the human genome.

A DNA sequence pattern, or "motif", is an essential representation of DNA-binding specificity of a transcription factor (TF). Any particular motif model has potential flaws due to shortcomings of the underlying experimental data and computational motif discovery algorithm. As a part of the Codebook/GRECO-BIT initiative, here we evaluated at large scale the cross-platform recognition performance of positional weight matrices (PWMs), which remain popular motif models in many practical applications.

Most of the human genome is thought to be non-functional, and includes large segments often referred to as "dark matter" DNA. The genome also encodes hundreds of putative and poorly characterized transcription factors (TFs). We determined genomic binding locations of 166 uncharacterized human TFs in living cells.

We present a major update of the HOCOMOCO collection that provides DNA binding specificity patterns of 949 human transcription factors and 720 mouse orthologs. To make this release, we performed motif discovery in peak sets that originated from 14 183 ChIP-Seq experiments and reads from 2554 HT-SELEX experiments yielding more than 400 thousand candidate motifs. The candidate motifs were annotated according to their similarity to known motifs and the hierarchy of DNA-binding domains of the respective transcription factors.

We present an update of EpiFactors, a manually curated database providing information about epigenetic regulators, their complexes, targets, and products which is openly accessible at http://epifactors.autosome.org.

We present ANANASTRA, https://ananastra.autosome.org, a web server for the identification and annotation of regulatory single-nucleotide polymorphisms (SNPs) with allele-specific binding events.

Somatic mutations in regulatory sites of human stem cells affect cell identity or cause malignant transformation. By mining the human genome for co-occurrence of mutations and transcription factor binding sites, we show that C/EBP binding sites are strongly enriched with [C > T]G mutations in cancer and adult stem cells, which is of special interest because C/EBPs regulate cell fate and differentiation. In vitro protein-DNA binding assay and structural modeling of the CEBPB-DNA complex show that the G·T mismatch in the core CG dinucleotide strongly enhances affinity of the binding site.

Sequence variants in gene regulatory regions alter gene expression and contribute to phenotypes of individual cells and the whole organism, including disease susceptibility and progression. Single-nucleotide variants in enhancers or promoters may affect gene transcription by altering transcription factor binding sites. Differential transcription factor binding in heterozygous genomic loci provides a natural source of information on such regulatory variants.
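
One simple way to look for differential binding at a heterozygous site is to compare the ChIP-Seq read counts supporting each allele against an expected 1:1 ratio. The sketch below is an illustration only and is not the method of the cited work: the function name, the `p_ref=0.5` null, and the example counts are assumptions, and the code ignores reference mapping bias and copy-number differences, which a real analysis would need to handle.

```python
from math import comb


def allelic_imbalance_p(ref_reads, alt_reads, p_ref=0.5):
    """Two-sided exact binomial p-value for allelic read counts.

    Hypothetical helper: tests whether the reference-allele share of reads
    deviates from p_ref (assumed 0.5 for a balanced heterozygous site).
    """
    n = ref_reads + alt_reads
    # Probability of every possible reference-read count under the null.
    pmf = [comb(n, k) * p_ref**k * (1 - p_ref) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Sum the outcomes that are at most as likely as the observed one;
    # the small tolerance guards against floating-point ties.
    p_value = sum(p for p in pmf if p <= observed * (1 + 1e-9))
    return min(1.0, p_value)


balanced = allelic_imbalance_p(10, 10)   # equal support for both alleles
skewed = allelic_imbalance_p(20, 0)      # all reads support one allele
```

A small p-value for a site such as the skewed example would flag it as a candidate allele-specific binding event, subject to the caveats above.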

During translation, the rate of ribosome movement along mRNA varies. This leads to a non-uniform ribosome distribution along the transcript, depending on local mRNA sequence, structure, tRNA availability, and translation factor abundance, as well as the relationship between the overall rates of initiation, elongation, and termination. Stress, antibiotics, and genetic perturbations affecting composition and properties of translation machinery can alter the ribosome positional distribution dramatically.

Background: Efforts to elucidate the function of enhancers in vivo are underway but their vast numbers alongside differing enhancer architectures make it difficult to determine their impact on gene activity. By systematically annotating multiple mouse tissues with super- and typical-enhancers, we have explored their relationship with gene function and phenotype.

Results: Though super-enhancers drive high total- and tissue-specific expression of their associated genes, we find that typical-enhancers also contribute heavily to the tissue-specific expression landscape on account of their large numbers in the genome.

Article Synopsis
  • Long noncoding RNAs (lncRNAs) make up most of the transcripts in mammalian genomes, but their functions are still not well understood.
  • The FANTOM6 project systematically knocked down 285 lncRNAs in human dermal fibroblasts and analyzed changes in cell growth, shape, and gene expression using CAGE techniques.
  • This study provides a comprehensive lncRNA knockdown data set (over 1000 CAGE sequencing libraries) and reveals important findings about their roles and impact on various cellular pathways.

Background: Positional weight matrix (PWM) is a de facto standard model to describe transcription factor (TF) DNA binding specificities. PWMs inferred from in vivo or in vitro data are stored in many databases and used in a plethora of biological applications. This calls for comprehensive benchmarking of public PWM models with large experimental reference sets.

Many problems of modern genetics and functional genomics require the assessment of functional effects of sequence variants, including gene expression changes. Machine learning is considered to be a promising approach for solving this task, but its practical applications remain a challenge due to the insufficient volume and diversity of training data. A promising source of valuable data is a saturation mutagenesis massively parallel reporter assay, which quantitatively measures changes in transcription activity caused by sequence variants.

Objectives: Mammalian genomics studies, especially those focusing on transcriptional regulation, require information on genomic locations of regulatory regions, particularly, transcription factor (TF) binding sites. There are plenty of published ChIP-Seq data on in vivo binding of transcription factors in different cell types and conditions. However, handling of thousands of separate data sets is often impractical and it is desirable to have a single global map of genomic regions potentially bound by a particular TF in any of studied cell types and conditions.

We present a major update of the HOCOMOCO collection that consists of patterns describing DNA binding specificities for human and mouse transcription factors. In this release, we profited from a nearly doubled volume of published in vivo experiments on transcription factor (TF) binding to expand the repertoire of binding models, replace low-quality models previously based on in vitro data only and cover more than a hundred TFs with previously unknown binding specificities. This was achieved by systematic motif discovery from more than five thousand ChIP-Seq experiments uniformly processed within the BioUML framework with several ChIP-Seq peak calling tools and aggregated in the GTRD database.

We studied the functional effect of the rs12722489 single nucleotide polymorphism, located in the first intron of the human IL2RA gene, on transcriptional regulation. This polymorphism is associated with multiple autoimmune conditions (rheumatoid arthritis, multiple sclerosis, Crohn's disease, and ulcerative colitis). In silico analysis suggested a significant difference in the affinity of the estrogen receptor (ER) binding site between the alternative allelic variants, with stronger predicted affinity for the risk (G) allele.

The IL2RA gene encodes the alpha subunit of a high-affinity receptor for interleukin-2, which is expressed by several distinct populations of lymphocytes involved in autoimmune processes. A large number of polymorphic alleles of the IL2RA locus are associated with the development of various autoimmune diseases. Using bioinformatics analysis, we dissected the first intron of the IL2RA gene and selected several single nucleotide polymorphisms (SNPs) that may influence the regulation of the IL2RA gene in cell types relevant to autoimmune pathology.

Signaling lymphocytic activation molecule family member 1 (SLAMF1)/CD150 is a co-stimulatory receptor expressed on a variety of hematopoietic cells, in particular on mature lymphocytes activated by specific antigen, costimulation and cytokines. Changes in CD150 expression level have been reported in association with autoimmunity and with B-cell chronic lymphocytic leukemia. We characterized the core promoter of the SLAMF1 gene in human B-cell lines and explored binding sites for a number of transcription factors involved in B cell differentiation and activation.

Background: Somatic mutations in cancer cells affect various genomic elements disrupting important cell functions. In particular, mutations in DNA binding sites recognized by transcription factors can alter regulator binding affinities and, consequently, expression of target genes. A number of promoter mutations have been linked with an increased risk of cancer.

Models of transcription factor (TF) binding sites provide a basis for a wide spectrum of studies in regulatory genomics, from reconstruction of regulatory networks to functional annotation of transcripts and sequence variants. While TFs may recognize different sequence patterns in different conditions, it is pragmatic to have a single generic model for each particular TF as a baseline for practical applications. Here we present the expanded and enhanced version of HOCOMOCO (http://hocomoco.

Epigenetics refers to stable and long-term alterations of cellular traits that are not caused by changes in the DNA sequence per se. Rather, covalent modifications of DNA and histones affect gene expression and genome stability via proteins that recognize and act upon such modifications. Many enzymes that catalyse epigenetic modifications or are critical for enzymatic complexes have been discovered, and this is encouraging investigators to study the role of these proteins in diverse normal and pathological processes.

Regulated transcription controls the diversity, developmental pathways and spatial organization of the hundreds of cell types that make up a mammal. Using single-molecule cDNA sequencing, we mapped transcription start sites (TSSs) and their usage in human and mouse primary cells, cell lines and tissues to produce a comprehensive overview of mammalian gene expression across the human body. We find that few genes are truly 'housekeeping', whereas many mammalian promoters are composite entities composed of several closely separated TSSs, with independent cell-type-specific expression profiles.

Background: Positional weight matrix (PWM) remains the most popular model for quantification of transcription factor (TF) binding. A PWM supplied with a score threshold defines a set of putative transcription factor binding sites (TFBS), thus providing a TFBS model. TF-binding DNA fragments obtained by different experimental methods usually give similar but not identical PWMs.
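
The idea that a PWM plus a score threshold defines a set of putative binding sites can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the cited work: the helper names, the toy three-position count matrix, the uniform 0.25 background, the pseudocount of 1, and the threshold of 4.0. The scan covers one strand only and expects uppercase A/C/G/T input.

```python
import math


def pwm_from_counts(counts, pseudocount=1.0, background=0.25):
    """Turn per-position nucleotide counts into log2-odds weights."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(
                ((column.get(base, 0) + pseudocount) / total) / background
            )
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, site, score) for every window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # The PWM score is the sum of position-specific weights.
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


# Toy motif (hypothetical data): eight aligned sites, all reading "ACG".
TOY_COUNTS = [{"A": 8}, {"C": 8}, {"G": 8}]
toy_pwm = pwm_from_counts(TOY_COUNTS)
toy_hits = scan("TTACGTACG", toy_pwm, threshold=4.0)
```

Raising or lowering the threshold changes which windows count as sites, which is why the same matrix with different thresholds gives different TFBS models.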

Chromatin immunoprecipitation followed by deep sequencing (ChIP-Seq) has become a method of choice for locating DNA segments bound by different regulatory proteins. ChIP-Seq produces extremely valuable information for studying transcriptional regulation. The wet-lab workflow is often supported by downstream computational analysis, including construction of models of nucleotide sequences of transcription factor binding sites in DNA, which can be used to detect binding sites in ChIP-Seq data at single-base-pair resolution.
