Publications by authors named "Kulakovskiy I"

We describe an effort ("Codebook") to determine the sequence specificity of 332 putative and largely uncharacterized human transcription factors (TFs), as well as 61 control TFs. Nearly 5,000 independent experiments across multiple in vivo and in vitro assays produced motifs for just over half of the putative TFs analyzed (177, or 53%), most of which are unique to a single TF. The data highlight the extensive contribution of transposable elements to TF evolution, both in cis and in trans, and identify tens of thousands of conserved, base-level binding sites in the human genome.

A DNA sequence pattern, or "motif", is an essential representation of the DNA-binding specificity of a transcription factor (TF). Any particular motif model has potential flaws arising from shortcomings of the underlying experimental data and of the computational motif discovery algorithm. As part of the Codebook/GRECO-BIT initiative, we performed a large-scale evaluation of the cross-platform recognition performance of positional weight matrices (PWMs), which remain popular motif models in many practical applications.
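
To make PWM "recognition" concrete: a PWM assigns every candidate site a score, and a benchmark asks how well such scores separate bound from unbound sequences. The minimal sketch below converts a toy count matrix into log2-odds weights and reports the best-scoring site on either strand. The counts, the uniform background, and the 0.25 pseudocount are illustrative assumptions, not values used in the study.

```python
import math

# Hypothetical toy counts (consensus ACCT), one dict per motif position.
TOY_PFM = [
    {"A": 8, "C": 1, "G": 0, "T": 1},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 1, "C": 8, "G": 0, "T": 1},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}  # assumed uniform
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def pfm_to_pwm(pfm, background=BACKGROUND, pseudocount=0.25):
    """Convert counts to log2-odds weights using a per-nucleotide pseudocount."""
    pwm = []
    for column in pfm:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2((count + pseudocount) / total / background[base])
            for base, count in column.items()
        })
    return pwm


def best_hit(pwm, sequence):
    """Return (score, start, strand) of the best window; sequence must be uppercase ACGT.

    Minus-strand starts are reported in reverse-complement coordinates.
    """
    width = len(pwm)
    revcomp = sequence.translate(COMPLEMENT)[::-1]
    best = (-math.inf, -1, "+")
    for strand, seq in (("+", sequence), ("-", revcomp)):
        for start in range(len(seq) - width + 1):
            window = seq[start:start + width]
            score = sum(pwm[i][base] for i, base in enumerate(window))
            if score > best[0]:
                best = (score, start, strand)
    return best
```

For example, `best_hit(pfm_to_pwm(TOY_PFM), "GGAGGTGG")` finds the consensus `ACCT` on the minus strand at position 2 of the reverse complement.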

Transcription factors (TFs) are key players in eukaryotic gene regulation, but the DNA binding specificity of many TFs remains unknown. Here, we assayed 284 mostly poorly characterized, putative human TFs using selective microfluidics-based ligand enrichment followed by sequencing (SMiLE-seq), revealing 72 new DNA binding motifs. To investigate whether some of the 158 TFs for which we did not find motifs preferentially bind epigenetically modified DNA (i.

A long-standing challenge in human regulatory genomics is that transcription factor (TF) DNA-binding motifs are short and degenerate, while the genome is large. Motif scans therefore produce many false-positive binding site predictions. By surveying 179 TFs across 25 families using >1,500 cyclic selection experiments with fragmented, naked, and unmodified genomic DNA (a method we term GHT-SELEX, for Genomic HT-SELEX), we find that many human TFs possess much higher sequence specificity than anticipated.
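
The false-positive problem follows from simple arithmetic. Under the simplifying assumption of a uniform, independent base composition (which the real genome violates), one exact k-mer is expected to occur about 2 × G / 4^k times when both strands of a genome of length G are scanned. The rounded genome size below is an approximation, and real motifs are degenerate, so they match far more positions than a single exact k-mer.

```python
# Back-of-the-envelope estimate of chance motif matches in a random genome.
GENOME_LENGTH = 3.1e9  # approximate haploid human genome size in bp (rounded)


def expected_chance_hits(motif_length, genome_length=GENOME_LENGTH, strands=2):
    """Expected matches of one exact k-mer under a uniform i.i.d. base model."""
    return strands * genome_length / 4 ** motif_length


if __name__ == "__main__":
    for k in (6, 8, 10, 12):
        print(k, round(expected_chance_hits(k)))
```

Running it prints roughly 1.5 million chance hits for a 6-mer, 94,604 for an 8-mer, 5,913 for a 10-mer, and 370 for a 12-mer, which is why short motifs alone cannot pinpoint functional sites.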

Most of the human genome is thought to be non-functional, and includes large segments often referred to as "dark matter" DNA. The genome also encodes hundreds of putative and poorly characterized transcription factors (TFs). We determined genomic binding locations of 166 uncharacterized human TFs in living cells.

Inorganic polyphosphates and the associated metabolic pathways and enzymes are important factors for active yeast growth under unfavorable conditions. However, particular proteins of polyphosphate metabolism remain poorly explored in this context. Here we report the biochemical and transcriptomic characterization of the CRN/PPN2 yeast strain (derived from the Ppn1-lacking CRN strain) overexpressing the poorly studied Ppn2 polyphosphatase.

Article Synopsis
  • A systematic evaluation is necessary to understand how different model architectures and training strategies affect the performance of genomics models, prompting the organization of a DREAM Challenge.
  • In the challenge, competitors trained models on a dataset of millions of random promoter sequences and their expression levels measured in yeast; the best models employed diverse neural network architectures and training strategies.
  • The Prix Fixe framework enabled an in-depth analysis of these models, which led to further performance gains and showed that the top models not only excelled on yeast data but also outperformed existing benchmarks on Drosophila and human datasets.

In our cells, a limited number of RNA binding proteins (RBPs) are responsible for all aspects of RNA metabolism across the entire transcriptome. To accomplish this, RBPs form regulatory units that act on specific target regulons. However, the landscape of RBP combinatorial interactions remains poorly explored.

The human genome is pervasively transcribed and produces a wide variety of long non-coding RNAs (lncRNAs), constituting the majority of transcripts across human cell types. Some specific nuclear lncRNAs have been shown to be important regulatory components acting locally. As RNA-chromatin interaction and Hi-C chromatin conformation data showed that chromatin interactions of nuclear lncRNAs are determined by the local chromatin 3D conformation, we used Hi-C data to identify potential target genes of lncRNAs.

Neural networks have emerged as immensely powerful tools in predicting functional genomic regions, notably evidenced by recent successes in deciphering gene regulatory logic. However, a systematic evaluation of how model architectures and training strategies impact genomics model performance is lacking. To address this gap, we held a DREAM Challenge where competitors trained models on a dataset of millions of random promoter DNA sequences and corresponding expression levels, experimentally determined in yeast, to best capture the relationship between regulatory DNA and gene expression.
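
Before a sequence-to-expression network can learn from promoter sequences, each DNA string has to become a numeric array. One-hot encoding is a common choice for this; the sketch below assumes it for illustration and does not describe the input format of any particular challenge entry. Ambiguous bases such as `N` are mapped to all-zero rows, which is also an assumption.

```python
# One-hot encoding of DNA: one row per base, one column per nucleotide (A, C, G, T).
ALPHABET = "ACGT"


def one_hot(sequence):
    """Encode DNA as a length x 4 list of 0/1 rows; ambiguous bases become all zeros."""
    index = {base: i for i, base in enumerate(ALPHABET)}
    encoded = []
    for base in sequence.upper():
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        encoded.append(row)
    return encoded
```

For example, `one_hot("ACGN")` yields `[[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]]`; a convolutional model then slides learned filters along the length axis.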

Y-box-binding proteins (YB proteins) are multifunctional DNA- and RNA-binding proteins that play an important role in the regulation of gene expression. The high homology of their cold shock domains and the similarity between their long, unstructured C-terminal domains suggest that Y-box-binding proteins may have similar functions in a cell. Here, we consider the functional interchangeability of the somatic YB proteins YB-1 and YB-3.

We present a major update of the HOCOMOCO collection that provides DNA binding specificity patterns of 949 human transcription factors and 720 mouse orthologs. To make this release, we performed motif discovery in peak sets that originated from 14,183 ChIP-Seq experiments and reads from 2,554 HT-SELEX experiments, yielding more than 400,000 candidate motifs. The candidate motifs were annotated according to their similarity to known motifs and the hierarchy of DNA-binding domains of the respective transcription factors.
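
The similarity measure used for annotation is not described in this abstract. As an illustration only, the sketch below swaps in a simple, widely used alternative: the average column-wise Pearson correlation at the best ungapped alignment, checked against both strands. Motifs are represented as lists of probability columns in A, C, G, T order, and the minimum overlap of 4 columns is an arbitrary assumption.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length columns; 0.0 if either is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def reverse_complement(motif):
    """Reverse column order and swap A<->T, C<->G (columns are in A, C, G, T order)."""
    return [column[::-1] for column in motif[::-1]]


def motif_similarity(m1, m2, min_overlap=4):
    """Best average column correlation over all shifts and both strands of m2."""
    best = -1.0
    for candidate in (m2, reverse_complement(m2)):
        for offset in range(-(len(candidate) - min_overlap), len(m1) - min_overlap + 1):
            # Column i of m1 is aligned with column (i - offset) of the candidate.
            pairs = [(m1[i], candidate[i - offset])
                     for i in range(len(m1)) if 0 <= i - offset < len(candidate)]
            if len(pairs) < min_overlap:
                continue
            score = sum(pearson(a, b) for a, b in pairs) / len(pairs)
            best = max(best, score)
    return best
```

A motif compared with itself, or with its own reverse complement, scores close to 1.0, while a motif compared with a flat, uninformative matrix scores 0.0.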

Single-nucleotide polymorphism rs71327024 located in the human 3p21.31 locus has been associated with an elevated risk of hospitalization upon SARS-CoV-2 infection. The 3p21.

Motivation: The increasing volume of data from high-throughput experiments including parallel reporter assays facilitates the development of complex deep-learning approaches for modeling DNA regulatory grammar.

Results: Here, we introduce LegNet, an EfficientNetV2-inspired convolutional network for modeling short gene regulatory regions. By approaching the sequence-to-expression regression problem as a soft classification task, LegNet secured first place for the autosome.
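
Treating regression as soft classification means the network predicts a probability distribution over discrete expression bins instead of a single number. The sketch below shows one way to build such soft targets and collapse a predicted distribution back to a scalar. The 18 integer bins, the Gaussian kernel with sigma 0.5, and the KL-divergence loss are illustrative assumptions, not details confirmed by this abstract.

```python
import math


def soft_label(y, n_bins=18, sigma=0.5):
    """Spread a real-valued target over integer bins 0..n_bins-1 with a Gaussian kernel."""
    weights = [math.exp(-((b - y) ** 2) / (2 * sigma ** 2)) for b in range(n_bins)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_value(probabilities):
    """Collapse a bin distribution back to a scalar expression estimate."""
    return sum(b * p for b, p in enumerate(probabilities))


def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted), a typical loss when training against soft labels."""
    return sum(t * math.log((t + eps) / (p + eps))
               for t, p in zip(target, predicted) if t > 0)
```

With these assumed settings, `expected_value(soft_label(7.3))` recovers the target to within about 0.02, so little information is lost by the binning, while the model is still trained with a classification-style loss.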

While protein synthesis is vital for the majority of cell types of the human body, diversely differentiated cells require specific translation regulation. This suggests the specialization of translation machinery across tissues and organs. Using transcriptomic data from GTEx, FANTOM, and Gene Atlas, we systematically explored the abundance of transcripts encoding translation factors and aminoacyl-tRNA synthetases (ARSases) in human tissues.

The Polycomb group (PcG) proteins are fundamental epigenetic regulators that control the repressive state of target genes in multicellular organisms. One of the open questions is defining the mechanisms of PcG recruitment to chromatin. In Drosophila, the crucial role in PcG recruitment is thought to belong to DNA-binding proteins associated with Polycomb response elements (PREs).

A deeper knowledge of the dynamic transcriptional activity of promoters and enhancers is needed to improve mechanistic understanding of the pathogenesis of heart failure and heart diseases. In this study, we used cap analysis of gene expression (CAGE) to identify and quantify the activity of transcribed regulatory elements (TREs) in the four cardiac chambers of 21 healthy and ten failing adult human hearts. We identified 17,668 promoters and 14,920 enhancers associated with the expression of 14,519 genes.

We present an update of EpiFactors, a manually curated database providing information about epigenetic regulators, their complexes, targets, and products which is openly accessible at http://epifactors.autosome.org.

The position weight matrix, also called the position-specific scoring matrix, is the commonly accepted model to quantify the specificity of transcription factor binding to DNA. Position weight matrices are used in thousands of projects and software tools in regulatory genomics, including computational prediction of the regulatory impact of single-nucleotide variants. Yet, recently Yan et al.

N6-methyladenosine (m6A) is the most abundant, highly dynamic mRNA modification that regulates mRNA splicing, stability, and translation. The m6A epigenetic mark is erased by RNA demethylases ALKBH5 (AlkB Homolog 5) and FTO (Fat mass and obesity-associated protein). The ALKBH5 and FTO RNA demethylases recognize m6A in similar nucleotide contexts.

YB proteins are DNA/RNA-binding proteins belonging to the family of cold shock domain proteins. They play important roles in cells, tissues, and whole organisms, and are involved in transcription regulation, pre-mRNA splicing, mRNA translation and stability, mRNA packaging into mRNPs (including stress granules), DNA repair, and many other cellular events.

We present ANANASTRA, https://ananastra.autosome.org, a web server for the identification and annotation of regulatory single-nucleotide polymorphisms (SNPs) with allele-specific binding events.
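
ANANASTRA's statistical model is not described here. As a simplified illustration of what an allele-specific binding event means, the sketch below applies an exact two-sided binomial test to hypothetical reference and alternative read counts at a heterozygous SNP. It assumes a 50:50 null with no reference-mapping bias and no copy-number imbalance; more complete approaches account for these effects.

```python
from math import comb


def binomial_two_sided_p(ref_reads, alt_reads, p=0.5):
    """Exact two-sided binomial test for allelic imbalance of read counts.

    Sums the probabilities of all outcomes no more likely than the observed one.
    """
    n = ref_reads + alt_reads
    probs = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = probs[ref_reads]
    # A small relative tolerance keeps exactly tied outcomes despite rounding.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-7)))
```

For instance, 20 reference versus 2 alternative reads gives p = 508 / 2^22 (about 1.2e-4), suggesting preferential binding to one allele, whereas 12 versus 10 reads is consistent with balanced binding.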

eIF4G2 (DAP5 or Nat1) is a homologue of the canonical translation initiation factor eIF4G1 in higher eukaryotes but its function remains poorly understood. Unlike eIF4G1, eIF4G2 does not interact with the cap-binding protein eIF4E and is believed to drive translation under stress when eIF4E activity is impaired. Here, we show that eIF4G2 operates under normal conditions as well and promotes scanning downstream of the eIF4G1-mediated 40S recruitment and cap-proximal scanning.

In eukaryotes, stalled and collided ribosomes are recognized by several conserved multicomponent systems, which either block protein synthesis and resolve the collision locally, or trigger a general stress response. Yeast ribosome-binding GTPases RBG1 (DRG1 in mammals) and RBG2 (DRG2) form two distinct heterodimers with TMA46 (DFRP1) and GIR2 (DFRP2), respectively, both involved in mRNA translation. Accumulated evidence suggests that the dimers play partially redundant roles in elongation processivity and resolution of ribosome stalling and collision events, as well as in the regulation of GCN1-mediated signaling involved in ribosome-associated quality control (RQC).