Publications by authors named "Zain Patel"

We describe an effort ("Codebook") to determine the sequence specificity of 332 putative and largely uncharacterized human transcription factors (TFs), as well as 61 control TFs. Nearly 5,000 independent experiments across multiple in vitro and in vivo assays produced motifs for just over half of the putative TFs analyzed (177, or 53%), of which most are unique to a single TF. The data highlight the extensive contribution of transposable elements to TF evolution, both in cis and in trans, and identify tens of thousands of conserved, base-level binding sites in the human genome.

A DNA sequence pattern, or "motif", is an essential representation of DNA-binding specificity of a transcription factor (TF). Any particular motif model has potential flaws due to shortcomings of the underlying experimental data and computational motif discovery algorithm. As a part of the Codebook/GRECO-BIT initiative, here we evaluated at large scale the cross-platform recognition performance of positional weight matrices (PWMs), which remain popular motif models in many practical applications.

Most of the human genome is thought to be non-functional, and includes large segments often referred to as "dark matter" DNA. The genome also encodes hundreds of putative and poorly characterized transcription factors (TFs). We determined genomic binding locations of 166 uncharacterized human TFs in living cells.

CRISPR tiling screens have advanced the identification and characterization of regulatory sequences but are limited by low resolution arising from the indirect readout of editing via guide RNA sequencing. This study introduces an end-to-end experimental assay and computational pipeline that leverages targeted sequencing of CRISPR-introduced alleles at the endogenous target locus following dense base-editing mutagenesis. This approach enables the dissection of regulatory elements at nucleotide resolution, facilitating a direct assessment of genotype-phenotype effects.

Microglia play diverse pathophysiological roles in Alzheimer's disease (AD), with genetic susceptibility factors skewing microglial cell function to influence AD risk. CD33 is an immunomodulatory receptor associated with AD susceptibility through a single nucleotide polymorphism that modulates mRNA splicing, skewing protein expression from a long protein isoform (CD33M) to a short isoform (CD33m). Understanding how human CD33 isoforms differentially impact microglial cell function in vivo has been challenging due to functional divergence of CD33 between mice and humans.

The challenge of systematically modifying and optimizing regulatory elements for precise gene expression control is central to modern genomics and synthetic biology. Advancements in generative AI have paved the way for designing synthetic sequences with the aim of safely and accurately modulating gene expression. We leverage diffusion models to design context-specific DNA regulatory sequences, which hold significant potential toward enabling novel therapeutic applications requiring precise modulation of gene expression.

Spatially resolved transcriptomics offers unprecedented insight by enabling the profiling of gene expression within the intact spatial context of cells, effectively adding a new and essential dimension to data interpretation. To efficiently detect spatial structure of interest, an essential step in analyzing such data involves identifying spatially variable genes. Despite researchers having developed several computational methods to accomplish this task, the lack of a comprehensive benchmark evaluating their performance remains a considerable gap in the field.

Cajal-Retzius (CR) cells are transient neurons with long-lasting effects on the architecture and circuitry of the neocortex and hippocampus. Contrary to the prevailing assumption that CR cells completely disappear in rodents shortly after birth, a substantial portion of these cells persist in the hippocampus throughout adulthood. The role of these surviving CR cells in the adult hippocampus is largely unknown, partly because of the paucity of suitable tools to dissect their functions in the adult versus the embryonic brain.

Sequences derived from the Long INterspersed Element-1 (L1) family of retrotransposons occupy at least 17% of the human genome, with 67 distinct subfamilies representing successive waves of expansion and extinction in mammalian lineages. L1s contribute extensively to gene regulation, but their molecular history is difficult to trace, because most are present only as truncated and highly mutated fossils. Consequently, L1 entries in current databases of repeat sequences are composed mainly of short diagnostic subsequences, rather than full functional progenitor sequences for each subfamily.

Heterozygous pathogenic variants in CIC, which encodes a transcriptional repressor, have been identified in individuals with neurodevelopmental phenotypes. To date, 11 CIC variants have been associated with the CIC-related neurodevelopmental syndrome. Here, we describe three novel and one previously reported CIC variants in four individuals with neurodevelopmental delay.

Background: Mammalian genomes contain millions of putative regulatory sequences, which are delineated by binding of multiple transcription factors. The degree to which spacing and orientation constraints among transcription factor binding sites contribute to the recognition and identity of regulatory sequence is an unresolved but important question that impacts our understanding of genome function and evolution. Global mechanisms that underlie phenomena including the size of regulatory sequences, their uniqueness, and their evolutionary turnover remain poorly described.
