Publications by Sean Whalen | LitMetric

Publications by authors named "Sean Whalen"

Massively parallel characterization of regulatory elements in the developing human cortex.

Chengyu Deng, Sean Whalen, Marilyn Steyert, Ryan Ziffra, Pawel F Przytycki

Science

May 2024

Nucleotide changes in gene regulatory elements are important determinants of neuronal development and diseases. Using massively parallel reporter assays in primary human cells from mid-gestation cortex and cerebral organoids, we interrogated the cis-regulatory activity of 102,767 open chromatin regions, including thousands of sequences with cell type-specific accessibility and variants associated with brain gene regulation. In primary cells, we identified 46,802 active enhancer sequences and 164 variants that alter enhancer activity.

Three-dimensional genome rewiring in loci with human accelerated regions.

Kathleen C Keough, Sean Whalen, Fumitaka Inoue, Pawel F Przytycki, Tyler Fair

Science

April 2023

Human accelerated regions (HARs) are conserved genomic loci that evolved at an accelerated rate in the human lineage and may underlie human-specific traits. We generated HARs and chimpanzee accelerated regions with an automated pipeline and an alignment of 241 mammalian genomes. Combining deep learning with chromatin capture experiments in human and chimpanzee neural progenitor cells, we discovered a significant enrichment of HARs in topologically associating domains containing human-specific genomic variants that change three-dimensional (3D) genome organization.

Massively parallel characterization of psychiatric disorder-associated and cell-type-specific regulatory elements in the developing human cortex.

Chengyu Deng, Sean Whalen, Marilyn Steyert, Ryan Ziffra, Pawel F Przytycki

bioRxiv

February 2023

Nucleotide changes in gene regulatory elements are important determinants of neuronal development and disease. Using massively parallel reporter assays in primary human cells from mid-gestation cortex and cerebral organoids, we interrogated the cis-regulatory activity of 102,767 sequences, including differentially accessible cell-type-specific regions in the developing cortex and single-nucleotide variants associated with psychiatric disorders. In primary cells, we identified 46,802 active enhancer sequences and 164 disorder-associated variants that significantly alter enhancer activity.

An atlas of lamina-associated chromatin across twelve human cell types reveals an intermediate chromatin subtype.

Parisha P Shah, Kathleen C Keough, Ketrin Gjoni, Garrett T Santini, Richard J Abdill, Sean Whalen

Genome Biol

January 2023

Background: Association of chromatin with lamin proteins at the nuclear periphery has emerged as a potential mechanism to coordinate cell type-specific gene expression and maintain cellular identity via gene silencing. Unlike many histone modifications and chromatin-associated proteins, lamina-associated domains (LADs) are mapped genome-wide in relatively few genetically normal human cell types, which limits our understanding of the role peripheral chromatin plays in development and disease.

Results: To address this gap, we map LAMIN B1 occupancy across twelve human cell types encompassing pluripotent stem cells, intermediate progenitors, and differentiated cells from all three germ layers.

Machine learning dissection of human accelerated regions in primate neurodevelopment.

Sean Whalen, Fumitaka Inoue, Hane Ryu, Tyler Fair, Eirene Markenscoff-Papadimitriou

Neuron

March 2023

Using machine learning (ML), we interrogated the function of all human-chimpanzee variants in 2,645 human accelerated regions (HARs), finding 43% of HARs have variants with large opposing effects on chromatin state and 14% on neurodevelopmental enhancer activity. This pattern, consistent with compensatory evolution, was confirmed using massively parallel reporter assays in chimpanzee and human neural progenitor cells. The species-specific enhancer activity of HARs was accurately predicted from the presence and absence of transcription factor footprints in each species.

Analysis of Transcriptional Variability in a Large Human iPSC Library Reveals Genetic and Non-genetic Determinants of Heterogeneity.

Ivan Carcamo-Orive, Gabriel E Hoffman, Paige Cundiff, Noam D Beckmann, Sunita L D'Souza, Sean Whalen

Cell Stem Cell

October 2022

Enhancer Function and Evolutionary Roles of Human Accelerated Regions.

Sean Whalen, Katherine S Pollard

Annu Rev Genet

November 2022

Human accelerated regions (HARs) are the fastest-evolving sequences in the human genome. When HARs were discovered in 2006, their function was mysterious due to scant annotation of the noncoding genome. Diverse technologies, from transgenic animals to machine learning, have consistently shown that HARs function as gene regulatory enhancers with significant enrichment in neurodevelopment.

Autism risk gene POGZ promotes chromatin accessibility and expression of clustered synaptic genes.

Eirene Markenscoff-Papadimitriou, Fadya Binyameen, Sean Whalen, James Price, Kenneth Lim

Cell Rep

December 2021

Deleterious genetic variants in POGZ, which encodes the chromatin regulator Pogo Transposable Element with ZNF Domain protein, are strongly associated with autism spectrum disorder (ASD). Although it is a high-confidence ASD risk gene, the neurodevelopmental functions of POGZ remain unclear. Here we reveal the genomic binding of POGZ in the developing forebrain at euchromatic loci and gene regulatory elements (REs).

Navigating the pitfalls of applying machine learning in genomics.

Sean Whalen, Jacob Schreiber, William S Noble, Katherine S Pollard

Nat Rev Genet

March 2022

The scale of genetic, epigenomic, transcriptomic, cheminformatic and proteomic data available today, coupled with easy-to-use machine learning (ML) toolkits, has propelled the application of supervised learning in genomics research. However, the assumptions behind the statistical models and performance evaluations in ML software frequently are not met in biological systems. In this Review, we illustrate the impact of several common pitfalls encountered when applying supervised ML in genomics.
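
One widely used safeguard against correlated examples (for example, nearby loci) leaking between training and test sets is to hold out entire chromosomes rather than splitting at random. A minimal sketch, assuming each example is a dict carrying a hypothetical `chrom` field; the function name and data are illustrative only:

```python
def split_by_chromosome(examples, test_chroms):
    """Split examples into (train, test) lists by chromosome.

    Every example on a held-out chromosome goes to the test set, so
    correlated neighbouring loci cannot straddle the split.
    Each example is assumed to be a dict with a "chrom" key (hypothetical schema).
    """
    train, test = [], []
    for ex in examples:
        (test if ex["chrom"] in test_chroms else train).append(ex)
    return train, test


# Hypothetical toy data: two nearby chr1 loci stay together in training.
examples = [
    {"chrom": "chr1", "start": 1000, "label": 1},
    {"chrom": "chr1", "start": 1500, "label": 0},
    {"chrom": "chr8", "start": 2000, "label": 1},
    {"chrom": "chr9", "start": 3000, "label": 0},
]
train, test = split_by_chromosome(examples, test_chroms={"chr8", "chr9"})
```

Grouping by chromosome is coarse; it trades some training data for a test set whose examples are much less likely to be near-duplicates of training examples.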

Author Correction: lentiMPRA and MPRAflow for high-throughput functional characterization of gene regulatory elements.

M Grace Gordon, Fumitaka Inoue, Beth Martin, Max Schubach, Vikram Agarwal, Sean Whalen

Nat Protoc

July 2021

Association of P-Wave Axis With Incident Atrial Fibrillation in Diabetes Mellitus (from the ACCORD Trial).

Karanpreet K Dhaliwal, Bharathi Upadhya, Elsayed Z Soliman, Elijah H Beaty, Joseph Yeboah, Sean P Whalen

Am J Cardiol

August 2020

Abnormal P-wave axis may reflect preclinical atrial dysfunction and has been associated with an increased risk of incident atrial fibrillation (AF) in the general population. Patients with diabetes mellitus (DM) have a higher prevalence of AF, but the association of abnormal P-wave axis and the risk of incident AF in those with diabetes has not been previously explored. For this analysis, we included 8,965 eligible participants from the Action to Control Cardiovascular Risk in Diabetes (ACCORD) trial.

lentiMPRA and MPRAflow for high-throughput functional characterization of gene regulatory elements.

M Grace Gordon, Fumitaka Inoue, Beth Martin, Max Schubach, Vikram Agarwal, Sean Whalen

Nat Protoc

August 2020

Massively parallel reporter assays (MPRAs) can simultaneously measure the function of thousands of candidate regulatory sequences (CRSs) in a quantitative manner. In this method, CRSs are cloned upstream of a minimal promoter and reporter gene, alongside a unique barcode, and introduced into cells. If the CRS is a functional regulatory element, it will lead to the transcription of the barcode sequence, which is measured via RNA sequencing and normalized for cellular integration via DNA sequencing of the barcode.
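
The readout described above (barcode RNA counts normalized by barcode DNA counts) is often summarized as a log2 RNA/DNA ratio per CRS. A minimal sketch, assuming barcode counts are already tabulated per CRS and using simple counts-per-million scaling with a pseudocount; this is not necessarily the exact normalization MPRAflow performs, and the function name and toy counts are hypothetical:

```python
import math


def mpra_activity(rna_counts, dna_counts, pseudocount=1.0):
    """Return a log2(RNA/DNA) activity score per candidate regulatory sequence.

    rna_counts, dna_counts: dicts mapping CRS id -> {barcode: count}.
    Counts are summed over each CRS's barcodes, scaled to counts per
    million (CPM) within each library, and compared as a log2 ratio.
    """
    rna_total = sum(sum(bcs.values()) for bcs in rna_counts.values())
    dna_total = sum(sum(bcs.values()) for bcs in dna_counts.values())
    activity = {}
    for crs, dna_bcs in dna_counts.items():
        dna_cpm = sum(dna_bcs.values()) / dna_total * 1e6
        # A CRS with no RNA barcodes observed gets an RNA CPM of 0.
        rna_cpm = sum(rna_counts.get(crs, {}).values()) / rna_total * 1e6
        activity[crs] = math.log2((rna_cpm + pseudocount) / (dna_cpm + pseudocount))
    return activity


# Hypothetical toy counts for two CRSs.
rna = {"crsA": {"AAC": 300, "GTT": 100}, "crsB": {"CCA": 50}}
dna = {"crsA": {"AAC": 100, "GTT": 100}, "crsB": {"CCA": 200}}
act = mpra_activity(rna, dna)
```

Here `crsA` yields more RNA than its DNA representation predicts (positive score), while `crsB` yields less (negative score).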

A Chromatin Accessibility Atlas of the Developing Human Telencephalon.

Eirene Markenscoff-Papadimitriou, Sean Whalen, Pawel Przytycki, Reuben Thomas, Fadya Binyameen

Cell

August 2020

To discover regulatory elements driving the specificity of gene expression in different cell types and regions of the developing human brain, we generated an atlas of open chromatin from nine dissected regions of the mid-gestation human telencephalon, as well as microdissected upper and deep layers of the prefrontal cortex. We identified a subset of open chromatin regions (OCRs), termed predicted regulatory elements (pREs), that are likely to function as developmental brain enhancers. pREs showed temporal, regional, and laminar differences in chromatin accessibility and were correlated with gene expression differences across regions and gestational ages.

AlleleAnalyzer: a tool for personalized and allele-specific sgRNA design.

Kathleen C Keough, Svetlana Lyalina, Michael P Olvera, Sean Whalen, Bruce R Conklin

Genome Biol

August 2019

The CRISPR/Cas system is a highly specific genome editing tool capable of distinguishing alleles differing by even a single base pair. Target sites might carry genetic variants that sgRNA design tools based on a single reference genome cannot distinguish. AlleleAnalyzer is open-source software that incorporates single-nucleotide variants and short insertions and deletions to design sgRNAs for precisely editing one or more haplotypes of a sequenced genome, currently supporting 11 Cas proteins.
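
To illustrate why variant-aware design matters: a single-nucleotide variant can create or destroy a PAM, so an sgRNA may cut only one allele. A minimal sketch, assuming SpCas9's NGG PAM and scanning the forward strand only; the function names are hypothetical and this is not AlleleAnalyzer's code or API:

```python
import re


def ngg_pam_starts(seq):
    """0-based start positions of every forward-strand NGG PAM (overlaps included)."""
    # A zero-width lookahead lets overlapping matches such as AGGG -> {0, 1} be found.
    return {m.start() for m in re.finditer(r"(?=[ACGT]GG)", seq.upper())}


def allele_specific_pams(ref_seq, pos, alt_base):
    """Compare NGG PAMs between ref_seq and the same sequence carrying
    a single-nucleotide variant alt_base at 0-based position pos.

    Returns (created, destroyed): PAM starts present only in the alt
    allele, and only in the ref allele, respectively.
    """
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    ref_pams = ngg_pam_starts(ref_seq)
    alt_pams = ngg_pam_starts(alt_seq)
    return alt_pams - ref_pams, ref_pams - alt_pams


# Hypothetical example: a T>G change at position 6 creates an AGG PAM at position 4.
created, destroyed = allele_specific_pams("TTACAGTT", 6, "G")
```

A full design tool would also scan the reverse strand, handle indels, and support other PAMs; this sketch only shows the allele-comparison idea.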

Reply to 'Inflated performance measures in enhancer-promoter interaction-prediction methods'.

Sean Whalen, Katherine S Pollard

Nat Genet

August 2019

The glycan CA19-9 promotes pancreatitis and pancreatic cancer in mice.

Dannielle D Engle, Hervé Tiriac, Keith D Rivera, Arnaud Pommier, Sean Whalen

Science

June 2019

Glycosylation alterations are indicative of tissue inflammation and neoplasia, but whether these alterations contribute to disease pathogenesis is largely unknown. To study the role of glycan changes in pancreatic disease, we inducibly expressed human fucosyltransferase 3 and β1,3-galactosyltransferase 5 in mice, reconstituting the glycan sialyl-Lewis^a, also known as carbohydrate antigen 19-9 (CA19-9). Notably, CA19-9 expression in mice resulted in rapid and severe pancreatitis with hyperactivation of epidermal growth factor receptor (EGFR) signaling.

Most chromatin interactions are not in linkage disequilibrium.

Sean Whalen, Katherine S Pollard

Genome Res

March 2019

Chromatin interactions and linkage disequilibrium (LD) are both pairwise measurements between genomic loci that show block patterns along mammalian chromosomes. Their values are generally high for sites that are nearby in the linear genome but abruptly drop across block boundaries. One function of chromatin boundaries is to insulate regulatory domains from one another.
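
For concreteness, one standard pairwise LD statistic is r², computed from allele and haplotype frequencies at two biallelic loci. A minimal sketch, assuming phased haplotypes coded as 0/1; the function name is hypothetical:

```python
def ld_r2(hap_a, hap_b):
    """r^2 linkage disequilibrium between two biallelic loci.

    hap_a, hap_b: equal-length sequences of 0/1 alleles, one entry per
    phased haplotype. With D = p_AB - p_A * p_B,
    r^2 = D^2 / (p_A (1 - p_A) p_B (1 - p_B)).
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("need two non-empty haplotype vectors of equal length")
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    # Frequency of haplotypes carrying allele 1 at both loci.
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    d = p_ab - p_a * p_b
    return d * d / denom


perfect = ld_r2([0, 1, 0, 1], [0, 1, 0, 1])      # alleles always co-occur
independent = ld_r2([0, 0, 1, 1], [0, 1, 0, 1])  # all four haplotypes equally common
```

Computing such a statistic for every pair of loci yields the block-structured matrix that the abstract compares with chromatin-interaction maps.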

The Epstein-Barr Virus Episome Maneuvers between Nuclear Chromatin Compartments during Reactivation.

Stephanie A Moquin, Sean Thomas, Sean Whalen, Alix Warburton, Samantha G Fernandez

J Virol

February 2018

Article Synopsis

* During latency, EBV associates with repressive areas of the nucleus, but upon reactivation, it moves towards active regions, suggesting a transformation in its interaction with the nuclear environment.
* This study highlights the role of spatial organization in gene regulation, indicating that long-range associations between chromosomes could be crucial for transcriptional activity.

Genomic analyses identify hundreds of variants associated with age at menarche and support a role for puberty timing in cancer risk.

Felix R Day, Deborah J Thompson, Hannes Helgason, Daniel I Chasman, Hilary Finucane, Sean Whalen

Nat Genet

June 2017

Article Synopsis

- The research identifies 389 genetic signals related to the timing of menarche (first menstrual period) in up to 370,000 women, showing how genetics influence this aspect of puberty and link it to adult diseases.
- Findings indicate that about 7.4% of the population variation in menarche age can be explained by these genetic signals, with a notable enrichment of associated genes in neural tissues.
- The study suggests that the timing of puberty has causal relationships with certain cancers, independently of factors like body mass index (BMI), highlighting the intricate genetic factors influencing puberty and its long-term health effects.

Unboxing cluster heatmaps.

Sophie Engle, Sean Whalen, Alark Joshi, Katherine S Pollard

BMC Bioinformatics

February 2017

Background: Cluster heatmaps are commonly used in biology and related fields to reveal hierarchical clusters in data matrices. This visualization technique has high data density and reveals clusters better than unordered heatmaps alone. However, cluster heatmaps have known issues that make them both time-consuming to use and prone to error.

Analysis of Transcriptional Variability in a Large Human iPSC Library Reveals Genetic and Non-genetic Determinants of Heterogeneity.

Ivan Carcamo-Orive, Gabriel E Hoffman, Paige Cundiff, Noam D Beckmann, Sunita L D'Souza, Sean Whalen

Cell Stem Cell

April 2017

Variability in induced pluripotent stem cell (iPSC) lines remains a concern for disease modeling and regenerative medicine. We have used RNA-sequencing analysis and linear mixed models to examine the sources of gene expression variability in 317 human iPSC lines from 101 individuals. We found that ∼50% of genome-wide expression variability is explained by variation across individuals and identified a set of expression quantitative trait loci that contribute to this variation.
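
The idea of partitioning expression variance by source can be illustrated with a much simpler calculation than the linear mixed models the study used: the share of one gene's variance lying between donors, as a ratio of sums of squares. A rough sketch, assuming lines are grouped by donor; the names and values are hypothetical, and a mixed model would instead estimate variance components and account for additional covariates:

```python
from statistics import fmean


def fraction_between_groups(values_by_group):
    """Share of total variance in one gene's expression lying between groups.

    values_by_group maps a group id (e.g. donor) -> list of expression
    values (e.g. one per iPSC line). Returns SS_between / SS_total, a
    fixed-effect simplification rather than a mixed-model estimate.
    """
    all_values = [v for vals in values_by_group.values() for v in vals]
    grand = fmean(all_values)
    ss_total = sum((v - grand) ** 2 for v in all_values)
    ss_between = sum(len(vals) * (fmean(vals) - grand) ** 2
                     for vals in values_by_group.values())
    return ss_between / ss_total


# Hypothetical: lines from the same donor agree, donors differ -> all between-donor.
donor_driven = fraction_between_groups({"donor1": [1.0, 1.0], "donor2": [3.0, 3.0]})
# Hypothetical: donors have identical means -> no between-donor variance.
line_driven = fraction_between_groups({"donor1": [1.0, 3.0], "donor2": [1.0, 3.0]})
```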

Enhancer-promoter interactions are encoded by complex genomic signatures on looping chromatin.

Sean Whalen, Rebecca M Truty, Katherine S Pollard

Nat Genet

May 2016

Discriminating the gene target of a distal regulatory element from other nearby transcribed genes is a challenging problem with the potential to illuminate the causal underpinnings of complex diseases. We present TargetFinder, a computational method that reconstructs regulatory landscapes from diverse features along the genome. The resulting models accurately predict individual enhancer-promoter interactions across multiple cell lines with a false discovery rate up to 15 times smaller than that obtained using the closest gene.
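
The closest-gene baseline is simple to state. Assuming "closest" means the gene whose transcription start site (TSS) is nearest the enhancer, a minimal sketch of that baseline and of a false discovery rate computed against a set of known true pairs follows. Coordinates, gene names, and functions are hypothetical, and this is not TargetFinder itself:

```python
def closest_gene(enhancer_mid, tss_by_gene):
    """Return the gene whose TSS is nearest an enhancer midpoint.

    tss_by_gene maps gene name -> TSS coordinate on the same chromosome.
    Ties are broken by gene name so the result is deterministic.
    """
    return min(tss_by_gene, key=lambda g: (abs(tss_by_gene[g] - enhancer_mid), g))


def false_discovery_rate(predicted_pairs, true_pairs):
    """Fraction of predicted (enhancer, gene) pairs absent from the true set."""
    predicted = set(predicted_pairs)
    if not predicted:
        return 0.0
    return len(predicted - set(true_pairs)) / len(predicted)


# Hypothetical coordinates on a single chromosome.
tss = {"GENE1": 10_000, "GENE2": 52_000}
enhancers = {"enh1": 15_000, "enh2": 40_000}
preds = {(e, closest_gene(mid, tss)) for e, mid in enhancers.items()}
true_pairs = {("enh1", "GENE2"), ("enh2", "GENE2")}
fdr = false_discovery_rate(preds, true_pairs)
```

In this toy case `enh1` actually regulates the more distant GENE2, so the baseline gets one of two predictions wrong, which is the kind of error a model using features beyond distance aims to avoid.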

Predicting protein function and other biomedical characteristics with heterogeneous ensembles.

Sean Whalen, Om Prakash Pandey, Gaurav Pandey

Methods

January 2016

Prediction problems in biomedical sciences, including protein function prediction (PFP), are generally quite difficult. This is due in part to incomplete knowledge of the cellular phenomenon of interest, the appropriateness and data quality of the variables and measurements used for prediction, as well as a lack of consensus regarding the ideal predictor for specific problems. In such scenarios, a powerful approach to improving prediction performance is to construct heterogeneous ensemble predictors that combine the output of diverse individual predictors that capture complementary aspects of the problems and/or datasets.
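
The simplest heterogeneous ensemble averages the scores of diverse base predictors; more elaborate approaches, such as stacking, instead learn how to combine them. A minimal sketch of (weighted) mean aggregation, assuming each base predictor has already scored the same examples in the same order; names and scores are hypothetical, and this is not presented as the paper's specific method:

```python
def mean_ensemble(score_lists, weights=None):
    """Combine per-example scores from several base predictors.

    score_lists: one list of scores per base predictor, all aligned to
    the same examples. With weights=None this is an unweighted mean;
    otherwise each predictor's scores are weighted (e.g. by validation
    performance) and the result is divided by the total weight.
    """
    k = len(score_lists)
    if weights is None:
        weights = [1.0] * k
    if len(weights) != k:
        raise ValueError("one weight per predictor is required")
    total = sum(weights)
    # zip(*score_lists) yields, per example, the tuple of scores from all predictors.
    return [sum(w * s for w, s in zip(weights, col)) / total
            for col in zip(*score_lists)]


# Hypothetical scores from three different model types on two examples.
scores_a = [0.9, 0.2]
scores_b = [0.7, 0.4]
scores_c = [0.8, 0.0]
combined = mean_ensemble([scores_a, scores_b, scores_c])
weighted = mean_ensemble([scores_a, scores_b, scores_c], weights=[2, 1, 1])
```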

Enhancing the functional content of eukaryotic protein interaction networks.

Gaurav Pandey, Sonali Arora, Sahil Manocha, Sean Whalen

PLoS One

December 2015

Protein interaction networks are a promising type of data for studying complex biological systems. However, despite the rich information they contain, these networks face important data quality challenges, namely noise and incompleteness, that adversely affect the results of their analysis. Here, we apply a robust measure of local network structure called common neighborhood similarity (CNS) to address these challenges.
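
Common-neighborhood measures score a pair of proteins by the interaction partners they share. One simple measure of this kind is the Jaccard index of the two neighbor sets; the specific CNS measure used in the paper may differ. A minimal sketch, assuming an undirected network stored as a dict of neighbor sets, with a hypothetical `rewire` step that keeps only high-similarity pairs:

```python
from itertools import combinations


def jaccard_neighborhood_similarity(adj, u, v):
    """Jaccard similarity of the neighbor sets of proteins u and v.

    adj maps each protein to the set of its interaction partners.
    Returns |N(u) & N(v)| / |N(u) | N(v)|, or 0.0 if both sets are empty.
    """
    nu, nv = adj.get(u, set()), adj.get(v, set())
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


def rewire(adj, threshold):
    """Hypothetical step: link every pair whose similarity meets threshold."""
    return {(u, v) for u, v in combinations(sorted(adj), 2)
            if jaccard_neighborhood_similarity(adj, u, v) >= threshold}


# Hypothetical network: A and B share both partners but are not directly linked.
adj = {"A": {"C", "D"}, "B": {"C", "D"}, "C": {"A", "B"}, "D": {"A", "B"}}
new_edges = rewire(adj, 0.5)
```

In this toy network the transformed graph links A with B and C with D, pairs that never interact directly, which shows how shared neighborhoods can suggest missing functional relationships.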

Structural drift: the population dynamics of sequential learning.

James P Crutchfield, Sean Whalen

PLoS Comput Biol

October 2012

We introduce a theory of sequential causal inference in which learners in a chain estimate a structural model from their upstream "teacher" and then pass samples from the model to their downstream "student". It extends the population dynamics of genetic drift, recasting Kimura's selectively neutral theory as a special case of a generalized drift process using structured populations with memory. We examine the diffusion and fixation properties of several drift processes and propose applications to learning, inference, and evolution.
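
Kimura's neutral drift, which the paper recasts as a special case, can be illustrated with a Wright-Fisher simulation: each generation resamples the population from the previous generation's allele frequency. A minimal sketch, assuming a haploid population of fixed size with no selection or mutation; this is classical drift, not the structural drift process itself, and the names are hypothetical:

```python
import random


def wright_fisher_fixation(pop_size, init_count, rng, max_generations=100_000):
    """Simulate neutral Wright-Fisher drift of one allele until loss or fixation.

    Each generation draws pop_size gene copies independently, each carrying
    the allele with probability equal to its current frequency.
    Returns (final_count, generations).
    """
    count, gen = init_count, 0
    while 0 < count < pop_size and gen < max_generations:
        p = count / pop_size
        count = sum(1 for _ in range(pop_size) if rng.random() < p)
        gen += 1
    return count, gen


rng = random.Random(0)  # fixed seed for reproducibility
finals = [wright_fisher_fixation(20, 5, rng)[0] for _ in range(1000)]
fixed_fraction = sum(c == 20 for c in finals) / len(finals)
```

Under neutrality the probability of fixation equals the starting frequency (here 5/20 = 0.25), so `fixed_fraction` should land near that value.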
