Publications by authors named "Shirley Pepke"

Differential gene expression analysis is an important technique for understanding disease states. The machine learning algorithm CorEx has shown utility in analyzing differential expression of groups of genes in tumor RNA-seq in a way that may be helpful for advancing precision oncology. However, CorEx produces many factors that can be challenging to analyze and connect to existing understanding.

Background: De novo inference of clinically relevant gene function relationships from tumor RNA-seq remains a challenging task. Current methods typically either partition patient samples into a few subtypes or rely upon analysis of pairwise gene correlations that will miss some groups in noisy data. Leveraging higher-dimensional information can be expected to increase the power to discern targetable pathways, but this is commonly thought to be an intractable computational problem.

Cellular reprogramming highlights the epigenetic plasticity of the somatic cell state. Long noncoding RNAs (lncRNAs) have emerging roles in epigenetic regulation, but their potential functions in reprogramming cell fate have been largely unexplored. We used single-cell RNA sequencing to characterize the expression patterns of over 16,000 genes, including 437 lncRNAs, during defined stages of reprogramming to pluripotency.

We tested whether self-organizing maps (SOMs) could be used to effectively integrate, visualize, and mine diverse genomics data types, including complex chromatin signatures. A fine-grained SOM was trained on 72 ChIP-seq histone modifications and DNase-seq data sets from six biologically diverse cell lines studied by The ENCODE Project Consortium. We mined the resulting SOM to identify chromatin signatures related to sequence-specific transcription factor occupancy, sequence motif enrichment, and biological functions.

Cis-regulatory modules (CRMs) function by binding sequence-specific transcription factors, but the relationship between in vivo physical binding and the regulatory capacity of factor-bound DNA elements remains uncertain. We investigate this relationship for the well-studied Twist factor in Drosophila melanogaster embryos by analyzing genome-wide factor occupancy and testing the functional significance of Twist-occupied regions and motifs within regions. Twist ChIP-seq data efficiently identified previously studied Twist-dependent CRMs and robustly predicted new CRM activity in transgenesis, with newly identified Twist-occupied regions supporting diverse spatiotemporal patterns (>74% positive, n = 31).

During the acquisition of memories, influx of Ca2+ into the postsynaptic spine through the pores of activated N-methyl-D-aspartate-type glutamate receptors triggers processes that change the strength of excitatory synapses. The pattern of Ca2+ influx during the first few seconds of activity is interpreted within the Ca2+-dependent signaling network such that synaptic strength is eventually either potentiated or depressed. Many of the critical signaling enzymes that control synaptic plasticity, including Ca2+/calmodulin-dependent protein kinase II (CaMKII), are regulated by calmodulin, a small protein that can bind up to 4 Ca2+ ions.

Genome-wide measurements of protein-DNA interactions and transcriptomes are increasingly done by deep DNA sequencing methods (ChIP-seq and RNA-seq). The power and richness of these counting-based measurements come at the cost of routinely handling tens to hundreds of millions of reads. Whereas early adopters necessarily developed their own custom computer code to analyze the first ChIP-seq and RNA-seq datasets, a new generation of more sophisticated algorithms and software tools is emerging to assist in the analysis phase of these projects.

We examine the impact of likelihood surface characteristics on phylogenetic inference. Amino acid data sets simulated from topologies with branch length features chosen to represent varying degrees of difficulty for likelihood maximization are analyzed. We present situations where the tree found to achieve the global maximum in likelihood is often not equal to the true tree.
