High-throughput phenotypic screens using biochemical perturbations and high-content readouts are constrained by limitations of scale. To address this, we establish a method of pooling exogenous perturbations followed by computational deconvolution to reduce required sample size, labor and cost. We demonstrate the increased efficiency of compressed experimental designs compared to conventional approaches through benchmarking with a bioactive small-molecule library and a high-content imaging readout.
View Article and Find Full Text PDFIn this paper, we aim to build a platform that will help bridge the gap between high-dimensional computation and wet-lab experimentation by allowing users to interrogate genomic signatures at multiple molecular levels and identify best next actionable steps for downstream decision making. We introduce Multioviz: a publicly accessible R package and web application platform to easily perform in silico hypothesis testing of generated gene regulatory networks. We demonstrate the utility of Multioviz by conducting an end-to-end analysis in a statistical genetics application focused on measuring the effect of in silico perturbations of complex trait architecture.
View Article and Find Full Text PDFEfforts to cure BCR::ABL1 B cell acute lymphoblastic leukemia (Ph+ ALL) solely through inhibition of ABL1 kinase activity have thus far been insufficient despite the availability of tyrosine kinase inhibitors (TKIs) with broad activity against resistance mutants. The mechanisms that drive persistence within minimal residual disease (MRD) remain poorly understood and therefore untargeted. Utilizing 13 patient-derived xenograft (PDX) models and clinical trial specimens of Ph+ ALL, we examined how genetic and transcriptional features co-evolve to drive progression during prolonged TKI response.
View Article and Find Full Text PDFLD score regression (LDSC) is a method to estimate narrow-sense heritability from genome-wide association study (GWAS) summary statistics alone, making it a fast and popular approach. In this work, we present interaction-LD score (i-LDSC) regression: an extension of the original LDSC framework that accounts for interactions between genetic variants. By studying a wide range of generative models in simulations, and by re-analyzing 25 well-studied quantitative phenotypes from 349,468 individuals in the UK Biobank and up to 159,095 individuals in BioBank Japan, we show that the inclusion of a -interaction score (i.
View Article and Find Full Text PDFClustering is commonly used in single-cell RNA-sequencing (scRNA-seq) pipelines to characterize cellular heterogeneity. However, current methods face two main limitations. First, they require user-specified heuristics which add time and complexity to bioinformatic workflows; second, they rely on post-selective differential expression analyses to identify marker genes driving cluster differences, which has been shown to be subject to inflated false discovery rates.
View Article and Find Full Text PDFUnderstanding morphological variation is an important task in many areas of computational biology. Recent studies have focused on developing computational tools for the task of sub-image selection which aims at identifying structural features that best describe the variation between classes of shapes. A major part in assessing the utility of these approaches is to demonstrate their performance on both simulated and real datasets.
View Article and Find Full Text PDFEpistasis, commonly defined as the interaction between genetic loci, is known to play an important role in the phenotypic variation of complex traits. As a result, many statistical methods have been developed to identify genetic variants that are involved in epistasis, and nearly all of these approaches carry out this task by focusing on analyzing one trait at a time. Previous studies have shown that jointly modeling multiple phenotypes can often dramatically increase statistical power for association mapping.
View Article and Find Full Text PDFNatural products are chemical compounds that form the basis of many therapeutics used in the pharmaceutical industry. In microbes, natural products are synthesized by groups of colocalized genes called biosynthetic gene clusters (BGCs). With advances in high-throughput sequencing, there has been an increase of complete microbial isolate genomes and metagenomes, from which a vast number of BGCs are undiscovered.
View Article and Find Full Text PDFIn this paper, we propose a new approach for variable selection using a collection of Bayesian neural networks with a focus on quantifying uncertainty over which variables are selected. Motivated by fine-mapping applications in statistical genetics, we refer to our framework as an "ensemble of single-effect neural networks" (ESNN) which generalizes the "sum of single effects" regression framework by both accounting for nonlinear structure in genotypic data (e.g.
View Article and Find Full Text PDFIdentifying structural differences among proteins can be a non-trivial task. When contrasting ensembles of protein structures obtained from molecular dynamics simulations, biologically-relevant features can be easily overshadowed by spurious fluctuations. Here, we present SINATRA Pro, a computational pipeline designed to robustly identify topological differences between two sets of protein structures.
View Article and Find Full Text PDFSince 2005, genome-wide association (GWA) datasets have been largely biased toward sampling European ancestry individuals, and recent studies have shown that GWA results estimated from self-identified European individuals are not transferable to non-European individuals because of various confounding challenges. Here, we demonstrate that enrichment analyses that aggregate SNP-level association statistics at multiple genomic scales-from genes to genomic regions and pathways-have been underutilized in the GWA era and can generate biologically interpretable hypotheses regarding the genetic basis of complex trait architecture. We illustrate examples of the robust associations generated by enrichment analyses while studying 25 continuous traits assayed in 566,786 individuals from seven diverse self-identified human ancestries in the UK Biobank and the Biobank Japan as well as 44,348 admixed individuals from the PAGE consortium including cohorts of African American, Hispanic and Latin American, Native Hawaiian, and American Indian/Alaska Native individuals.
View Article and Find Full Text PDFPrognostically relevant RNA expression states exist in pancreatic ductal adenocarcinoma (PDAC), but our understanding of their drivers, stability, and relationship to therapeutic response is limited. To examine these attributes systematically, we profiled metastatic biopsies and matched organoid models at single-cell resolution. In vivo, we identify a new intermediate PDAC transcriptional cell state and uncover distinct site- and state-specific tumor microenvironments (TMEs).
View Article and Find Full Text PDFIn this article, we present Biologically Annotated Neural Networks (BANNs), a nonlinear probabilistic framework for association mapping in genome-wide association (GWA) studies. BANNs are feedforward models with partially connected architectures that are based on biological annotations. This setup yields a fully interpretable neural network where the input layer encodes SNP-level effects, and the hidden layer models the aggregated effects among SNP-sets.
View Article and Find Full Text PDFLarge-scale phenotype data can enhance the power of genomic prediction in plant and animal breeding, as well as human genetics. However, the statistical foundation of multi-trait genomic prediction is based on the multivariate linear mixed effect model, a tool notorious for its fragility when applied to more than a handful of traits. We present MegaLMM, a statistical framework and associated software package for mixed model analyses of a virtually unlimited number of traits.
View Article and Find Full Text PDFThe winged insects of the order Diptera are colloquially named for their most recognizable phenotype: flight. These insects rely on flight for a number of important life history traits, such as dispersal, foraging, and courtship. Despite the importance of flight, relatively little is known about the genetic architecture of flight performance.
View Article and Find Full Text PDFGlycogen synthase kinase-3β (GSK-3β), a serine/threonine kinase, has been implicated in the pathogenesis of many cancers, with involvement in cell-cycle regulation, apoptosis, and immune response. Small-molecule GSK-3β inhibitors are currently undergoing clinical investigation. Tumor sequencing has revealed genomic alterations in , yet an assessment of the genomic landscape in malignancies is lacking.
View Article and Find Full Text PDFThe central aim in this paper is to address variable selection questions in nonlinear and nonparametric regression. Motivated by statistical genetics, where nonlinear interactions are of particular interest, we introduce a novel and interpretable way to summarize the relative importance of predictor variables. Methodologically, we develop the "RelATive cEntrality" (RATE) measure to prioritize candidate genetic variants that are not just marginally important, but whose associations also stem from significant covarying relationships with other variants in the data.
View Article and Find Full Text PDFTraditional univariate genome-wide association studies generate false positives and negatives due to difficulties distinguishing associated variants from variants with spurious nonzero effects that do not directly influence the trait. Recent efforts have been directed at identifying genes or signaling pathways enriched for mutations in quantitative traits or case-control studies, but these can be computationally costly and hampered by strict model assumptions. Here, we present gene-ε, a new approach for identifying statistical associations between sets of variants and quantitative traits.
View Article and Find Full Text PDFHere we demonstrate a technique to generate proteomic signatures of specific cell types within heterogeneous populations. While our method is broadly applicable across biological systems, we have limited the current work to study neural cell types isolated from human, post-mortem Alzheimer's disease (AD) and aged-matched non-symptomatic (NS) brains. Motivating the need for this tool, we conducted an initial meta-analysis of current, human AD proteomics studies.
View Article and Find Full Text PDFAm J Physiol Cell Physiol
August 2019
Many different subpopulations of subcellular extracellular vesicles (EVs) have been described. EVs are released from all cell types and have been shown to regulate normal physiological homeostasis, as well as pathological states by influencing cell proliferation, differentiation, organ homing, injury and recovery, as well as disease progression. In this review, we focus on the bidirectional actions of vesicles from normal and diseased cells on normal or leukemic target cells; and on the leukemic microenvironment as a whole.
View Article and Find Full Text PDFNonlinear kernel regression models are often used in statistics and machine learning because they are more accurate than linear models. Variable selection for kernel regression models is a challenge partly because, unlike the linear regression setting, there is no clear concept of an effect size for regression coefficients. In this paper, we propose a novel framework that provides an effect size analog for each explanatory variable in Bayesian kernel regression models when the kernel is shift-invariant - for example, the Gaussian kernel.
View Article and Find Full Text PDFLinear mixed effect models are powerful tools used to account for population structure in genome-wide association studies (GWASs) and estimate the genetic architecture of complex traits. However, fully-specified models are computationally demanding and common simplifications often lead to reduced power or biased inference. We describe Grid-LMM (https://github.
View Article and Find Full Text PDF