Post-acute sequelae of SARS-CoV-2 (SARS2) infection (PASC) is a heterogeneous condition, but the main viral drivers are unknown. Here, we use MENSA, Media Enriched with Newly Synthesized Antibodies, secreted exclusively from circulating human plasmablasts, to provide an immune snapshot that defines the underlying viral triggers. We provide proof-of-concept testing that the MENSA technology can capture the new host immune response to accurately diagnose acute primary and breakthrough infections when known SARS2 virus or proteins are present.
While immunologic correlates of COVID-19 have been widely reported, their associations with post-acute sequelae of COVID-19 (PASC) remain less clear. Due to the wide array of PASC presentations, understanding if specific disease features associate with discrete immune processes and therapeutic opportunities is important. Here we profile patients in the recovery phase of COVID-19 via proteomics screening and machine learning to find signatures of ongoing antiviral B cell development, immune-mediated fibrosis, and markers of cell death in PASC patients but not in controls with uncomplicated recovery.
In the brain, the complement system plays a crucial role in the immune response and in synaptic elimination during normal development and disease. Here, we sought to identify pathways that modulate the production of complement component 4 (C4), recently associated with an increased risk of schizophrenia. To design a disease-relevant assay, we first developed a rapid and robust 3D protocol capable of producing large numbers of astrocytes from pluripotent cells.
Morphological and gene expression profiling can cost-effectively capture thousands of features in thousands of samples across perturbations by disease, mutation, or drug treatments, but it is unclear to what extent the two modalities capture overlapping versus complementary information. Here, using both the L1000 and Cell Painting assays to profile gene expression and cell morphology, respectively, we perturb human A549 lung cancer cells with 1,327 small molecules from the Drug Repurposing Hub across six doses, providing a data resource including dose-response data from both assays. The two assays capture both shared and complementary information for mapping cell state.
The phenotype of a cell and its underlying molecular state is strongly influenced by extracellular signals, including growth factors, hormones, and extracellular matrix proteins. While these signals are normally tightly controlled, their dysregulation leads to phenotypic and molecular states associated with diverse diseases. To develop a detailed understanding of the linkage between molecular and phenotypic changes, we generated a comprehensive dataset that catalogs the transcriptional, proteomic, epigenomic and phenotypic responses of MCF10A mammary epithelial cells after exposure to the ligands EGF, HGF, OSM, IFNG, TGFB and BMP2.
Butylated hydroxytoluene (BHT) is a synthetic antioxidant widely used in many industrial sectors. BHT is a well-studied compound for which there are many favorable regulatory decisions. However, a recent opinion by the French Agency for Food, Environmental and Occupational Health and Safety (ANSES) hypothesizes a role for BHT in endocrine disruption (ANSES (2021).
Motivation: Do machine learning methods improve standard deconvolution techniques for gene expression data? This article uses a unique new dataset combined with an open innovation competition to evaluate a wide range of approaches developed by 294 competitors from 20 countries. The competition's objective was to address a deconvolution problem critical to analyzing genetic perturbations from the Connectivity Map. The issue consists of separating gene expression of individual genes from raw measurements obtained from gene pairs.
Most deaths from cancer are explained by metastasis, and yet large-scale metastasis research has been impractical owing to the complexity of in vivo models. Here we introduce an in vivo barcoding strategy that is capable of determining the metastatic potential of human cancer cell lines in mouse xenografts at scale. We validated the robustness, scalability and reproducibility of the method and applied it to 500 cell lines spanning 21 types of solid tumour.
Open data science and algorithm development competitions offer a unique avenue for rapid discovery of better computational strategies. We highlight three examples in computational biology and bioinformatics research in which the use of competitions has yielded significant performance gains over established algorithms. These include algorithms for antibody clustering, imputing gene expression data, and querying the Connectivity Map (CMap).
Background: Most chemicals in commerce have not been evaluated for their carcinogenic potential. The de facto gold-standard approach to carcinogen testing adopts the 2-y rodent bioassay, a time-consuming and costly procedure. High-throughput in vitro assays are a promising alternative for addressing the limitations in carcinogen screening.
Motivation: Facilitated by technological improvements, pharmacologic and genetic perturbational datasets have grown in recent years to include millions of experiments. Sharing and publicly distributing these diverse data creates many opportunities for discovery, but in recent years the unprecedented size of data generated and its complex associated metadata have also created data storage and integration challenges.
Results: We present the GCTx file format and a suite of open-source packages for the efficient storage, serialization and analysis of dense two-dimensional matrices.
Functional genomics networks are widely used to identify unexpected pathway relationships in large genomic datasets. However, it is challenging to compare the signal-to-noise ratios of different networks and to identify the optimal network with which to interpret a particular genetic dataset. We present GeNets, a platform in which users can train a machine-learning model (Quack) to carry out these comparisons and execute, store, and share analyses of genetic and RNA-sequencing datasets.
Although the value of proteomics has been demonstrated, cost and scale are typically prohibitive, and gene expression profiling remains dominant for characterizing cellular responses to perturbations. However, high-throughput sentinel assays provide an opportunity for proteomics to contribute at a meaningful scale. We present a systematic library resource (90 drugs × 6 cell lines) of proteomic signatures that measure changes in the reduced-representation phosphoproteome (P100) and changes in epigenetic marks on histones (GCP).
We previously piloted the concept of a Connectivity Map (CMap), whereby genes, drugs, and disease states are connected by virtue of common gene-expression signatures. Here, we report more than a 1,000-fold scale-up of the CMap as part of the NIH LINCS Consortium, made possible by a new, low-cost, high-throughput reduced representation expression profiling method that we term L1000. We show that L1000 is highly reproducible, comparable to RNA sequencing, and suitable for computational inference of the expression levels of 81% of non-measured transcripts.
The application of RNA interference (RNAi) to mammalian cells has provided the means to perform phenotypic screens to determine the functions of genes. Although RNAi has revolutionized loss-of-function genetic experiments, it has been difficult to systematically assess the prevalence and consequences of off-target effects. The Connectivity Map (CMAP) represents an unprecedented resource to study the gene expression consequences of expressing short hairpin RNAs (shRNAs).
Recent genome sequencing efforts have identified millions of somatic mutations in cancer. However, the functional impact of most variants is poorly understood. Here we characterize 194 somatic mutations identified in primary lung adenocarcinomas.
Cancer genome characterization efforts now provide an initial view of the somatic alterations in primary tumors. However, most point mutations occur at low frequency, and the function of these alleles remains undefined. We have developed a scalable systematic approach to interrogate the function of cancer-associated gene variants.
Despite being extensively characterized structurally and biochemically, the functional role of histone deacetylase 8 (HDAC8) has remained largely obscure due in part to a lack of known cellular substrates. Herein, we describe an unbiased approach using chemical tools in conjunction with sophisticated proteomics methods to identify novel non-histone nuclear substrates of HDAC8, including the tumor suppressor ARID1A. These newly discovered substrates of HDAC8 are involved in diverse biological processes including mitosis, transcription, chromatin remodeling, and RNA splicing and may help guide therapeutic strategies that target the function of HDAC8.