Patient-derived xenografts (PDX) model human intra- and intertumoral heterogeneity in the context of the intact tissue of immunocompromised mice. Histologic imaging via hematoxylin and eosin (H&E) staining is routinely performed on PDX samples, which could be harnessed for computational analysis. Prior studies of large clinical H&E image repositories have shown that deep learning analysis can identify intercellular and morphologic signals correlated with disease phenotype and therapeutic response.
Accurate identification of germline de novo variants (DNVs) remains a challenging problem despite rapid advances in sequencing technologies and in methods for analyzing the data they generate, with putative solutions often involving filters and visual inspection of identified variants. Here, we present a purely informatic method for the identification of DNVs by analyzing short-read genome sequencing data from proband-parent trios. Our method evaluates variant calls generated by three genome sequence analysis pipelines utilizing different algorithms (GATK HaplotypeCaller, DeepTrio and Velsera GRAF), exploring the assumption that a requirement of consensus can serve as an effective filter for high-quality DNVs.
As the number of cloud platforms supporting scientific research grows, there is an increasing need to support interoperability between two or more cloud platforms. A well-accepted core concept is to make data in cloud platforms Findable, Accessible, Interoperable and Reusable (FAIR). We introduce a companion concept that applies to cloud-based computing environments that we call a Secure and Authorized FAIR Environment (SAFE).
Graph-based genome reference representations have seen significant development, motivated by the inadequacy of the current human genome reference to represent the diverse genetic information from different human populations and its inability to maintain the same level of accuracy for non-European ancestries. While there have been many efforts to develop computationally efficient graph-based toolkits for NGS read alignment and variant calling, methods to curate genomic variants and subsequently construct genome graphs remain an understudied problem that inevitably determines the effectiveness of the overall bioinformatics pipeline. In this study, we discuss obstacles encountered during graph construction and propose methods for sample selection based on population diversity, graph augmentation with structural variants and resolution of graph reference ambiguity caused by information overload.
Development of candidate cancer treatments is a resource-intensive process, with the research community continuing to investigate options beyond static genomic characterization. Toward this goal, we have established the genomic landscapes of 536 patient-derived xenograft (PDX) models across 25 cancer types, together with mutation, copy number, fusion, transcriptomic profiles, and NCI-MATCH arms. Compared with human tumors, PDXs typically have higher purity and are well suited to investigating dynamic driver events and molecular properties via multiple time points from same-case PDXs.
Patient-derived xenografts (PDXs) are resected human tumors engrafted into mice for preclinical studies and therapeutic testing. It has been proposed that the mouse host affects tumor evolution during PDX engraftment and propagation, affecting the accuracy of PDX modeling of human cancer. Here, we exhaustively analyze copy number alterations (CNAs) in 1,451 PDX and matched patient tumor (PT) samples from 509 PDX models.
The findings that amyotrophic lateral sclerosis (ALS) patients almost universally display pathological mislocalization of the RNA-binding protein TDP-43 and that mutations in its gene cause familial ALS have nominated altered RNA metabolism as a disease mechanism. However, the RNAs regulated by TDP-43 in motor neurons and their connection to neuropathy remain to be identified. Here we report transcripts whose abundances in human motor neurons are sensitive to TDP-43 depletion.
The Seven Bridges Cancer Genomics Cloud (CGC) is part of the National Cancer Institute Cloud Resource project, which was created to explore the paradigm of co-locating massive datasets with the computational resources to analyze them. The CGC was designed to allow researchers to easily find the data they need and analyze it with robust applications in a scalable and reproducible fashion. To enable this, individual tools are packaged within Docker containers and described by the Common Workflow Language (CWL), an emerging standard for enabling reproducible data analysis.
Advances in stem cell science allow the production of different cell types either through the recapitulation of developmental processes, often termed 'directed differentiation', or the forced expression of lineage-specific transcription factors. Although cells produced by both approaches are increasingly used in translational applications, their quantitative similarity to their primary counterparts remains largely unresolved. To investigate the similarity between in vitro-derived and primary cell types, we harvested and purified mouse spinal motor neurons and compared them with motor neurons produced by transcription factor-mediated lineage conversion of fibroblasts or directed differentiation of pluripotent stem cells.
Increased efforts in cancer genomics research and bioinformatics are producing tremendous amounts of data. These data are diverse in origin, format, and content. As the amount of available sequencing data increases, technologies that make them discoverable and usable are critically needed.
Curr Protoc Bioinformatics
December 2017
Next-generation sequencing has produced petabytes of data, but accessing and analyzing these data remain challenging. Traditionally, researchers investigating public datasets like The Cancer Genome Atlas (TCGA) would download the data to a high-performance cluster, which could take several weeks even with a highly optimized network connection. The National Cancer Institute (NCI) initiated the Cancer Genomics Cloud Pilots program to provide researchers with the resources to process data with cloud computational resources.
The Seven Bridges Cancer Genomics Cloud (CGC; www.cancergenomicscloud.org) enables researchers to rapidly access and collaborate on massive public cancer genomic datasets, including The Cancer Genome Atlas.
As biomedical data has become increasingly easy to generate in large quantities, the methods used to analyze it have proliferated rapidly. Reproducible and reusable methods are required to learn from large volumes of data reliably. To address this issue, numerous groups have developed workflow specifications or execution engines, which provide a framework with which to perform a sequence of analyses.
Chromosomal rearrangements resulting in the creation of novel gene products, termed fusion genes, have been identified as driving events in the development of multiple types of cancer. As these gene products typically do not exist in normal cells, they represent valuable prognostic and therapeutic targets. Advances in next-generation sequencing and computational approaches have greatly improved our ability to detect and identify fusion genes.
Neurons produced from stem cells have emerged as a tool to identify new therapeutic targets for neurological diseases such as amyotrophic lateral sclerosis (ALS). However, it remains unclear to what extent these new mechanistic insights will translate to animal models, an important step in the validation of new targets. Previously, we found that glia from mice carrying the SOD1G93A mutation, a model of ALS, were toxic to stem cell-derived human motor neurons.
Preclinical and clinical evidence implicates N-methyl-d-aspartate receptor (NMDAr) signaling in early embryological development. However, the role of NMDAr signaling in early development has not been well studied. Here, we use a mouse embryonic stem cell model to perform a step-wise exploration of the effects of NMDAr signaling on early cell fate specification.
Although many distinct mutations in a variety of genes are known to cause Amyotrophic Lateral Sclerosis (ALS), it remains poorly understood how they selectively impact motor neuron biology and whether they converge on common pathways to cause neuronal degeneration. Here, we have combined reprogramming and stem cell differentiation approaches with genome engineering and RNA sequencing to define the transcriptional and functional changes that are induced in human motor neurons by mutant SOD1. Mutant SOD1 protein induced a transcriptional signature indicative of increased oxidative stress, reduced mitochondrial function, altered subcellular transport, and activation of the ER stress and unfolded protein response pathways.
It has been suggested that the transcription factor Nanog is essential for the establishment of pluripotency during the derivation of embryonic stem cells and induced pluripotent stem cells (iPSCs). However, successful reprogramming to pluripotency with a growing list of divergent transcription factors, at ever-increasing efficiencies, suggests that there may be many distinct routes to a pluripotent state. Here, we have investigated whether Nanog is necessary for reprogramming murine fibroblasts under highly efficient conditions using the canonical reprogramming factors Oct4, Sox2, Klf4, and cMyc.
All muscle movements, including breathing, walking, and fine motor skills, rely on the function of the spinal motor neuron to transmit signals from the brain to individual muscle groups. Loss of spinal motor neuron function underlies several neurological disorders for which treatment has been hampered by the inability to obtain sufficient quantities of primary motor neurons to perform mechanistic studies or drug screens. Progress towards overcoming this challenge has been achieved through the synthesis of developmental biology paradigms and advances in stem cell and reprogramming technology, which allow the production of motor neurons in vitro.