Homo sapiens and Neanderthals underwent hybridization during the Middle/Upper Paleolithic age, culminating in retention of small amounts of Neanderthal-derived DNA in the modern human genome. In the current study, we address the potential roles Neanderthal single nucleotide polymorphisms (SNPs) may be playing in autism susceptibility in samples of black non-Hispanic, white Hispanic, and white non-Hispanic people using data from the Simons Foundation Powering Autism Research (SPARK), Genotype-Tissue Expression (GTEx), and 1000 Genomes (1000G) databases. We have discovered that rare variants are significantly enriched in autistic probands compared to race-matched controls.
Legumes establish a symbiotic relationship with nitrogen-fixing rhizobia by developing nodules. Nodules are modified lateral roots that undergo changes in their cellular development in response to bacteria, but the transcriptional reprogramming that occurs in these root cells remains largely uncharacterized. Here, we describe the cell-type-specific transcriptome response of Medicago truncatula roots to rhizobia during early nodule development in the wild-type genotype Jemalong A17, complemented with a hypernodulating mutant (sunn-4) to expand the cell population responding to infection and subsequent biological inferences.
We report a public resource for examining the spatiotemporal RNA expression of 54,893 genes during the first 72 h of response to rhizobial inoculation. Using a methodology that allows synchronous inoculation and growth of more than 100 plants in a single media container, we harvested the same segment of each root responding to rhizobia in the initial inoculation over a time course, collected individual tissues from these segments with laser capture microdissection, and created and sequenced RNA libraries generated from these tissues. We demonstrate the utility of the resource by examining the expression patterns of a set of genes induced very early in nodule signaling, as well as two gene families (CLE peptides and nodule-specific PLAT-domain proteins), and show that despite similar whole-root expression patterns, there are tissue differences in expression between the genes.
Nodule number regulation in legumes is controlled by a feedback loop that integrates nutrient and rhizobia symbiont status signals to regulate nodule development. Signals from the roots are perceived by shoot receptors, including a CLV1-like receptor-like kinase known as SUNN in Medicago truncatula. In the absence of functional SUNN, the autoregulation feedback loop is disrupted, resulting in hypernodulation.
Summary: Large-scale and whole-cell modeling has multiple challenges, including scalable model building and module communication bottlenecks (e.g. between metabolism, gene expression, signaling, etc.
J Autism Dev Disord
September 2023
Autism Spectrum Disorder (ASD) is a complex neurodevelopmental disorder characterized by challenges in social communication as well as repetitive or restrictive behaviors. Many genetic associations with ASD have been identified, but most associations occur in a fraction of the ASD population. Here, we searched for eQTL-associated DNA variants with significantly different allele distributions between ASD-affected and control.
Mechanistic models of how single cells respond to different perturbations can help integrate disparate big data sets or predict response to varied drug combinations. However, the construction and simulation of such models have proved challenging. Here, we developed a Python-based model creation and simulation pipeline that converts a few structured text files into an SBML standard and is high-performance- and cloud-computing ready.
Background: Thyroid cancer (THCA) is the most common endocrine malignancy, and its incidence is increasing. There is an urgent need to better understand the molecular differences between THCA tumors at different pathologic stages so appropriate diagnostic, prognostic, and treatment strategies can be applied. Transcriptome State Perturbation Generator (TSPG) is a tool created to identify the changes in gene expression necessary to transform the transcriptional state of a source sample to mimic that of a target.
Background: Lung cancer is the leading cause of cancer death in both men and women. The most common lung cancer subtype is non-small cell lung carcinoma (NSCLC), comprising about 85% of all cases. NSCLC can be further divided into three subtypes: adenocarcinoma (LUAD), squamous cell carcinoma (LUSC), and large cell lung carcinoma.
Background: Quantification of gene expression from RNA-seq data is a prerequisite for transcriptome analysis such as differential gene expression analysis and gene co-expression network construction. Individual RNA-seq experiments are larger and combining multiple experiments from sequence repositories can result in datasets with thousands of samples. Processing hundreds to thousands of RNA-seq samples can result in challenges related to data management, access to sufficient computational resources, navigation of high-performance computing (HPC) systems, installation of required software dependencies, and reproducibility.
In response to colonization by rhizobia bacteria, legumes are able to form nitrogen-fixing nodules in their roots, allowing the plants to grow efficiently in nitrogen-depleted environments. Legumes utilize a complex, long-distance signaling pathway to regulate nodulation that involves signals in both roots and shoots. We measured the transcriptional response to treatment with rhizobia in both the shoots and roots over a 72-h time course.
Gene co-expression networks (GCNs) provide multiple benefits to molecular research including hypothesis generation and biomarker discovery. Transcriptome profiles serve as input for GCN construction and are derived from increasingly larger studies with samples across multiple experimental conditions, treatments, time points, genotypes, etc. Such experiments with larger numbers of variables confound discovery of true network edges, exclude edges and inhibit discovery of context (or condition) specific network edges.
Uterine cancer is the fourth most common cancer among women, projected to affect 66,000 US women in 2021. Uterine cancer often arises in the inner lining of the uterus, known as the endometrium, but can present as several different types of cancer, including endometrioid cancer, serous adenocarcinoma, and uterine carcinosarcoma. Previous studies have analyzed the genetic changes between normal and cancerous uterine tissue to identify specific genes of interest, including TP53 and PTEN.
Advanced imaging and DNA sequencing technologies now enable the diverse biology community to routinely generate and analyze terabytes of high-resolution biological data. The community is rapidly heading toward the petascale in single-investigator laboratory settings. As evidence, the single NCBI SRA central DNA sequence repository contains over 45 petabytes of biological data.
We introduce the Transcriptome State Perturbation Generator (TSPG) as a novel deep-learning method to identify changes in genomic expression that occur between tissue states using generative adversarial networks. TSPG learns the transcriptome perturbations from RNA-sequencing data required to shift from a source to a target class. We apply TSPG as an effective method of detecting biologically relevant alternate expression patterns between normal and tumor human tissue samples.
The human brain is a complex organ that consists of several regions, each with a unique gene expression pattern. Our intent in this study was to construct a gene co-expression network (GCN) for the normal brain using RNA expression profiles from the Genotype-Tissue Expression (GTEx) project. The brain GCN contains gene correlation relationships that are broadly present in the brain or specific to thirteen brain regions, which we later combined into six overarching brain mini-GCNs based on the brain's structure.
Urgent responses to the COVID-19 pandemic depend on increased collaboration and sharing of data, models, and resources among scientists and researchers. In many scientific fields and disciplines, institutional norms treat data, models, and resources as proprietary, emphasizing competition among scientists and researchers locally and internationally. Concurrently, long-standing norms of open data and collaboration exist in some scientific fields and have accelerated within the last two decades.
Bigenic expression relationships are conventionally defined based on metrics such as Pearson or Spearman correlation that cannot typically detect latent, non-linear dependencies or require the relationship to be monotonic. Further, the combination of intrinsic and extrinsic noise as well as embedded relationships between sample sub-populations reduces the probability of extracting biologically relevant edges during the construction of gene co-expression networks (GCNs). In this report, we address these problems via our NetExtractor algorithm.
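The limitation of correlation metrics noted above can be illustrated with a toy case. The sketch below is not the NetExtractor algorithm; it only assumes two hypothetical expression vectors, `x` and `y`, where `y` depends on `x` through a symmetric, non-monotonic curve. Pearson correlation is close to zero over the full range even though the dependency is exact, and becomes strong once the samples are restricted to a sub-population where the relationship is monotonic.

```python
import numpy as np

# Hypothetical expression values for two genes across 201 samples.
# y is fully determined by x, but the relationship is non-monotonic.
x = np.arange(-100, 101) / 100.0
y = x ** 2

# Pearson correlation over all samples: the symmetric curve cancels out.
pearson_all = np.corrcoef(x, y)[0, 1]

# Restricting to the sub-population with x >= 0 makes the relationship
# monotonic, and the correlation becomes strong.
mask = x >= 0
pearson_half = np.corrcoef(x[mask], y[mask])[0, 1]

print(f"all samples:    r = {pearson_all:.3f}")
print(f"x >= 0 samples: r = {pearson_half:.3f}")
```

The second value shows why relationships embedded in sample sub-populations can be missed when all samples are tested together.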
Online biological databases housing genomics, genetic and breeding data can be constructed using the Tripal toolkit. Tripal is an open-source, internationally developed framework that implements FAIR data principles and is meant to ease the burden of constructing such websites for research communities. Use of a common, open framework improves the sustainability and manageability of such a site.
From noble beginnings as a prospective forage, polyploid Sorghum halepense ('Johnsongrass') is both an invasive species and one of the world's worst agricultural weeds. Formed by S. bicolor x S. propinquum hybridization, we show S. halepense to have S. bicolor-enriched allele composition and striking mutations in 5,957 genes that differentiate it from representatives of its progenitor species and an outgroup. The spread of S. halepense may have been facilitated by introgression from closely-related cultivated sorghum near genetic loci affecting rhizome development, seed size, and levels of lutein, a photochemical protectant and abscisic acid precursor.
Traveling to nearby extraterrestrial objects having a reduced gravity level (partial gravity) compared to Earth's gravity is becoming a realistic objective for space agencies. The use of plants as part of life support systems will require a better understanding of the interactions among plant growth responses, including tropisms, under partial gravity conditions. Here, we present results from our latest space experiments on the ISS, in which seeds were germinated, and seedlings grew for six days under different gravity levels, namely micro-g, several intermediate partial-g levels, and 1 g, and were subjected to irradiation with blue light for the last 48 h.
Root nodulation results from a symbiotic relationship between a plant host and bacteria. Synchronized gene expression patterns over the course of rhizobial infection result in activation of pathways that are unique but overlapping with the highly conserved pathways that enable mycorrhizal symbiosis. We performed RNA sequencing of 30 root maturation zone samples at five distinct time points.
Gene co-expression networks (GCNs) are constructed from Gene Expression Matrices (GEMs) in a bottom-up approach where all gene pairs are tested for correlation within the context of the input sample set. This approach is computationally intensive for many current GEMs and may not be scalable to millions of samples. Further, traditional GCNs do not detect non-linear relationships missed by correlation tests and do not place genetic relationships in a gene expression intensity context.
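The bottom-up construction described above can be sketched as follows. This is a minimal illustration of the traditional approach, not the method proposed in the article: it assumes a small in-memory GEM with genes as rows and samples as columns, uses Pearson correlation, and applies an arbitrary cutoff of 0.8. The function name `build_gcn`, the gene labels, and the threshold are all hypothetical.

```python
import numpy as np

def build_gcn(gem, gene_ids, threshold=0.8):
    """Return edges (gene_a, gene_b, r) for every gene pair whose
    absolute Pearson correlation meets the threshold.

    gem: 2-D array with genes as rows and samples as columns.
    """
    corr = np.corrcoef(gem)  # gene-by-gene correlation matrix
    edges = []
    n = len(gene_ids)
    # Test every unordered gene pair once: n * (n - 1) / 2 comparisons.
    for i in range(n):
        for j in range(i + 1, n):
            r = corr[i, j]
            if abs(r) >= threshold:
                edges.append((gene_ids[i], gene_ids[j], float(r)))
    return edges

# Toy GEM: three genes measured in five samples (made-up values).
gem = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],    # geneA
    [2.0, 4.1, 6.2, 7.9, 10.0],   # geneB, tracks geneA closely
    [5.0, 1.0, 4.0, 2.0, 3.0],    # geneC, no clear trend
])
edges = build_gcn(gem, ["geneA", "geneB", "geneC"])
for gene_a, gene_b, r in edges:
    print(gene_a, gene_b, round(r, 3))
```

Because the number of pairs grows quadratically with the number of genes, this all-pairs loop makes the computational cost noted above easy to see.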
Community biological databases provide an important online resource for both public and private data, analysis tools and community engagement. These sites house genomic, transcriptomic, genetic, breeding and ancillary data for specific species, families or clades. Due to the complexity and increasing quantities of these data, construction of online resources is increasingly difficult, especially with limited funding and access to technical expertise.
Given the complex relationship between gene expression and phenotypic outcomes, computationally efficient approaches are needed to sift through large high-dimensional datasets in order to identify biologically relevant biomarkers. In this report, we describe a method of identifying the most salient biomarker genes in a dataset, which we call "candidate genes", by evaluating the ability of gene combinations to classify samples from a dataset, which we call "classification potential". Our algorithm, Gene Oracle, uses a neural network to test user-defined gene sets for polygenic classification potential and then uses a combinatorial approach to further decompose selected gene sets into candidate and non-candidate biomarker genes.