Publications by authors named "Casey S Greene"

Guidelines in statistical modeling for genomics hold that simpler models have advantages over more complex ones. Potential advantages include cost, interpretability, and improved generalization across datasets or biological contexts. We directly tested the assumption that small gene signatures generalize better by examining the generalization of mutation status prediction models across datasets (from cell lines to human tumors and vice versa) and biological contexts (holding out entire cancer types from pan-cancer data).
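
Holding out entire cancer types, as described above, is a leave-one-group-out split. The following is a minimal sketch that assumes each sample carries exactly one cancer-type label. The function name `leave_one_cancer_type_out` and the sample IDs are hypothetical. scikit-learn's `LeaveOneGroupOut` offers a comparable splitter.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def leave_one_cancer_type_out(
    sample_ids: List[str], cancer_types: List[str]
) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (held_out_type, train_ids, test_ids), holding out one cancer type at a time."""
    if len(sample_ids) != len(cancer_types):
        raise ValueError("each sample needs exactly one cancer-type label")
    # Group sample IDs by their cancer-type label.
    by_type: Dict[str, List[str]] = defaultdict(list)
    for sample, cancer_type in zip(sample_ids, cancer_types):
        by_type[cancer_type].append(sample)
    # Every sample of the held-out type goes to the test set; all others train.
    for held_out in sorted(by_type):
        train_ids = [s for s, t in zip(sample_ids, cancer_types) if t != held_out]
        yield held_out, train_ids, by_type[held_out]


# Hypothetical toy labels: two BRCA samples, one LUAD, one COAD.
for held_out, train, test in leave_one_cancer_type_out(
    ["s1", "s2", "s3", "s4"], ["BRCA", "BRCA", "LUAD", "COAD"]
):
    print(held_out, "train:", train, "test:", test)
```

Because a model never sees any sample of the held-out type during training, its test performance reflects generalization to a new biological context rather than to new samples of a familiar one.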

Identifying meaningful patterns in data is crucial for understanding complex biological processes, particularly in transcriptomics, where genes with correlated expression often share functions or contribute to disease mechanisms. Traditional correlation coefficients, which primarily capture linear relationships, may overlook important nonlinear patterns. We introduce the clustermatch correlation coefficient (CCC), a not-only-linear coefficient that utilizes clustering to efficiently detect both linear and nonlinear associations.
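
The idea of using clustering to detect not-only-linear association can be illustrated with a simplified sketch. Each variable is split into k quantile-based clusters for several values of k, every pair of partitions is compared with the adjusted Rand index (ARI), and the maximum is kept. Quantile partitioning and ARI are assumptions of this sketch, which is not the authors' CCC implementation. The function names and the default upper bound on k are illustrative choices.

```python
from collections import Counter
from math import comb
from typing import List, Optional, Sequence


def quantile_partition(values: Sequence[float], k: int) -> List[int]:
    """Assign each value to one of k roughly equal-sized clusters by rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * k // n
    return labels


def adjusted_rand_index(a: Sequence[int], b: Sequence[int]) -> float:
    """Adjusted Rand index between two partitions of the same items."""
    n = len(a)
    pairs_both = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = pairs_a * pairs_b / comb(n, 2)
    max_index = (pairs_a + pairs_b) / 2
    if max_index == expected:  # degenerate partitions: treat as no association
        return 0.0
    return (pairs_both - expected) / (max_index - expected)


def clustering_correlation(
    x: Sequence[float], y: Sequence[float], max_k: Optional[int] = None
) -> float:
    """Maximum ARI over quantile partitions of x and y, floored at 0."""
    n = len(x)
    if max_k is None:
        max_k = max(2, int(n ** 0.5))  # illustrative upper bound on clusters
    best = 0.0
    for kx in range(2, max_k + 1):
        px = quantile_partition(x, kx)
        for ky in range(2, max_k + 1):
            best = max(best, adjusted_rand_index(px, quantile_partition(y, ky)))
    return best


x = [float(v) for v in range(-10, 11)]
y = [v * v for v in x]  # nonlinear: Pearson correlation with x is 0 here
print(clustering_correlation(x, y), clustering_correlation(x, x))
```

On this symmetric parabola the Pearson coefficient is exactly zero, yet the clustering-based score is clearly positive. This is the kind of nonlinear pattern a purely linear coefficient misses.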

Objective: Investigate the use of advanced natural language processing models to streamline the time-consuming process of writing and revising scholarly manuscripts.

Materials And Methods: For this purpose, we integrate large language models into the Manubot publishing ecosystem to suggest revisions for scholarly texts. Our AI-based revision workflow employs a prompt generator that incorporates manuscript metadata into templates, generating section-specific instructions for the language model.

Science journalism is a critical way for the public to learn about and benefit from scientific findings. Such journalism shapes the public's view of the current state of science and legitimizes experts. Journalists can cite and quote only a limited number of sources, whom they may find through their own research or through recommendations from other scientists.

High-throughput gene expression profiling measures individual gene expression across conditions. However, genes are regulated in complex networks, not as individual entities, limiting the interpretability of gene expression data. Machine learning models that incorporate prior biological knowledge are a powerful tool to extract meaningful biology from gene expression data.

Background: High-grade serous carcinoma (HGSC) gene expression subtypes are associated with differential survival. We characterized HGSC gene expression in Black individuals and considered whether gene expression differences by self-identified race may contribute to poorer HGSC survival among Black versus White individuals.

Methods: We included newly generated RNA sequencing data from Black and White individuals and array-based genotyping data from four existing studies of White and Japanese individuals.

Objective: To compare pedigree documentation and genetic test results to evaluate whether user-provided photographs influence the breed ancestry predictions of direct-to-consumer (DTC) genetic tests for dogs.

Animals: 12 registered purebred pet dogs representing 12 different breeds.

Methods: Each dog owner submitted 6 buccal swabs, 1 to each of 6 DTC genetic testing companies.

Chronic lung infections are a feature of cystic fibrosis (CF) that many patients experience even with the advent of highly effective modulator therapies. Identifying factors that impact Pseudomonas aeruginosa in the CF lung could yield novel strategies to eradicate infection or otherwise improve outcomes. To complement published studies using laboratory models or RNA isolated from sputum, we analyzed transcripts of P. aeruginosa strain PAO1 after incubation in sputum from different CF donors prior to RNA extraction.

Important tasks in biomedical discovery such as predicting gene functions, gene-disease associations, and drug repurposing opportunities are often framed as network edge prediction. The number of edges connecting to a node, termed degree, can vary greatly across nodes in real biomedical networks, and the distribution of degrees varies between networks. If degree strongly influences edge prediction, then imbalance or bias in the distribution of degrees could lead to nonspecific or misleading predictions.
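
One quick way to gauge how much degree alone can drive edge prediction is a degree-product (preferential-attachment) baseline. It scores each unconnected node pair by the product of its endpoint degrees. This sketch illustrates that assumption and is not the study's method. The toy network and its node names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def node_degrees(edges: List[Edge]) -> Counter:
    """Count how many edges touch each node in an undirected edge list."""
    degree: Counter = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree


def degree_product_scores(edges: List[Edge]) -> Dict[Edge, int]:
    """Score every unconnected node pair by the product of its endpoint degrees."""
    degree = node_degrees(edges)
    existing = {frozenset(edge) for edge in edges}
    scores: Dict[Edge, int] = {}
    for u, v in combinations(sorted(degree), 2):
        if frozenset((u, v)) not in existing:
            scores[(u, v)] = degree[u] * degree[v]
    return scores


# Hypothetical toy network: one hub gene plus a small separate component.
toy_edges = [("hub", "g1"), ("hub", "g2"), ("hub", "g3"), ("hub", "g4"), ("g5", "g6")]
scores = degree_product_scores(toy_edges)
for pair, score in sorted(scores.items(), key=lambda item: -item[1])[:3]:
    print(pair, score)
```

The hub dominates the top-ranked pairs even though nothing about the specific nodes is used. A learned edge predictor that does not beat such a baseline may be capturing degree rather than specific relationships.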

Motivation: Most models can be fit to data using various optimization approaches. While model choice is frequently reported in machine-learning-based research, optimizers are not often noted. We applied two implementations of LASSO logistic regression from Python's scikit-learn package, which use different optimization approaches (coordinate descent, implemented in the liblinear library, and stochastic gradient descent, or SGD), to predict mutation status and gene essentiality from gene expression across a variety of pan-cancer driver genes.

Precision medicine initiatives across the globe have led to a revolution of repositories linking large-scale genomic data with electronic health records, enabling genomic analyses across the entire phenome. Many of these initiatives focus solely on research insights, leading to limited direct benefit to patients. We describe the biobank at the Colorado Center for Personalized Medicine (CCPM Biobank) that was jointly developed by the University of Colorado Anschutz Medical Campus and UCHealth to serve as a unique, dual-purpose research and clinical resource accelerating personalized medicine.

Introduction: High-grade serous carcinoma (HGSC) gene expression subtypes are associated with differential survival. We characterized HGSC gene expression in Black individuals and considered whether gene expression differences by race may contribute to poorer HGSC survival among Black versus non-Hispanic White individuals.

Methods: We included newly generated RNA-Seq data from Black and White individuals, and array-based genotyping data from four existing studies of White and Japanese individuals.

Article Synopsis
  • Understanding the human microbiome is important in biology, but there has been no large, unified resource for the 16S rRNA sequencing data commonly used to study it.
  • A new dataset of 168,484 human gut microbiome samples has been created, making it the largest unified microbiome resource available, accessible at microbiomap.org.
  • The study reveals that while bacterial phyla such as Firmicutes are widespread, microbiome composition differs significantly by region, especially in less-studied areas such as Central and Southern Asia, indicating a need for more geographically diverse microbiome research.

Article Synopsis
  • Single-cell gene expression profiling helps understand tumor diversity, but bulk tumor profiling is more common due to cost.
  • Experimental choices impact the measurements and can affect deconvolution algorithms that analyze tumor composition.
  • The study highlights how different experimental methods influence cell composition estimates and emphasizes the need for improved deconvolution approaches that account for these factors.
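
Many deconvolution approaches start from the assumption that a bulk profile is a weighted mixture of cell-type reference profiles. This sketch estimates the mixing proportion for the simplest case of two cell types by least squares, clipped to [0, 1]. It illustrates that assumption and is not any specific algorithm evaluated in the study. The reference profiles are made-up numbers.

```python
from typing import Sequence


def two_type_proportion(
    bulk: Sequence[float], ref_a: Sequence[float], ref_b: Sequence[float]
) -> float:
    """Least-squares estimate of p in bulk ≈ p * ref_a + (1 - p) * ref_b, clipped to [0, 1]."""
    # Rearranged: bulk - ref_b ≈ p * (ref_a - ref_b), a one-parameter regression.
    diff = [a - b for a, b in zip(ref_a, ref_b)]
    target = [m - b for m, b in zip(bulk, ref_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("reference profiles are identical; proportion is not identifiable")
    p = sum(t * d for t, d in zip(target, diff)) / denom
    return min(1.0, max(0.0, p))


# Hypothetical reference profiles for two cell types over four genes.
ref_a = [10.0, 0.0, 5.0, 2.0]
ref_b = [0.0, 8.0, 5.0, 6.0]
bulk = [0.3 * a + 0.7 * b for a, b in zip(ref_a, ref_b)]  # simulated 30% / 70% mixture
print(two_type_proportion(bulk, ref_a, ref_b))
```

The estimate is only as good as the match between the reference profiles and the bulk measurements. Experimental choices that shift measured expression would therefore bias it.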

Genes act in concert with each other in specific contexts to perform their functions. Determining how these genes influence complex traits requires a mechanistic understanding of expression regulation across different conditions. It has been shown that this insight is critical for developing new therapies.

Chronic lung infections are a distinctive feature of cystic fibrosis (CF) pathology that challenge adults with CF even with the advent of highly effective modulator therapies. Characterizing transcription in the CF lung and identifying factors that drive gene expression could yield novel strategies to eradicate infection or otherwise improve outcomes. To complement published gene expression studies in laboratory culture models designed to mimic the CF lung environment, we employed an ex vivo sputum model in which the P. aeruginosa laboratory strain PAO1 was incubated in sputum from different CF donors.

Article Synopsis
  • High-throughput gene expression profiling helps researchers develop hypotheses about biological functions and diseases, but it has limitations for inferring biological pathways and for managing the many tests needed when numerous genes are examined.
  • The study introduces mousiPLIER, a model based on the unsupervised machine learning method Pathway-level information extractor (PLIER) and trained on a large dataset of 190,111 mouse brain RNA-sequencing samples, which aids data interpretation by reducing dimensionality.
  • The researchers applied mousiPLIER to study aging in mouse brain microglia and astrocytes, identified latent variables significantly associated with aging, and created a web server for easy access to these findings, demonstrating the approach's potential to reveal important biological processes.

Background: Hetnets, short for "heterogeneous networks," contain multiple node and relationship types and offer a way to encode biomedical knowledge. One such example, Hetionet, connects 11 types of nodes (including genes, diseases, drugs, pathways, and anatomical structures) with over 2 million edges of 24 types. Previous work has demonstrated that supervised machine learning methods applied to such networks can identify drug repurposing opportunities.
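
Features for supervised learning on a hetnet are often built from walks that follow a metapath, which is a sequence of relationship types such as Compound-binds-Gene-associates-Disease. This sketch counts such walks in a toy network. Hetionet-based work has used degree-weighted path counts, whereas this sketch computes plain, unweighted counts. The node names and the edge representation (directed triples stored in the order the metapath traverses them) are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (source node, relationship type, target node)


def build_index(edges: Iterable[Triple]) -> Dict[Tuple[str, str], Set[str]]:
    """Map (node, relationship type) to the set of neighbors reached by that relationship."""
    index: DefaultDict[Tuple[str, str], Set[str]] = defaultdict(set)
    for source, relation, target in edges:
        index[(source, relation)].add(target)
    return index


def count_paths(
    index: Dict[Tuple[str, str], Set[str]], start: str, metapath: List[str], end: str
) -> int:
    """Count walks from start to end whose edges follow the relationship types in metapath."""
    frontier: Dict[str, int] = {start: 1}
    for relation in metapath:
        next_frontier: DefaultDict[str, int] = defaultdict(int)
        for node, n_walks in frontier.items():
            for neighbor in index.get((node, relation), ()):
                next_frontier[neighbor] += n_walks
        frontier = next_frontier
    return frontier.get(end, 0)


# Hypothetical toy hetnet with one compound, three genes, and two diseases.
edges = [
    ("drugX", "binds", "GENE1"), ("drugX", "binds", "GENE2"), ("drugX", "binds", "GENE3"),
    ("GENE1", "associates", "diseaseY"), ("GENE2", "associates", "diseaseY"),
    ("GENE3", "associates", "diseaseZ"),
]
index = build_index(edges)
print(count_paths(index, "drugX", ["binds", "associates"], "diseaseY"))
```

A compound-disease pair connected by many such walks is, under this feature, a stronger repurposing candidate. A supervised model would learn how much weight to give each metapath.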

While single-cell experiments provide deep cellular resolution within a single sample, some single-cell experiments are inherently more challenging than bulk experiments due to dissociation difficulties, cost, or limited tissue availability. This creates a situation where we have deep cellular profiles of one sample or condition, and bulk profiles across multiple samples and conditions. To bridge this gap, we propose BuDDI (BUlk Deconvolution with Domain Invariance).

Pediatric brain and spinal cancers are collectively the leading disease-related cause of death in children; thus, we urgently need curative therapeutic strategies for these tumors. To accelerate such discoveries, the Children's Brain Tumor Network (CBTN) and Pacific Pediatric Neuro-Oncology Consortium (PNOC) created a systematic process for tumor biobanking, model generation, and sequencing with immediate access to harmonized data. We leverage these data to establish OpenPBTA, an open collaborative project with over 40 scalable analysis modules that genomically characterize 1,074 pediatric brain tumors.

While we often think of words as having a fixed meaning that we use to describe a changing world, words are also dynamic and changing. Scientific research can also be remarkably fast-moving, with new concepts or approaches rapidly gaining mind share. We examined scientific writing, both preprint and pre-publication peer-reviewed text, to identify terms that have changed and examine their use.

Those building predictive models from transcriptomic data face two conflicting perspectives. The first, based on the inherent high dimensionality of biological systems, supposes that complex nonlinear models such as neural networks will better match complex biological systems. The second, imagining that complex systems will still be well predicted by simple dividing lines, prefers linear models that are easier to interpret.

In the 21st century, several emergent viruses have posed a global threat. Each pathogen has emphasized the value of rapid and scalable vaccine development programs. The ongoing severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic has made the importance of such efforts especially clear.
