Molecular analytics increasingly utilize machine learning (ML) for predictive modeling based on data acquired through molecular profiling technologies. However, developing robust models that accurately capture physiological phenotypes is challenged by the dynamics inherent to biological systems, variability stemming from analytical procedures, and the resource-intensive nature of obtaining sufficiently representative datasets. Here, we propose and evaluate a new method: Contextual Out-of-Distribution Integration (CODI). Based on experimental observations, CODI generates synthetic data that integrate unrepresented sources of variation encountered in real-world applications into a given molecular fingerprint dataset. By augmenting a dataset with out-of-distribution variance, CODI enables an ML model to better generalize to samples beyond the seed training data, reducing the need for extensive experimental data collection. Using three independent longitudinal clinical studies and a case-control study, we demonstrate CODI's application to several classification tasks involving vibrational spectroscopy of human blood. We showcase our approach's ability to enable personalized fingerprinting for multiyear longitudinal molecular monitoring and enhance the robustness of trained ML models for improved disease detection. Our comparative analyses reveal that incorporating CODI into the classification workflow consistently leads to increased robustness against data variability and improved predictive accuracy.
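
The augmentation idea in the abstract can be pictured with a minimal sketch. This is not the authors' published CODI procedure, which the abstract does not specify. It assumes, purely for illustration, that experimentally observed variation is available as additive difference vectors (for example, differences between repeated measurements of the same sample). The function name `augment_with_variation` and all of its parameters are hypothetical.

```python
import random


def augment_with_variation(samples, labels, variations,
                           n_copies=3, scale=1.0, seed=0):
    """Extend a seed dataset with synthetic, variation-perturbed copies.

    samples:    list of feature vectors (the seed fingerprints)
    labels:     list of class labels, one per sample
    variations: list of difference vectors with the same length as a sample;
                ASSUMPTION: variation is additive and measured separately
    n_copies:   number of synthetic copies generated per seed sample
    scale:      weights are drawn uniformly from [-scale, scale]
    """
    rng = random.Random(seed)
    # Start from the unmodified seed data so the originals are preserved.
    out_samples = [list(s) for s in samples]
    out_labels = list(labels)
    for _ in range(n_copies):
        for sample, label in zip(samples, labels):
            delta = rng.choice(variations)       # pick one variation source
            weight = rng.uniform(-scale, scale)  # random strength and sign
            out_samples.append(
                [v + weight * d for v, d in zip(sample, delta)]
            )
            out_labels.append(label)             # label is left unchanged
    return out_samples, out_labels
```

A classifier would then be trained on the returned, enlarged dataset in place of the seed data alone. The sketch assumes that the injected variation does not change the class label, which would need to be checked for any real source of variation.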

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11495219
DOI: http://dx.doi.org/10.1093/pnasnexus/pgae449

Similar Publications

Next-generation cancer phenomics by deployment of multiple molecular endophenotypes coupled with high-throughput analyses of gene expression offer veritable opportunities for triangulation of discovery findings in non-small cell lung cancer (NSCLC) research. This study reports differentially expressed genes in NSCLC using publicly available datasets (GSE18842 and GSE229253), uncovering 130 common genes that may potentially represent crucial molecular signatures of NSCLC. Additionally, network analyses by GeneMANIA and STRING revealed significant coexpression and interaction patterns among these genes, with four notable hub genes-, , and -identified as pivotal in NSCLC progression.

Mapping the spatial atlas of the human bone tissue integrating spatial and single-cell transcriptomics.

Nucleic Acids Res

January 2025

Tulane Center for Biomedical Informatics and Genomics, Deming Department of Medicine, School of Medicine, Tulane University, 1440 Canal Street, Downtown, New Orleans, LA 70112, USA.

Bone is a multifaceted tissue requiring orchestrated interplays of diverse cells within specialized microenvironments. Although significant progress has been made in understanding cellular and molecular mechanisms of component cells of bone, revealing their spatial organization and interactions in native bone tissue microenvironment is crucial for advancing precision medicine, as they govern fundamental signaling pathways and functional dependencies among various bone cells. In this study, we present the first integrative high-resolution map of human bone and bone marrow, using spatial and single-cell transcriptomics profiling from femoral tissue.

Gene expression is regulated by chromatin DNA methylation and other features, including histone post-translational modifications (PTMs), chromatin remodelers and transcription factor occupancy. A complete understanding of gene regulation will require mapping these chromatin features in samples with small cell numbers. Here we describe a novel genome-wide chromatin profiling technology named Nicking Enzyme Epitope targeted DNA sequencing (NEED-seq).

Irritable bowel syndrome (IBS) is a multifactorial condition with heterogeneous pathophysiology, including intestinal permeability alterations. The aim of the present study was to assess the ability of a probiotic blend (PB) consisting of two strains (CECT7484 and CECT7485) and one strain of (CECT7483) to recover the permeability increase induced by mediators from IBS mucosal biopsies and to highlight the underlying molecular mechanisms. Twenty-one IBS patients diagnosed according to ROME IV criteria (11 IBS-D and 10 IBS-M) and 7 healthy controls were enrolled.

Ovarian cancer (OC) ranks as the fifth leading cause of cancer-related deaths in the United States, posing a significant threat to female health. Elusive symptoms that often masquerade as gastrointestinal issues delay diagnosis, so that a concerning 70% of cases are identified in advanced stages. While early-stage OC boasts a 90% cure rate, progression involving pelvic organs or extending beyond the peritoneal cavity drastically diminishes it.
