NASA has employed high-throughput molecular assays to identify sub-cellular changes that affect human physiology during spaceflight. Machine learning (ML) methods promise to improve our ability to identify important signals within high-dimensional molecular data. However, the inherently small number of study subjects within a single spaceflight mission limits the utility of ML approaches. To overcome this limited statistical power, data from multiple spaceflight missions must be aggregated while appropriately addressing intra- and inter-study variability. Here we describe an approach to log-transform, scale, and normalize data from six heterogeneous, mouse liver-derived transcriptomics datasets (n = 137), which enabled ML methods to classify spaceflown vs. ground control animals (AUC ≥ 0.87) while mitigating variability arising from mission of origin. Liver-specific biological processes identified by the harmonized ML-based analysis were concordant with those identified by study-by-study classical omics analysis. This work demonstrates the feasibility of applying ML methods to integrated, heterogeneous datasets of small sample size.
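
The general idea of log-transforming and scaling each dataset before pooling can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not specify the exact transformation or normalization steps, so the choices here (a `log2(x + 1)` transform followed by per-gene z-scoring within each study) are assumptions, and the function name `harmonize`, the mission labels, and the toy counts are hypothetical.

```python
import math
from statistics import mean, pstdev


def harmonize(studies):
    """Log-transform, then z-score each gene within its own study.

    `studies` maps a (hypothetical) mission label to a list of samples,
    each sample being a list of raw expression counts in the same gene order.
    Assumption: per-study standardization stands in for the paper's
    scaling and normalization steps, which the abstract does not detail.
    """
    harmonized = {}
    for mission, samples in studies.items():
        # log2(x + 1) compresses the dynamic range and tolerates zero counts.
        logged = [[math.log2(x + 1) for x in sample] for sample in samples]
        n_genes = len(logged[0])
        # Per-gene mean and population standard deviation within this study.
        means = [mean(s[g] for s in logged) for g in range(n_genes)]
        # A gene with zero variance gets a divisor of 1.0 to avoid dividing by zero.
        sds = [pstdev([s[g] for s in logged]) or 1.0 for g in range(n_genes)]
        harmonized[mission] = [
            [(s[g] - means[g]) / sds[g] for g in range(n_genes)] for s in logged
        ]
    return harmonized


# Toy data: two missions measured on very different raw scales.
toy = {
    "mission_a": [[1, 7], [3, 31]],
    "mission_b": [[15, 0], [63, 3]],
}
result = harmonize(toy)
```

In this toy case the two missions differ strongly in raw scale, yet after the transform both have per-gene mean 0 and unit variance, so a downstream classifier sees the within-study contrast rather than the mission-level offset. Centering each study in this way also removes any genuine between-study biological difference, which is one reason a real analysis might choose a different normalization.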

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11167036
DOI: http://dx.doi.org/10.1038/s41526-024-00379-3
