Multi-scale variational autoencoder for imputation of missing values in untargeted metabolomics using whole-genome sequencing data.

Comput Biol Med

Division of Biomedical Informatics and Genomics, Tulane Center of Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University, New Orleans, LA, 70112, USA.

Published: September 2024

AI Article Synopsis

  • The study addresses missing data in metabolomics and shows how integrating whole-genome sequencing (WGS) data can improve the accuracy and completeness of the analyses.
  • A new method based on a multi-scale variational autoencoder is proposed to impute unknown metabolites by combining genomic data, including polygenic risk scores and SNPs, with metabolomics information.
  • The results show that this method outperforms conventional imputation techniques, which can improve the study of metabolic pathways and their links to disease.

Article Abstract

Background: Missing data is a common challenge in mass spectrometry-based metabolomics, which can lead to biased and incomplete analyses. The integration of whole-genome sequencing (WGS) data with metabolomics data has emerged as a promising approach to enhance the accuracy of data imputation in metabolomics studies.

Method: In this study, we propose a novel method that leverages information from WGS data and reference metabolites to impute unknown metabolites. Our approach uses a multi-scale variational autoencoder to jointly model the burden score, polygenic risk score (PGS), and linkage disequilibrium (LD)-pruned single nucleotide polymorphisms (SNPs) for feature extraction and imputation of missing metabolomics data. By learning latent representations of both omics data types, our method can effectively impute missing metabolomics values from genomic information.
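The abstract does not specify the network layout, so the following is only a minimal sketch of the general idea, under stated assumptions. We read "multi-scale" as one encoder per genomic input block (gene-level burden scores, PGS, SNP-level dosages) whose latent codes are concatenated and passed to a shared decoder that predicts all metabolites. The encoders here are single linear layers, the loss is the negative ELBO with the reconstruction term restricted to observed metabolite values, and training is omitted. The names `MultiScaleVAESketch`, `dense`, and `impute`, as well as all data sizes, are hypothetical and are not taken from the paper.

```python
import numpy as np


def dense(n_in, n_out, rng):
    """Random weight matrix and zero bias for one linear layer."""
    scale = np.sqrt(2.0 / (n_in + n_out))
    return rng.normal(0.0, scale, size=(n_in, n_out)), np.zeros(n_out)


class MultiScaleVAESketch:
    """Hypothetical VAE: one encoder per genomic input block, one shared metabolite decoder.

    Weights are random; training (gradient descent on `loss`) is left to an autodiff library.
    """

    def __init__(self, block_dims, latent_dim, n_metabolites, seed=0):
        self.rng = np.random.default_rng(seed)
        # Per-block linear heads for the latent mean and log-variance.
        self.mu_heads = [dense(d, latent_dim, self.rng) for d in block_dims]
        self.logvar_heads = [dense(d, latent_dim, self.rng) for d in block_dims]
        # Shared decoder maps the concatenated latent codes to every metabolite.
        self.decoder = dense(latent_dim * len(block_dims), n_metabolites, self.rng)

    def encode(self, blocks):
        """Encode each block separately, then concatenate the per-block codes."""
        mu = np.concatenate([x @ w + b for x, (w, b) in zip(blocks, self.mu_heads)], axis=1)
        logvar = np.concatenate(
            [x @ w + b for x, (w, b) in zip(blocks, self.logvar_heads)], axis=1
        )
        return mu, logvar

    def decode(self, z):
        w, b = self.decoder
        return z @ w + b

    def loss(self, blocks, metab, observed):
        """Negative ELBO: squared error on observed metabolites only, plus KL to N(0, I)."""
        mu, logvar = self.encode(blocks)
        # Reparameterisation trick: z = mu + sigma * eps, eps ~ N(0, I).
        z = mu + np.exp(0.5 * logvar) * self.rng.standard_normal(mu.shape)
        recon = self.decode(z)
        # Missing (NaN) entries are masked out and contribute nothing to the loss.
        sq_err = np.where(observed, (recon - metab) ** 2, 0.0)
        recon_term = sq_err.sum() / max(int(observed.sum()), 1)
        kl = -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=1).mean()
        return float(recon_term + kl)

    def impute(self, blocks, metab, observed):
        """Keep observed values; fill missing ones with the decoder output at the latent mean."""
        mu, _ = self.encode(blocks)
        return np.where(observed, metab, self.decode(mu))


# Toy data with arbitrary sizes (not the paper's): 8 samples, 3 genomic blocks, 10 metabolites.
rng = np.random.default_rng(1)
n = 8
burden = rng.poisson(1.0, size=(n, 20)).astype(float)  # gene-level burden scores
pgs = rng.standard_normal((n, 5))                       # polygenic scores
snps = rng.integers(0, 3, size=(n, 50)).astype(float)   # LD-pruned genotype dosages (0/1/2)
blocks = [burden, pgs, snps]

metab = rng.standard_normal((n, 10))
observed = rng.random((n, 10)) > 0.2  # True where the metabolite was measured
metab[~observed] = np.nan             # simulate missing values

model = MultiScaleVAESketch([20, 5, 50], latent_dim=4, n_metabolites=10)
print(model.loss(blocks, metab, observed))
filled = model.impute(blocks, metab, observed)
```

Masking the reconstruction term is what lets such a model train on incomplete metabolite matrices: the encoder only sees genomic inputs, so at prediction time every metabolite, measured or not, has a genotype-based estimate, and only the missing cells are overwritten.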

Results: We evaluate our method on empirical metabolomics datasets with missing values and show that it outperforms conventional imputation techniques. Using burden scores, PGS, and LD-pruned SNPs derived from 35 template metabolites, the proposed methods achieved R-scores > 0.01 for 71.55 % of metabolites.
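The headline figure is a fraction of per-metabolite scores that clear a threshold. The abstract does not define "R-score"; the sketch below assumes it means the coefficient of determination (R²) computed separately for each metabolite on held-out values. If Pearson's r was intended instead, `np.corrcoef` per column would replace `r2_per_metabolite`. The helper names and the toy arrays are hypothetical; only the threshold of 0.01 and the "share of metabolites above it" structure come from the abstract.

```python
import numpy as np


def r2_per_metabolite(y_true, y_pred):
    """Coefficient of determination (R²) computed separately for each column.

    Assumes every metabolite varies across samples, i.e. its total sum of squares is > 0.
    """
    ss_res = np.sum((y_true - y_pred) ** 2, axis=0)
    ss_tot = np.sum((y_true - y_true.mean(axis=0)) ** 2, axis=0)
    return 1.0 - ss_res / ss_tot


def fraction_above(scores, threshold=0.01):
    """Share of metabolites whose score exceeds the threshold."""
    return float(np.mean(scores > threshold))


# Toy held-out values: rows are samples, columns are metabolites.
y_true = np.array([[1.0, 2.0, 0.0], [2.0, 4.0, 1.0], [3.0, 6.0, 0.0], [4.0, 8.0, 1.0]])
y_pred = np.array([[1.0, 2.5, 1.0], [2.0, 3.5, 0.0], [3.0, 6.5, 1.0], [4.0, 7.5, 0.0]])

scores = r2_per_metabolite(y_true, y_pred)  # per-metabolite R²: 1.0, 0.95, -3.0
share = fraction_above(scores)              # 2 of 3 metabolites exceed 0.01
```

R² can be negative when predictions are worse than simply using each metabolite's mean, so a threshold just above zero separates metabolites the genomic features explain at least slightly from those they do not explain at all.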

Conclusion: The integration of WGS data in metabolomics imputation not only improves data completeness but also enhances downstream analyses, paving the way for more comprehensive and accurate investigations of metabolic pathways and disease associations. Our findings offer valuable insights into the potential benefits of utilizing WGS data for metabolomics data imputation and underscore the importance of leveraging multi-modal data integration in precision medicine research.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11324385
DOI: http://dx.doi.org/10.1016/j.compbiomed.2024.108813


