Imputation of cancer proteomics data with a deep model that learns from many datasets.

bioRxiv

Department of Genome Sciences, University of Washington.

Published: August 2024

Missing values are a major challenge in the analysis of mass spectrometry proteomics data: they hinder reproducibility, reduce statistical power for identifying differentially expressed (DE) proteins, and make low-abundance proteins difficult to analyze. We present Lupine, a deep learning-based method for imputing, or estimating, missing values in tandem mass tag (TMT) proteomics data. Lupine is, to our knowledge, the first imputation method designed to learn jointly from many datasets, and we provide evidence that this approach leads to more accurate predictions. We validated Lupine by applying it to TMT data from >1,000 cancer patient samples spanning ten cancer types from the Clinical Proteomic Tumor Analysis Consortium (CPTAC). Lupine outperforms the state of the art for TMT imputation, identifies more DE proteins than other methods, corrects for TMT batch effects, and learns a meaningful representation of proteins and patient samples. Lupine is implemented as an open-source Python package.
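To make the idea of imputing from learned protein and sample representations concrete, the following is a minimal sketch and not Lupine's implementation. It swaps the deep model for a much simpler stand-in, low-rank matrix factorization fit by alternating least squares on the observed entries of a single proteins-by-samples matrix. The function name `impute_low_rank` and its parameters (`rank`, `n_iters`, `ridge`, `seed`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def impute_low_rank(X, rank=2, n_iters=50, ridge=1e-3, seed=0):
    """Fill NaNs in X (proteins x samples) from learned embeddings.

    Illustrative stand-in only: each protein and each sample gets a
    `rank`-dimensional vector, and an entry is predicted as their dot
    product. This is NOT the Lupine model described in the abstract.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    observed = ~np.isnan(X)
    n_proteins, n_samples = X.shape

    # Randomly initialised embeddings (assumed starting point).
    P = rng.standard_normal((n_proteins, rank))  # protein embeddings
    S = rng.standard_normal((n_samples, rank))   # sample embeddings
    reg = ridge * np.eye(rank)

    for _ in range(n_iters):
        # Update each protein embedding using only its observed samples.
        for i in range(n_proteins):
            cols = observed[i]
            A = S[cols]
            P[i] = np.linalg.solve(A.T @ A + reg, A.T @ X[i, cols])
        # Update each sample embedding using only its observed proteins.
        for j in range(n_samples):
            rows = observed[:, j]
            A = P[rows]
            S[j] = np.linalg.solve(A.T @ A + reg, A.T @ X[rows, j])

    # Keep measured values as-is; fill only the missing entries.
    return np.where(observed, X, P @ S.T)
```

Calling `impute_low_rank(X)` on a matrix with `NaN` in the missing positions returns a matrix of the same shape with measured values unchanged. The abstract's central claim, that learning jointly from many datasets improves accuracy, is not captured by this single-matrix sketch.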

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11383014
DOI: http://dx.doi.org/10.1101/2024.08.26.609780
