Missing values are a major challenge in the analysis of mass spectrometry proteomics data. Missing values hinder reproducibility, decrease statistical power for identifying differentially expressed (DE) proteins and make it challenging to analyze low-abundance proteins. We present Lupine, a deep learning-based method for imputing, or estimating, missing values in tandem mass tag (TMT) proteomics data. Lupine is, to our knowledge, the first imputation method that is designed to learn jointly from many datasets, and we provide evidence that this approach leads to more accurate predictions. We validated Lupine by applying it to TMT data from >1,000 cancer patient samples spanning ten cancer types from the Clinical Proteomics Tumor Atlas Consortium (CPTAC). Lupine outperforms the state of the art for TMT imputation, identifies more DE proteins than other methods, corrects for TMT batch effects, and learns a meaningful representation of proteins and patient samples. Lupine is implemented as an open source Python package.

Download full-text PDF

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11383014PMC
http://dx.doi.org/10.1101/2024.08.26.609780DOI Listing

Publication Analysis

Top Keywords

proteomics data
12
missing values
12
patient samples
8
lupine
5
imputation cancer
4
proteomics
4
cancer proteomics
4
data
4
data deep
4
deep model
4

Similar Publications

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!