DIMA: Data-Driven Selection of an Imputation Algorithm.

Janine Egert, Eva Brombacher, Bettina Warscheid, Clemens Kreutz

J Proteome Res

Institute of Medical Biometry and Statistics (IMBI), Institute of Medicine and Medical Center Freiburg, 79104 Freiburg im Breisgau, Germany.

Published: July 2021

Imputation is a prominent strategy when dealing with missing values (MVs) in proteomics data analysis pipelines. However, the performance of different imputation methods is difficult to assess and varies strongly depending on data characteristics. To overcome this issue, we present the concept of a data-driven selection of an imputation algorithm (DIMA). The performance and broad applicability of DIMA are demonstrated on 142 quantitative proteomics data sets from the PRoteomics IDEntifications (PRIDE) database and on simulated data containing 5-50% MVs with different proportions of missing not at random (MNAR) and missing completely at random (MCAR) values. DIMA reliably suggests a high-performing imputation algorithm, which is always among the three best algorithms and results in a root mean square error difference (ΔRMSE) ≤ 10% in 80% of the cases. The DIMA implementation is available in MATLAB at github.com/kreutz-lab/OmicsData and in R at github.com/kreutz-lab/DIMAR.
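
The general idea of selecting an imputation algorithm from the data can be illustrated with a small sketch. This is not the DIMA implementation from the repositories above; it only assumes the common scheme of hiding known values in a complete reference matrix, imputing them with each candidate, and ranking candidates by RMSE. All function names (`mask_values`, `select_imputer`, the entries of `IMPUTERS`) are hypothetical, the three candidate imputers are deliberately simple placeholders, and the MNAR pattern is approximated by removing the lowest intensities.

```python
import numpy as np


def mask_values(X, frac, mnar_share, rng):
    """Return a copy of X with round(frac * X.size) entries set to NaN.

    A share `mnar_share` of the removed entries are the lowest values
    (a crude stand-in for missing not at random); the rest are drawn
    uniformly from the remaining entries (missing completely at random).
    """
    flat = X.astype(float).ravel().copy()
    n_total = int(round(frac * flat.size))
    n_mnar = int(round(mnar_share * n_total))
    mnar_idx = np.argsort(flat)[:n_mnar]
    remaining = np.setdiff1d(np.arange(flat.size), mnar_idx)
    mcar_idx = rng.choice(remaining, size=n_total - n_mnar, replace=False)
    flat[mnar_idx] = np.nan
    flat[mcar_idx] = np.nan
    return flat.reshape(X.shape)


def impute_global_min(X):
    """Replace every NaN with the smallest observed value."""
    return np.where(np.isnan(X), np.nanmin(X), X)


def impute_row_stat(X, stat):
    """Replace NaNs row-wise with stat(observed row values).

    Rows without any observed value fall back to the global statistic.
    """
    out = X.copy()
    fallback = stat(X[~np.isnan(X)])
    for row in out:  # each row is a view, so assignment updates `out`
        observed = row[~np.isnan(row)]
        row[np.isnan(row)] = stat(observed) if observed.size else fallback
    return out


# Placeholder candidates (assumption: real pipelines would plug in
# established imputation methods here).
IMPUTERS = {
    "global_min": impute_global_min,
    "row_mean": lambda X: impute_row_stat(X, np.mean),
    "row_median": lambda X: impute_row_stat(X, np.median),
}


def select_imputer(X_ref, imputers, frac=0.2, mnar_share=0.5, seed=0):
    """Rank imputers by RMSE on artificially hidden values of X_ref.

    X_ref must be complete (no NaN) so that the hidden values are known.
    Returns the name of the best imputer and all RMSE scores.
    """
    rng = np.random.default_rng(seed)
    X_missing = mask_values(X_ref, frac, mnar_share, rng)
    hidden = np.isnan(X_missing)
    scores = {}
    for name, impute in imputers.items():
        X_imputed = impute(X_missing)
        error = X_imputed[hidden] - X_ref[hidden]
        scores[name] = float(np.sqrt(np.mean(error ** 2)))
    best = min(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Toy reference matrix: 50 "proteins" x 6 "samples" with protein effects.
    X_ref = rng.normal(20, 2, size=(50, 1)) + rng.normal(0, 0.5, size=(50, 6))
    best, scores = select_imputer(X_ref, IMPUTERS)
    print(best, scores)
```

In this sketch the ranking depends on the chosen missingness settings (`frac`, `mnar_share`), which mirrors the abstract's point that imputation performance varies with data characteristics.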

DOI: http://dx.doi.org/10.1021/acs.jproteome.1c00119
