Privacy preserving distributed learning classifiers - Sequential learning with small sets of data.

Fadila Zerka, Visara Urovi, Fabio Bottari, Ralph T H Leijenaar, Sean Walsh, Hanif Gabrani-Juma, Martin Gueuning, Akshayaa Vaidyanathan, Wim Vos, Mariaelena Occhipinti, Henry C Woodruff, Michel Dumontier, Philippe Lambin

Comput Biol Med

The D-Lab, Department of Precision Medicine, GROW - School for Oncology, Maastricht University, Maastricht, the Netherlands; Department of Radiology and Nuclear Medicine, Maastricht University Medical Centre+, Maastricht, the Netherlands.

Published: September 2021

Background: Artificial intelligence (AI) typically requires a significant amount of high-quality data to build reliable models, yet gathering enough data within a single institution can be particularly challenging. In this study we investigated the impact of using sequential learning to exploit very small, siloed sets of clinical and imaging data to train AI models. Furthermore, we evaluated whether such models can match the performance of models trained on the same data pooled in a single centralized database.

Methods: We propose a privacy preserving distributed learning framework, learning sequentially from each dataset. The framework is applied to three machine learning algorithms: Logistic Regression, Support Vector Machines (SVM), and Perceptron. The models were evaluated using four open-source datasets (Breast cancer, Indian liver, NSCLC-Radiomics dataset, and Stage III NSCLC).
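
The abstract does not give the update rule or the orchestration used by the framework, so the following is only a minimal sketch of the general idea of sequential learning across sites, assuming a logistic regression model trained by stochastic gradient descent. The function names (`local_update`, `sequential_train`, `make_site`), the learning rate, the number of rounds, and the synthetic data are all illustrative assumptions and are not taken from the paper. The point it illustrates is that only the weight vector moves from site to site; the rows stay where they are.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, x):
    # w[0] is the intercept, w[1:] are the feature weights.
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))


def local_update(weights, site_data, lr=0.1, epochs=5):
    """Run a few SGD epochs on one site's rows (hypothetical helper).

    Only the returned weight vector leaves the site.
    """
    w = list(weights)
    for _ in range(epochs):
        for x, y in site_data:
            err = predict(w, x) - y  # gradient of the log-loss w.r.t. the logit
            w[0] -= lr * err
            for k, xk in enumerate(x):
                w[k + 1] -= lr * err * xk
    return w


def sequential_train(sites, n_features, rounds=10):
    """Pass one model through every site in turn, for several rounds."""
    w = [0.0] * (n_features + 1)
    for _ in range(rounds):
        for site_data in sites:
            w = local_update(w, site_data)
    return w


def make_site(rng, n):
    # Synthetic stand-in for a small siloed dataset (assumption, not real data):
    # feature 0 is informative, feature 1 is noise.
    rows = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = [rng.gauss(2.0 * y - 1.0, 1.0), rng.gauss(0.0, 1.0)]
        rows.append((x, y))
    return rows


rng = random.Random(0)
sites = [make_site(rng, 20) for _ in range(3)]  # three small "institutions"
weights = sequential_train(sites, n_features=2)
```

A centralized baseline for comparison would call `local_update` once on the concatenation of all sites' rows; the study's question is whether the two resulting models perform comparably.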

Findings: The proposed framework ensured a comparable predictive performance against a centralized learning approach. Pairwise DeLong tests showed no significant difference between the compared pairs for each dataset.
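
For readers unfamiliar with the DeLong test, the sketch below shows one way to compare the AUCs of two models scored on the same test cases, following the standard formulation based on placement values. It is an illustrative stdlib-only implementation, not the code used in the study; the function name `delong_test` and the toy labels and scores are assumptions made for the example. It requires at least two positive and two negative cases.

```python
import math


def _psi(pos_score, neg_score):
    # Mann-Whitney kernel: 1 if the positive outranks the negative, 0.5 on ties.
    if pos_score > neg_score:
        return 1.0
    if pos_score == neg_score:
        return 0.5
    return 0.0


def _components(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Placement values for each positive (v10) and each negative (v01).
    v10 = [sum(_psi(p, q) for q in neg) / len(neg) for p in pos]
    v01 = [sum(_psi(p, q) for p in pos) / len(pos) for q in neg]
    auc = sum(v10) / len(v10)
    return auc, v10, v01


def _cov(a, b):
    # Sample covariance with the (n - 1) denominator.
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(labels, scores_a, scores_b):
    """Return (auc_a, auc_b, z, two-sided p) for two correlated ROC curves."""
    auc_a, v10_a, v01_a = _components(scores_a, labels)
    auc_b, v10_b, v01_b = _components(scores_b, labels)
    m, n = len(v10_a), len(v01_a)
    var_a = _cov(v10_a, v10_a) / m + _cov(v01_a, v01_a) / n
    var_b = _cov(v10_b, v10_b) / m + _cov(v01_b, v01_b) / n
    cov_ab = _cov(v10_a, v10_b) / m + _cov(v01_a, v01_b) / n
    var_diff = var_a + var_b - 2.0 * cov_ab
    if var_diff <= 0.0:
        # Degenerate case: the estimated variance of the difference is zero.
        if auc_a == auc_b:
            return auc_a, auc_b, 0.0, 1.0
        return auc_a, auc_b, math.copysign(math.inf, auc_a - auc_b), 0.0
    z = (auc_a - auc_b) / math.sqrt(var_diff)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return auc_a, auc_b, z, p


# Toy example (made-up scores): "a" could be a centralized model,
# "b" a sequentially trained one, both scored on the same eight cases.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores_a = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
scores_b = [0.9, 0.5, 0.7, 0.4, 0.6, 0.3, 0.55, 0.1]
auc_a, auc_b, z, p = delong_test(labels, scores_a, scores_b)
```

A large p-value here means the test found no evidence that the two AUCs differ, which is the sense in which the abstract reports "no significant difference"; it does not by itself establish equivalence.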

Interpretation: Distributed learning helps preserve the privacy of medical data. We foresee that this technology will increase the number of collaborative opportunities to develop robust AI, becoming the default solution in scenarios where collecting enough data from a single reliable source is logistically impossible. Distributed sequential learning provides a privacy-preserving means for institutions with small but clinically valuable datasets to collaboratively train predictive AI without exposing patient data. Such models perform similarly to models built on a larger central dataset.

DOI: http://dx.doi.org/10.1016/j.compbiomed.2021.104716
