PLOS Digit Health
October 2022
The availability of large, deidentified health datasets has enabled significant innovation in using machine learning (ML) to better understand patients and their diseases. However, questions remain regarding the true privacy of this data, patient control over their data, and how we regulate data sharing in a way that does not encumber progress or further potentiate biases for underrepresented populations. After reviewing the literature on potential reidentifications of patients in publicly available datasets, we argue that the cost of slowing ML progress, measured in terms of access to future medical innovations and clinical software, is too great to justify limiting data sharing through large publicly available databases over concerns about imperfect data anonymization.
Objectives: To develop and demonstrate the feasibility of a Global Open Source Severity of Illness Score (GOSSIS)-1 for critical care patients, which generalizes across healthcare systems and countries.
Design: A merger of several critical care multicenter cohorts derived from registry and electronic health record data. Data were split into training (70%) and test (30%) sets, using each set exclusively for development and evaluation, respectively.
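A 70%/30% partition of this kind can be sketched as follows. This is a minimal illustration only: the abstract does not say whether the split was random, stratified, or grouped by hospital, so the seeded random shuffle, the function name `split_train_test`, and the placeholder admission IDs are all assumptions.

```python
import random


def split_train_test(records, train_fraction=0.7, seed=0):
    """Shuffle records and split them into disjoint training and test sets.

    Assumption: a simple seeded random split; the source does not specify
    how its 70/30 split was actually drawn.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical stand-in for pooled cohort admissions (IDs only, no real data).
admission_ids = list(range(1000))
train_ids, test_ids = split_train_test(admission_ids)
```

Because the two sets are disjoint, the model is developed only on `train_ids` and evaluated only on `test_ids`, which mirrors the exclusive use of each set described in the design.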