Objective: This project demonstrates the feasibility of connecting medical imaging data and features and SARS-CoV-2 genome variants with clinical data in the National Clinical Cohort Collaborative (N3C) repository to accelerate integrative research on the detection, diagnosis, and treatment of COVID-19-related morbidities. N3C has curated a rich collection of aggregated and de-identified electronic health record (EHR) data on over 18 million patients, including 7.5 million COVID-positive patients, seen at hospitals across the United States. Medical imaging data and variant samples are important data modalities in the study of COVID-19.
Materials and Methods: Imaging data and features are hosted on The Cancer Imaging Archive (TCIA), and sequenced variant samples are analyzed and stored at the NIH GenBank. The University of Arkansas for Medical Sciences (UAMS) published the first COVID-19 data set of 105 patients on TCIA and 37 patients on GenBank. We developed a process to link imaging data, genomic variants, and N3C EHR data through Privacy Preserving Record Linkage (PPRL), which uses de-identified cryptographic hashes to match records belonging to the same individual without exposing patient identifiers.
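The hash-based matching idea behind PPRL can be illustrated with a minimal sketch. It assumes a keyed hash (HMAC-SHA256) computed over a normalized name and date of birth, with a secret key shared by the participating sites; the abstract does not describe the project's actual tokenization scheme, so the function names, the choice of fields, and the key handling here are all hypothetical.

```python
import hashlib
import hmac
import unicodedata


def normalize(value: str) -> str:
    """Lowercase, strip accents, and drop non-alphanumerics so that
    trivial formatting differences do not change the resulting hash."""
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "".join(ch for ch in ascii_only.lower() if ch.isalnum())


def make_token(secret_key: bytes, first: str, last: str, dob: str) -> str:
    """Build a de-identified token from identifying fields (hypothetical
    field choice). Only this token, never the raw fields, leaves the site."""
    message = "|".join(normalize(p) for p in (first, last, dob)).encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


def link_records(site_a: dict, site_b: dict) -> list:
    """Each argument maps token -> local record ID. Returns the pairs of
    local IDs whose tokens match exactly in both data sets."""
    shared_tokens = site_a.keys() & site_b.keys()
    return sorted((site_a[t], site_b[t]) for t in shared_tokens)


# Fictional example data; the key would be managed out of band in practice.
KEY = b"example-shared-secret"
imaging_site = {
    make_token(KEY, "Ana-María", "Lopez", "1980-01-02"): "IMG-001",
    make_token(KEY, "John", "Smith", "1975-06-30"): "IMG-002",
}
ehr_site = {
    make_token(KEY, " ana maria", "LOPEZ ", "1980/01/02"): "EHR-417",
    make_token(KEY, "Jane", "Doe", "1990-12-12"): "EHR-998",
}
linked = link_records(imaging_site, ehr_site)
```

This sketch matches on exact token equality only, so any difference that survives normalization (for example, a misspelled name) prevents a link. Production PPRL systems typically generate several tokens per person or use similarity-preserving encodings to tolerate such errors.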
Results: The PPRL techniques were piloted using clinical and imaging data sets provided by UAMS. The software components and processes we developed executed as intended, and the linked data were returned and processed for visualization.
Conclusion: Linking clinical data sources at the patient level provides opportunities to gain insights that might otherwise remain unknown. The PPRL prototype and pilot serve as a model for linking disparate and diverse data repositories to enhance clinical research.
Download full-text PDF |
Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11733468 | PMC |
http://dx.doi.org/10.1002/lrh2.10457 | DOI Listing |