The project "Collaboration on Rare Diseases" (CORD-MI) connects various university hospitals in Germany to collect sufficient harmonized electronic health record (EHR) data to support clinical research in the field of rare diseases (RDs). However, integrating and transforming heterogeneous data into an interoperable standard through Extract-Transform-Load (ETL) processes is a complex task that may affect data quality (DQ). Local DQ assessment and control processes are needed to ensure and improve the quality of RD data. We therefore aim to investigate the impact of ETL processes on the quality of transformed RD data. Seven DQ indicators covering three independent DQ dimensions were evaluated. The resulting reports show the correctness of the calculated DQ metrics and the detected DQ issues. Our study provides the first comparison of the DQ of RD data before and after ETL processes. We found that ETL processes are challenging tasks that influence the quality of RD data. We have demonstrated that our methodology is useful and capable of evaluating the quality of real-world data stored in different formats and structures. It can therefore be used to improve the quality of RD documentation and to support clinical research.

Source
http://dx.doi.org/10.3233/SHTI230121
