De-identification of free text data containing personal health information: a scoping review of reviews.

Bekelu Negash, Alan Katz, Christine J Neilson, Moniruzzaman Moni, Marcello Nesca, Alexander Singer, Jennifer E Enns

Int J Popul Data Sci

Manitoba Centre for Health Policy, Department of Community Health Sciences, Rady Faculty of Health Sciences, University of Manitoba.

Published: February 2024

Introduction: Using data in research often requires that the data first be de-identified, particularly in the case of health data, which often include Personally Identifiable Information (PII) and/or Personal Health Identifying Information (PHII). Established procedures exist for de-identifying structured data, but de-identifying clinical notes, electronic health records, and other records that contain free text is more complex. The literature documents several approaches to this task. This scoping review identifies categories of de-identification methods that can be applied to free text data.

Methods: We adopted an established scoping review methodology to examine review articles published up to May 9, 2022, in Ovid MEDLINE; Ovid Embase; Scopus; the ACM Digital Library; IEEE Xplore; and Compendex. Our research question was: What methods are used to de-identify free text data? Two independent reviewers conducted title and abstract screening and full-text article screening using the online review management tool Covidence.

Results: The initial literature search retrieved 3,312 articles, most of which focused primarily on structured data. Eighteen publications describing methods of de-identification of free text data met the inclusion criteria for our review. The majority of the included articles focused on removing categories of personal health information identified by the Health Insurance Portability and Accountability Act (HIPAA). The de-identification methods they described combined rule-based methods or machine learning with other strategies such as deep learning.
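The rule-based category that many of the reviewed systems build on can be illustrated with a minimal sketch. It assumes a handful of regular expressions for four HIPAA-style identifier types (phone numbers, email addresses, Social Security numbers, and numeric dates); the pattern set and the function name `deidentify_rule_based` are hypothetical and far smaller than anything a production system would use.

```python
import re

# Hypothetical, illustrative patterns for a few HIPAA-style identifier types.
# Real rule-based systems use much larger pattern sets plus dictionaries.
PATTERNS = {
    "PHONE": re.compile(r"\(?\b\d{3}\)?[-. ]?\d{3}[-. ]\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
}


def deidentify_rule_based(text: str) -> str:
    """Replace every match of each pattern with a [CATEGORY] placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


print(deidentify_rule_based(
    "Seen 03/14/2021. Call 204-555-0147 or email jane.doe@example.org. SSN 123-45-6789."
))
# prints: Seen [DATE]. Call [PHONE] or email [EMAIL]. SSN [SSN].
print(deidentify_rule_based("Patient Jane Doe"))
# prints: Patient Jane Doe
```

The second input shows the usual weakness of pure pattern matching: a patient name in running text has no fixed format, so it passes through untouched. This gap is one reason rule-based components are commonly paired with machine learning or deep learning models.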

Conclusion: Our review identifies and categorises de-identification methods for free text data as rule-based methods, machine learning, deep learning, and combinations of these and other approaches. Most of the articles we identified describe de-identification methods that target some or all categories of PHII. Our review also traces how de-identification systems for free text data have evolved over time and points to hybrid approaches as the most promising direction for future work.
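One way a hybrid system can combine components is to run several detectors over the same note and redact the union of the spans they return. The sketch below assumes this span-union design; it is not taken from any of the reviewed systems. `make_stub_model` is a hypothetical stand-in for a trained named-entity recognition model (it only looks up capitalised tokens in a supplied name list), and all function names are illustrative.

```python
import re
from typing import Callable, Iterable, List, Set, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, identifier label)
Detector = Callable[[str], List[Span]]


def rule_detector(text: str) -> List[Span]:
    """Rule-based component: regexes for two identifier types (illustrative only)."""
    rules = (("PHONE", r"\b\d{3}-\d{3}-\d{4}\b"), ("DATE", r"\b\d{1,2}/\d{1,2}/\d{4}\b"))
    return [(m.start(), m.end(), label) for label, pat in rules for m in re.finditer(pat, text)]


def make_stub_model(known_names: Set[str]) -> Detector:
    """Hypothetical stand-in for a trained NER model: tags capitalised words found in a name list."""
    def detect(text: str) -> List[Span]:
        return [(m.start(), m.end(), "NAME")
                for m in re.finditer(r"\b[A-Z][a-z]+\b", text) if m.group() in known_names]
    return detect


def merge_spans(spans: List[Span]) -> List[Span]:
    """Merge overlapping spans into their union, keeping the label of the longer span."""
    merged: List[Span] = []
    for start, end, label in sorted(spans):
        if merged and start < merged[-1][1]:
            prev_start, prev_end, prev_label = merged[-1]
            keep = label if end - start > prev_end - prev_start else prev_label
            merged[-1] = (prev_start, max(prev_end, end), keep)
        else:
            merged.append((start, end, label))
    return merged


def deidentify_hybrid(text: str, detectors: Iterable[Detector]) -> str:
    """Run every detector, merge their spans, and replace each span with [LABEL]."""
    spans = merge_spans([s for detect in detectors for s in detect(text)])
    # Replace right to left so the offsets of earlier spans stay valid.
    for start, end, label in reversed(spans):
        text = text[:start] + f"[{label}]" + text[end:]
    return text


detectors = [rule_detector, make_stub_model({"Jane", "Doe"})]
print(deidentify_hybrid("Jane Doe called 204-555-0147 on 03/14/2021.", detectors))
# prints: [NAME] [NAME] called [PHONE] on [DATE].
```

Taking the union of overlapping spans, rather than their intersection, favours recall: a fragment flagged by any component is removed, at the cost of some over-redaction.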

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10898315
DOI: http://dx.doi.org/10.23889/ijpds.v8i1.2153

Similar Publications

Thoracic Aorta Measurement Extraction from Computed Tomography Radiology Reports Using Instruction Tuned Large Language Models.

medRxiv

December 2024

Ely Erez, Sedem Dankwa, McKenzie Tuttle, Afsheen Nasir, Prashanth Vallabhajosyula

Chest computed tomography (CT) is essential for diagnosing and monitoring thoracic aortic dilations and aneurysms, conditions that place patients at risk of complications such as aortic dissection and rupture. However, aortic measurements in chest CT radiology reports are often embedded in free-text formats, limiting their accessibility for clinical care, quality improvement and research purposes. In this study, we developed a multi-method pipeline to extract structured aortic measurements from radiology reports, and compared the performance of fine-tuned BERT-based models with instruction-tuned Llama large language models (LLMs).

Decoding consumers' interpretations of 'additive-free' and 'tobacco & water' cigarette advertising claims.

Tob Control

January 2025

Rutgers Institute for Nicotine & Tobacco Studies, New Brunswick, New Jersey, USA.

Caitlin Victoria Weiger, Stefanie Kristen Gratale, Ollie Ganz, Melanie LaVake, Eugene M Talbot

Objectives: In the USA, some tobacco companies replaced the marketing phrase '100% natural additive-free tobacco' with 'tobacco ingredients: tobacco & water' (T&W) after receiving warnings from the US Food and Drug Administration. This study assesses how people interpret the now-restricted additive-free claims and newer T&W claims on Natural American Spirit (NAS) and L&M cigarette packs.

Methods: An online between-subjects experiment randomised 2526 US adults to view one of three packs: an NAS additive-free pack, an NAS T&W pack or an L&M T&W pack.

Autonomous International Classification of Diseases Coding Using Pretrained Language Models and Advanced Prompt Learning Techniques: Evaluation of an Automated Analysis System Using Medical Text.

JMIR Med Inform

January 2025

Medical Big Data Research Center, Chinese PLA General Hospital, Beijing, China.

Yan Zhuang, Junyan Zhang, Xiuxing Li, Chao Liu, Yue Yu

Background: Machine learning models can reduce the burden on doctors by converting medical records into International Classification of Diseases (ICD) codes in real time, thereby enhancing the efficiency of diagnosis and treatment. However, these models face challenges such as small datasets, diverse writing styles, unstructured records, and the need for semimanual preprocessing. Existing approaches, such as naive Bayes, Word2Vec, and convolutional neural networks, have limitations in handling missing values and understanding the context of medical texts, leading to a high error rate.

Nurse Experiences in an Electronic Health Record Transition: A Mixed Methods Analysis.

Comput Inform Nurs

January 2025

Author Affiliations: Center for the Study of Healthcare Innovation, Implementation & Policy, VA Greater Los Angeles Health Care (Dr Brunner and Ms Amano), CA; Michael E. DeBakey VA Medical Center (Dr Davila), Houston, TX; Department of Medicine-Health Services Research, Baylor College of Medicine (Dr Davila), Houston, TX; VA Ann Arbor Healthcare System (Dr Krein), MI; Division of General Medicine, Department of Internal Medicine, University of Michigan Medical School (Dr Krein), Ann Arbor; Office of Nursing Services, Veterans Health Administration (Dr Sullivan and Ms Church), Washington, DC; Center of Innovation for Veteran-Centered and Value-Driven Care, Seattle VA Medical Center (Dr Sayre), WA; University of Washington School of Public Health (Dr Sayre), Seattle; Center for Healthcare Organization and Implementation Research, VA Bedford Healthcare System (Dr Rinne), MA; and Division of Pulmonary and Critical Care Medicine, Department of Medicine, Geisel School of Medicine, Dartmouth University (Dr Rinne), MA.

Julian Brunner, Alexis Amano, Jessica Davila, Sarah Krein, Sheila C Sullivan

Transitions from one EHR to another can be enormously disruptive to care. Nurses are the largest group of EHR users, but nurse experiences with EHR transitions have not been well documented. We sought to understand nurse experiences with an EHR transition at the US Department of Veterans Affairs.

Approaches for extracting daily dosage from free-text prescription signatures in heart failure with reduced ejection fraction: a comparative study.

JAMIA Open

February 2025

Department of Biomedical Informatics, University of Utah, Salt Lake City, UT 84108, United States.

Theodorus S Haaker, Joshua S Choi, Claude J Nanjo, Phillip B Warner, Ameen Abu-Hanna

Objective: To compare various methods for extracting daily dosage information from prescription signatures (sigs) and identify the best performers.

Materials And Methods: In this study, 5 daily dosage extraction methods were identified. Parsigs, RxSig, Sig2db, a large language model (LLM), and a bidirectional long short-term memory (BiLSTM) model were selected.
