With the rapid adoption of Electronic Health Records (EHR) in China, an increasing amount of clinical data has become available to support clinical research. Secondary use of clinical data usually requires de-identification of personal information to protect patient privacy. Because manual de-identification of free clinical text requires a significant amount of human work, an automated de-identification system is necessary. While many de-identification systems are available for English clinical text, designing one for Chinese clinical text faces many challenges, such as the unavailability of necessary lexical resources and the sparsity of patient health information (PHI) in Chinese clinical text. In this paper, we design a de-identification pipeline that takes advantage of both rule-based and machine learning techniques. In particular, our method can effectively construct a data set with dense PHI, which significantly reduces annotation time for subsequent supervised learning. We experiment on a dataset of 3000 heterogeneous clinical documents to evaluate the annotation cost and the de-identification performance. Our approach increases the efficiency of the annotation effort by over 60% while reaching an F score of over 90%. We demonstrate that combining rule-based and machine learning techniques is an effective way to reduce annotation cost and achieve high performance in the Chinese clinical text de-identification task.
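The idea of using rules to build a PHI-dense data set before annotation can be illustrated with a minimal sketch. It assumes that a few regular expressions flag candidate PHI and that only sentences with at least one match are passed to annotators. The patterns, the names `PHI_PATTERNS` and `select_dense_sentences`, and the `min_hits` threshold are hypothetical illustrations, not the rules or the pipeline used in the paper.

```python
import re

# Hypothetical rule set: these patterns are illustrative assumptions,
# not the rules used in the paper.
PHI_PATTERNS = {
    "date": re.compile(r"\d{4}年\d{1,2}月\d{1,2}日|\d{4}-\d{1,2}-\d{1,2}"),
    "phone": re.compile(r"(?<!\d)1\d{10}(?!\d)"),
    "id_number": re.compile(r"(?<!\d)\d{17}[\dXx](?!\d)"),
}


def split_sentences(text):
    """Split on common Chinese sentence-ending punctuation and newlines."""
    parts = re.split(r"[。！？\n]", text)
    return [p.strip() for p in parts if p.strip()]


def count_phi_candidates(sentence):
    """Count rule matches in a sentence (candidate PHI mentions, unverified)."""
    return sum(len(p.findall(sentence)) for p in PHI_PATTERNS.values())


def select_dense_sentences(documents, min_hits=1):
    """Keep sentences with at least `min_hits` rule matches, densest first."""
    selected = []
    for doc in documents:
        for sent in split_sentences(doc):
            hits = count_phi_candidates(sent)
            if hits >= min_hits:
                selected.append((hits, sent))
    # Stable sort: ties keep their original document order.
    selected.sort(key=lambda pair: pair[0], reverse=True)
    return [sent for _, sent in selected]


if __name__ == "__main__":
    docs = ["患者于2016年3月12日入院。主诉头痛三天。联系电话13800000000，复诊日期2016-04-01。"]
    for sentence in select_dense_sentences(docs):
        print(sentence)
```

In a sketch like this, sentences without any rule match are dropped from the annotation queue, so annotators spend their time on text that is likely to contain PHI. A supervised model trained on the resulting annotations would still be needed to catch PHI that the rules miss.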
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5583002 | PMC |
| http://dx.doi.org/10.1016/j.jbi.2017.07.017 | DOI Listing |
Open Res Eur
January 2025
Heidelberger Institut für Global Health, Universitätsklinikum Heidelberg, Heidelberg, Baden-Württemberg, 69120, Germany.
Introduction: The benefits of sharing participant-level data, including clinical or epidemiological data, genomic data, high-dimensional imaging data, or human-derived samples, from biomedical studies have been widely touted and may be taken for granted. As investments in data sharing and reuse efforts continue to grow, understanding the cost and positive and negative effects of data sharing for research participants, the general public, individual researchers, research and development, clinical practice, and public health is of growing importance. In this scoping review, we will identify and summarize existing evidence on the positive and negative impacts and costs of data sharing and how they are measured.
HRB Open Res
January 2025
Department of Psychiatry, University College Dublin, Dublin, Leinster, Ireland.
Background: Individuals with first-episode psychosis (FEP) face an increased risk of physical comorbidities, notably cardiovascular diseases, metabolic disorders, respiratory disorders, and certain types of cancer. Previous reviews report pooled physical health prevalence from chronic psychosis and FEP groups. By contrast, this review will focus on antipsychotic-naïve FEP cohorts and incorporate data from observational longitudinal studies and antipsychotic intervention studies to understand the progression of physical health comorbidities from the onset to later stages of psychosis.
Identifying immunosuppressed patients using structured data can be challenging. Large language models effectively extract structured concepts from unstructured clinical text. Here we show that GPT-4o outperforms traditional approaches in identifying immunosuppressive conditions and medication use by processing hospital admission notes.
Clin Optom (Auckl)
January 2025
Research Department, Southern College of Optometry, Memphis, TN, USA.
Purpose: To determine the performance of TOTAL30 for Astigmatism (T30fA; Alcon; Fort Worth, TX, USA) contact lenses (CLs) in existing CL wearers who are also frequent digital device users.
Methods: This 1-month, 3-visit study recruited adult subjects aged 18 to 40 years who were required to use digital devices for at least 8 hours per day. All subjects were refit into T30fA CLs.
Gastro Hep Adv
September 2024
Division of Gastroenterology, University of Pennsylvania, Philadelphia, Pennsylvania.
Background And Aims: Inadequate bowel preparation, which occurs in 25% of colonoscopies, is a major barrier to the effectiveness of screening for colorectal cancer. We aim to develop an artificial intelligence (machine learning) algorithm to assess photos of stool output after bowel preparation to predict inadequate bowel preparation before colonoscopy.
Methods: Patients were asked to text a photo of their stool in the commode when they believed that they neared completion of their colonoscopy bowel preparation.