Chinese medical named entity recognition (NER) is a fundamental task in Chinese medical natural language processing, aiming to recognize Chinese medical entities within unstructured medical texts. However, it poses significant challenges mainly due to the extensive usage of medical terms in Chinese medical texts. Although previous studies have made attempts to incorporate lexical or radical knowledge in order to improve the comprehension of medical texts, these studies either focus solely on one of these aspects or utilize a basic concatenation operation to combine these features, which fails to fully utilize the potential of lexical and radical knowledge.
AMIA Jt Summits Transl Sci Proc
September 2021
Extracting clinical concepts and their relations from clinical narratives is one of the fundamental tasks in clinical natural language processing. Traditional solutions often split this task into two subtasks in a pipeline architecture that first recognizes the named entities and then classifies the relations between all possible entity pairs. The pipeline architecture, although widely used, has two limitations: (1) it suffers from error propagation from the recognition step to the classification step, and (2) it cannot exploit the interactions between the two steps.
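The pipeline described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy `LEXICON`, the dictionary-lookup recognizer, and the placeholder rule in `classify_relation` are hypothetical stand-ins for trained models and are not taken from the paper.

```python
from itertools import combinations

# Hypothetical toy lexicon standing in for a trained NER model.
LEXICON = {"aspirin": "Drug", "headache": "Problem", "nausea": "Problem"}


def recognize_entities(tokens, lexicon=LEXICON):
    """Step 1: return (position, token, type) for every token found in the lexicon."""
    return [(i, tok, lexicon[tok]) for i, tok in enumerate(tokens) if tok in lexicon]


def candidate_pairs(entities):
    """Enumerate every unordered pair of recognized entities."""
    return list(combinations(entities, 2))


def classify_relation(pair):
    """Step 2: placeholder rule standing in for a trained relation classifier."""
    (_, _, type_a), (_, _, type_b) = pair
    return "treats" if {type_a, type_b} == {"Drug", "Problem"} else "none"


def pipeline(tokens, lexicon=LEXICON):
    """Run recognition, then classify each candidate pair."""
    entities = recognize_entities(tokens, lexicon)
    return [(a[1], b[1], classify_relation((a, b))) for a, b in candidate_pairs(entities)]
```

Because the second step only ever sees pairs built from the first step's output, an entity missed during recognition removes every relation that involves it. Calling `pipeline` with a lexicon that lacks "headache" returns an empty list for a sentence such as "aspirin relieved the headache", which is the error propagation the abstract refers to.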
Developing high-performance entity normalization algorithms that can alleviate the term variation problem is of great interest to the biomedical community. Although deep learning-based methods have been successfully applied to biomedical entity normalization, they often depend on traditional context-independent word embeddings. Bidirectional Encoder Representations from Transformers (BERT), BERT for Biomedical Text Mining (BioBERT) and BERT for Clinical Text Mining (ClinicalBERT) were recently introduced to pre-train contextualized word representation models using bidirectional Transformers, advancing the state-of-the-art for many natural language processing tasks.
AMIA Annu Symp Proc
September 2020
Natural language processing (NLP) is useful for extracting information from clinical narratives, and both traditional machine learning methods and more-recent deep learning methods have been successful in various clinical NLP tasks. These methods often depend on traditional word embeddings that are outputs of language models (LMs). Recently, methods that are directly based on pre-trained language models themselves, followed by fine-tuning on the LMs (e.
Objective: This study aims to develop and evaluate effective methods that can normalize diagnosis and procedure terms written by physicians to standard concepts in the International Classification of Diseases (ICD) in Chinese, with the goal of facilitating automated medical coding in China.
Methods: We applied an entity-linking framework, which consists of two steps (candidate concept generation and candidate concept ranking), to normalize Chinese diagnosis and procedure terms. For candidate concept generation, we implemented both the traditional BM25 algorithm and an extended version that integrates a synonym knowledge base.
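A minimal sketch of BM25-based candidate generation with synonym expansion follows. The abstract does not say how the synonym resource is integrated, so the expansion strategy here (score each synonym variant and keep the best score per concept) is an assumption, as are the whitespace tokenizer, the parameter defaults `k1=1.5` and `b=0.75`, and the names `BM25` and `generate_candidates`. Chinese text would need character-level tokens or a word segmenter instead of `split()`.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace tokens; Chinese terms would need characters or a segmenter.
    return text.lower().split()


class BM25:
    """Okapi BM25 over a small in-memory list of concept names."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.docs = [tokenize(d) for d in docs]
        self.k1, self.b = k1, b
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        self.tf = [Counter(d) for d in self.docs]
        df = Counter(t for d in self.docs for t in set(d))
        n = len(self.docs)
        # Smoothed IDF that stays positive for every term.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query, i):
        tf, dl = self.tf[i], len(self.docs[i])
        total = 0.0
        for t in tokenize(query):
            if t in tf:
                norm = tf[t] + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
                total += self.idf[t] * tf[t] * (self.k1 + 1) / norm
        return total


def generate_candidates(term, bm25, synonyms, k=3):
    """Return indices of up to k concepts, scored by the best-matching variant of the term."""
    variants = [term] + synonyms.get(term, [])  # hypothetical synonym lookup
    best = {}
    for variant in variants:
        for i in range(len(bm25.docs)):
            s = bm25.score(variant, i)
            if s > best.get(i, 0.0):
                best[i] = s
    return sorted(best, key=best.get, reverse=True)[:k]
```

With toy concept names such as "essential hypertension", the query "high blood pressure" shares no tokens with any concept and yields no candidates from plain BM25; adding a synonym entry that maps it to "essential hypertension" recovers the concept. The candidate ranking step would then re-order this short list.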
J Am Med Inform Assoc
March 2020
Objective: This article methodically reviews the literature on deep learning (DL) for natural language processing (NLP) in the clinical domain, providing quantitative analysis to answer 3 research questions concerning methods, scope, and context of current research.
Materials And Methods: We searched MEDLINE, EMBASE, Scopus, the Association for Computing Machinery Digital Library, and the Association for Computational Linguistics Anthology for articles using DL-based approaches to NLP problems in electronic health records. After screening 1,737 articles, we collected data on 25 variables across 212 papers.
J Am Med Inform Assoc
December 2019
Objective: Extracting clinical entities and their attributes is a fundamental task of natural language processing (NLP) in the medical domain. This task is typically recognized as 2 sequential subtasks in a pipeline: clinical entity or attribute recognition, followed by entity-attribute relation extraction. One problem of pipeline methods is that errors from entity recognition are unavoidably passed to relation extraction.
AMIA Jt Summits Transl Sci Proc
May 2019
Developing high-throughput and high-performance phenotyping algorithms is critical to the secondary use of electronic health records for clinical research. Supervised machine learning-based methods have shown good performance, but often require large annotated datasets that are costly to build. Simulation studies have shown that active learning (AL) could reduce the number of annotated samples while improving the model performance when assuming that the time of labeling each sample is the same (i.
Objective: This article presents our approaches to extraction of medications and associated adverse drug events (ADEs) from clinical documents, which is the second track of the 2018 National NLP Clinical Challenges (n2c2) shared task.
Materials And Methods: The clinical corpus used in this study was from the MIMIC-III database and the organizers annotated 303 documents for training and 202 for testing. Our system consists of 2 components: a named entity recognition (NER) and a relation classification (RC) component.
Stud Health Technol Inform
June 2018
Due to the differences in environments and cultures, consumers seeking cancer information in various regions of the world may have diverse needs. This study compares the cancer information needs for consumers in the US and China. Specifically, we first collected 1,000 cancer-related questions from Yahoo! Answers and Baidu Zhidao, respectively.