Background: Named Entity Recognition (NER), a key step in extracting health information, faces many challenges in Chinese Electronic Medical Records (EMRs). First, the casual use of Chinese abbreviations and doctors' personal writing styles can produce multiple expressions of the same entity, and no common Chinese medical dictionary exists to support accurate entity extraction. Second, an electronic medical record contains entities from a variety of categories, and entity length varies greatly across categories, which increases the difficulty of Chinese NER. Entity boundary detection therefore becomes the key to accurate entity extraction from Chinese EMRs, and we need a model that recognizes entities of multiple lengths without relying on any medical dictionary.
Methods: In this study, we incorporate part-of-speech (POS) information into a deep learning model to improve the accuracy of Chinese entity boundary detection. To avoid incorrect POS tagging of long entities, we propose a method called reduced POS tagging, which retains the tags of general words but not those of seemingly medical entities. The model proposed in this paper, named SM-LSTM-CRF, consists of three layers: a self-matching attention layer, which calculates the relevance of each character to the entire sentence; an LSTM (Long Short-Term Memory) layer, which captures the context features of each character; and a CRF (Conditional Random Field) layer, which labels characters based on their features and transition rules.
Results: Experimental results on a Chinese EMR dataset show that the F1 value of SM-LSTM-CRF is 2.59% higher than that of LSTM-CRF. Adding the POS feature to the model improves F1 by about 7.74%. Reduced POS tagging reduces false tagging on long entities, which increases the F1 value by a further 2.42% and yields an F1 score of 80.07%.
Conclusions: The POS feature produced by reduced POS tagging, together with the self-matching attention mechanism, tightly constrains entity boundaries and performs well in the recognition of clinical entities.
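The reduced POS tagging idea described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how a word is judged to be "seemingly medical", so the rule used here (keep the tag only if it belongs to a small set of general tags, otherwise replace it with a placeholder) is an assumption, as are the names `GENERAL_TAGS`, `MASK_TAG`, and `reduced_pos_tags` and the tag letters themselves. The sketch takes word-level POS tags from any segmenter and expands them to one tag per character, which is the granularity a character-based LSTM-CRF would consume.

```python
from typing import Iterable, List, Set, Tuple

# Assumption: tags treated as "general" and therefore kept. The letters follow a
# common Chinese POS convention (v = verb, p = preposition, c = conjunction,
# d = adverb, u = auxiliary, w = punctuation); the source does not list them.
GENERAL_TAGS: Set[str] = {"v", "p", "c", "d", "u", "w"}

# Hypothetical placeholder used when a word's POS tag is withheld.
MASK_TAG = "X"


def reduced_pos_tags(
    tagged_words: Iterable[Tuple[str, str]],
    general_tags: Set[str] = GENERAL_TAGS,
    mask_tag: str = MASK_TAG,
) -> List[Tuple[str, str]]:
    """Return one (character, tag) pair per character of the sentence.

    Words whose tag is in `general_tags` keep that tag; every other word is
    treated as a possible medical entity and receives `mask_tag` instead.
    """
    char_tags: List[Tuple[str, str]] = []
    for word, tag in tagged_words:
        kept = tag if tag in general_tags else mask_tag
        # Every character of a word inherits the word-level decision.
        char_tags.extend((ch, kept) for ch in word)
    return char_tags


if __name__ == "__main__":
    # Toy pre-segmented sentence ("The patient developed chest pain.").
    sentence = [("患者", "n"), ("出现", "v"), ("胸痛", "n"), ("。", "w")]
    for ch, tag in reduced_pos_tags(sentence):
        print(ch, tag)
```

In this toy sentence the verb and the punctuation keep their tags, while both nouns are masked. The intent is that the model still sees reliable boundary cues from general words without inheriting a tagger's mistakes on long medical terms.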
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6454585 | PMC |
http://dx.doi.org/10.1186/s12911-019-0762-7 | DOI Listing |
J Neuroinflammation
December 2024
Research Institute for Medicines (iMed.ULisboa), Faculdade de Farmácia, Universidade de Lisboa, Lisboa, Portugal.
Multiple Sclerosis (MS), a neuroinflammatory disease of the central nervous system, is one of the most common causes of non-traumatic disability among young adults. Impaired cognition is a major symptom that affects more than 50% of patients and has a substantial impact on social, economic, and individual wellbeing. Despite the lack of therapeutic strategies, many efforts have been made to understand the mechanisms behind cognitive impairment in MS patients.
Sci Rep
December 2024
Department of Software Engineering, University of Sargodha, Sargodha, Punjab, Pakistan.
The latest advances in deep learning have ushered in a new era of natural language processing. Machines now possess an unparalleled ability to interpret and engage with tasks such as text classification, content generation, and natural language understanding. This development has extended to the analysis of human behavior, where deep learning models are used to decode human personality.
Mov Disord
November 2024
Center for Alzheimer's and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland, USA.
Background: Commercial genome-wide genotyping arrays have historically neglected coverage of genetic variation across populations.
Objective: We aimed to create a multi-ancestry genome-wide array that would include a wide range of neuro-specific genetic content to facilitate genetic research in neurological disorders across multiple ancestral groups, fostering diversity and inclusivity in research studies.
Methods: We developed the Illumina NeuroBooster Array (NBA), a custom high-throughput and cost-effective platform on a backbone of 1,914,934 variants from the Infinium Global Diversity Array and added custom content comprising 95,273 variants associated with more than 70 neurological conditions or traits, and we further tested its performance on more than 2000 patient samples.
Ergonomics
August 2024
VelocityEHS, Chicago, IL, USA.
An ergonomics assessment of the physical risk factors in the workplace is instrumental in predicting and preventing musculoskeletal disorders (MSDs). Using Artificial Intelligence (AI) has become increasingly popular for ergonomics assessments because of the time savings and improved accuracy. However, most of the effort in this area starts and ends with producing risk scores, without providing guidance to reduce the risk.
Proc ACM Int Conf Inf Knowl Manag
October 2023