Background: Research on medical vocabulary expansion from large corpora has primarily been conducted using text written in English or similar languages, due to the limited availability of large biomedical corpora in most languages. Medical vocabularies are, however, also essential for text mining from corpora written in languages other than English and belonging to a variety of medical genres. The aim of this study was therefore to evaluate medical vocabulary expansion using a corpus very different from those previously used, in terms of grammar and orthography as well as text genre. This was carried out by applying a method based on distributional semantics to the task of extracting medical vocabulary terms from a large corpus of Japanese patient blogs.
Methods: Distributional properties of terms were modelled with random indexing, followed by agglomerative hierarchical clustering of 3 × 100 seed terms from existing vocabularies, belonging to three semantic categories: Medical Finding, Pharmaceutical Drug and Body Part. By automatically extracting unknown terms close to the centroids of the created clusters, candidates for new terms to include in the vocabulary were suggested. The method was evaluated for its ability to retrieve the remaining n terms in existing medical vocabularies.
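The pipeline described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the dimensionality `DIM`, the number of non-zero entries `NONZERO`, the greedy centroid-based merging with a size cap, and all function names are assumptions chosen for illustration, and the abstract does not state which linkage criterion or vector parameters were used.

```python
import math
import random
from collections import defaultdict

DIM = 200      # dimensionality of the reduced space (assumed value)
NONZERO = 8    # number of +1/-1 entries per index vector (assumed value)


def index_vector(rng):
    """Sparse ternary index vector: a few random +1/-1 entries, the rest 0."""
    vec = [0.0] * DIM
    for k, pos in enumerate(rng.sample(range(DIM), NONZERO)):
        vec[pos] = 1.0 if k % 2 == 0 else -1.0
    return vec


def context_vectors(sentences, window=1, seed=0):
    """Random indexing: sum the index vectors of all neighbours that lie
    within `window` tokens on each side (window=1 corresponds to 1+1)."""
    rng = random.Random(seed)
    index = {}
    context = defaultdict(lambda: [0.0] * DIM)
    for tokens in sentences:
        for tok in tokens:
            if tok not in index:
                index[tok] = index_vector(rng)
        for i, tok in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    ctx = context[tok]
                    for d, v in enumerate(index[tokens[j]]):
                        ctx[d] += v
    return dict(context)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cluster_seeds(seeds, vectors, max_size=2):
    """Simplified agglomerative clustering (an assumption, not the paper's
    exact procedure): repeatedly merge the two clusters whose centroids are
    most similar, as long as the merged cluster has at most `max_size` terms."""
    clusters = [[s] for s in seeds if s in vectors]
    while True:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if len(clusters[i]) + len(clusters[j]) > max_size:
                    continue
                sim = cosine(centroid([vectors[t] for t in clusters[i]]),
                             centroid([vectors[t] for t in clusters[j]]))
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        if best is None:
            return clusters
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]


def rank_candidates(seeds, vectors, max_size=2):
    """Rank unknown terms by cosine similarity to their nearest cluster centroid."""
    cents = [centroid([vectors[t] for t in c])
             for c in cluster_seeds(seeds, vectors, max_size)]
    if not cents:
        return []
    known = set(seeds)
    scored = [(max(cosine(vec, c) for c in cents), term)
              for term, vec in vectors.items() if term not in known]
    return [term for _, term in sorted(scored, reverse=True)]
```

Calling `rank_candidates(seed_terms, context_vectors(tokenised_blog_sentences, window=1))` would return the non-seed terms ordered by closeness to a seed cluster. Here `tokenised_blog_sentences` is a hypothetical list of token lists, and removing or retaining case particles would happen in a tokenisation step that is not shown.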
Results: Removing case particles and using a context window size of 1+1 was a successful strategy for Medical Finding and Pharmaceutical Drug, while retaining case particles and using a window size of 8+8 was better for Body Part. For a candidate list of length 10n, the use of different cluster sizes affected the result for Pharmaceutical Drug, while the effect was only marginal for the other two categories. For the top n candidates for Body Part, however, clusters with a size of up to two terms were slightly more useful than larger clusters. For Pharmaceutical Drug, the best settings resulted in a recall of 25 % for the top n candidates and a recall of 68 % for the top 10n. For the top 10n candidates, the second best results were obtained for Medical Finding: a recall of 58 %, compared to 46 % for Body Part. Taking only the top n candidates into account, however, resulted in a recall of 23 % for Body Part, compared to 16 % for Medical Finding.
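The recall figures above are computed at two cutoffs, n and 10n, where n is the number of held-out vocabulary terms. A minimal sketch of that measure, assuming a ranked candidate list and a set of held-out terms (the function name `recall_at` and the example data are hypothetical):

```python
def recall_at(ranked, held_out, k):
    """Fraction of held-out terms that appear among the top-k ranked candidates."""
    held = set(held_out)
    if not held:
        return 0.0
    return len(held & set(ranked[:k])) / len(held)


# Hypothetical example: 20 ranked candidates, 2 held-out terms (n = 2).
ranked = [f"c{i}" for i in range(20)]
held_out = ["c1", "c15"]
n = len(held_out)

recall_top_n = recall_at(ranked, held_out, n)          # cutoff at n
recall_top_10n = recall_at(ranked, held_out, 10 * n)   # cutoff at 10n
```

In this toy case only one of the two held-out terms falls within the top n = 2 candidates, while both fall within the top 10n = 20, which mirrors how recall can rise considerably when the longer candidate list is used.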
Conclusions: Different settings for corpus pre-processing, window sizes and cluster sizes were suitable for different semantic categories and for different lengths of candidate lists, showing the need to adapt parameters, not only to the language and text genre used, but also to the semantic category for which the vocabulary is to be expanded. The results show, however, that the investigated choices for pre-processing and parameter settings were successful, and that a Japanese blog corpus, which in many ways differs from those used in previous studies, can be a useful resource for medical vocabulary expansion.
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5037651 | PMC |
| http://dx.doi.org/10.1186/s13326-016-0093-x | DOI Listing |