Identification of the Best Semantic Expansion to Query PubMed Through Automatic Performance Assessment of Four Search Strategies on All Medical Subject Heading Descriptors: Comparative Study.

Clément R Massonnaud, Gaétan Kerdelhué, Julien Grosjean, Romain Lelong, Nicolas Griffon, Stefan J Darmoni

JMIR Med Inform

Department of Biomedical Informatics, Rouen University Hospital, Rouen, France.

Published: June 2020

Background: With the continuous expansion of available biomedical data, efficient and effective information retrieval has become of utmost importance. Semantic expansion of queries using synonyms may improve information retrieval.

Objective: The aim of this study was to automatically construct and evaluate expanded PubMed queries of the form "preferred term"[MH] OR "preferred term"[TIAB] OR "synonym 1"[TIAB] OR "synonym 2"[TIAB] OR …, for each of the 28,313 Medical Subject Heading (MeSH) descriptors, by using different semantic expansion strategies. We sought to propose an innovative method that could automatically evaluate these strategies, based on the three main metrics used in information science (precision, recall, and F-measure).
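The query form described above can be assembled mechanically from a preferred term and a synonym list. The following is a minimal Python sketch, not the authors' code: the function name is hypothetical, the example synonyms are illustrative only, and the case-insensitive removal of duplicate synonyms is an assumption added here.

```python
def build_expanded_query(preferred_term, synonyms):
    """Build '"term"[MH] OR "term"[TIAB] OR "synonym"[TIAB] OR ...'."""
    def tagged(term, field):
        return f'"{term}"[{field}]'

    parts = [tagged(preferred_term, "MH"), tagged(preferred_term, "TIAB")]
    seen = {preferred_term.lower()}
    for synonym in synonyms:
        key = synonym.lower()
        if key not in seen:  # assumption: skip case-insensitive duplicates
            seen.add(key)
            parts.append(tagged(synonym, "TIAB"))
    return " OR ".join(parts)


# Illustrative input only; a real synonym list would come from MeSH, UMLS, or CISMeF.
query = build_expanded_query(
    "Myocardial Infarction",
    ["Heart Attack", "heart attack", "Myocardial Infarct"],
)
print(query)
```

Each strategy would differ only in the synonym list passed in; the query skeleton stays the same.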

Methods: Three semantic expansion strategies were assessed. They differed in the synonyms used to build the queries: MeSH synonyms, Unified Medical Language System (UMLS) mappings, and custom mappings (Catalogue et Index des Sites Médicaux de langue Française [CISMeF]). The precision, recall, and F-measure metrics were automatically computed for the three strategies and for the standard automatic term mapping (ATM) of PubMed. Computing the metrics automatically required three counts: the number of all relevant citations (A), using National Library of Medicine indexing as the gold standard ("preferred term"[MH]); the number of citations retrieved by the added terms ("synonym 1"[TIAB] OR "synonym 2"[TIAB] OR …) (B); and the number of relevant citations retrieved by the added terms (C), obtained by combining the previous two queries with an "AND" operator. The metrics were computed programmatically for each strategy using each of the 28,313 MeSH descriptors as the "preferred term," corresponding to 239,724 different queries built and sent to the PubMed application programming interface. The four search strategies were ranked and compared for each metric.
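The abstract does not spell out the formulas, but with the counts A, B, and C the standard information-retrieval definitions would give precision = C/B, recall = C/A, and the F-measure as their harmonic mean. A minimal Python sketch under that assumption follows. The function and class names are hypothetical, the counts in the usage lines are invented, and returning 0 when a denominator is 0 is a convention chosen here rather than something the study states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    precision: float
    recall: float
    f_measure: float


def evaluation_queries(preferred_term, synonyms):
    """Return the three count queries (A, B, C) for one descriptor."""
    query_a = f'"{preferred_term}"[MH]'                      # gold standard
    query_b = " OR ".join(f'"{s}"[TIAB]' for s in synonyms)  # added terms
    query_c = f"({query_a}) AND ({query_b})"                 # relevant and retrieved
    return query_a, query_b, query_c


def compute_metrics(a, b, c):
    """a: relevant citations, b: retrieved citations, c: relevant retrieved."""
    precision = c / b if b else 0.0  # assumption: 0 when nothing is retrieved
    recall = c / a if a else 0.0     # assumption: 0 when nothing is relevant
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return Metrics(precision, recall, f_measure)


# Invented term, synonym, and counts, for illustration only.
query_a, query_b, query_c = evaluation_queries("Asthma", ["Bronchial Asthma"])
metrics = compute_metrics(a=200, b=100, c=50)
print(query_c)
print(metrics)
```

In practice the three counts would come from the hit counts returned by the PubMed API for each query; only the arithmetic is shown here.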

Results: ATM had the worst performance for all three metrics among the four strategies. The MeSH strategy had the best mean precision (51%, SD 23%). The UMLS strategy had the best recall and F-measure (41%, SD 31% and 36%, SD 24%, respectively). CISMeF had the second-best recall and F-measure (40%, SD 31% and 35%, SD 24%, respectively). However, considering a cutoff of 5%, CISMeF had better precision than UMLS for 1180 descriptors, better recall for 793 descriptors, and better F-measure for 678 descriptors.
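A pairwise comparison at a cutoff could be counted along the following lines. This Python sketch assumes the 5% cutoff means an absolute difference of more than 5 percentage points between two strategies on the same descriptor, which the abstract does not specify. The function name, descriptor IDs, and scores are invented for illustration.

```python
def count_better(scores_x, scores_y, cutoff=0.05):
    """Count descriptors where strategy X beats strategy Y by more than `cutoff`.

    Both arguments map a descriptor ID to a metric value in [0, 1].
    Only descriptors present in both mappings are compared.
    """
    shared = scores_x.keys() & scores_y.keys()
    return sum(1 for d in shared if scores_x[d] - scores_y[d] > cutoff)


# Invented precision values for three hypothetical descriptors.
cismef_precision = {"D1": 0.60, "D2": 0.40, "D3": 0.52}
umls_precision = {"D1": 0.50, "D2": 0.45, "D3": 0.50}

n_better = count_better(cismef_precision, umls_precision)
print(n_better)
```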

Conclusions: This study highlights the importance of using semantic expansion strategies to improve information retrieval. However, the performance of a given strategy relative to another varied greatly depending on the MeSH descriptor. These results confirm that there is no ideal search strategy for all descriptors. Different semantic expansions should be used depending on the descriptor and the user's objectives. Thus, we developed an interface that allows users to input a descriptor and then proposes the best semantic expansion to maximize the three main metrics (precision, recall, and F-measure).
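Proposing the best expansion for one descriptor and one objective amounts to picking the strategy with the highest value of the chosen metric. How the authors' interface does this is not described here, so the following Python sketch is only an illustration: the function name and all metric values are invented.

```python
def best_strategy(results, metric):
    """Return the strategy name with the highest value for `metric`.

    `results` maps a strategy name to a dict of metric values
    for a single descriptor.
    """
    return max(results, key=lambda name: results[name][metric])


# Invented per-descriptor values, not results from the study.
results_for_descriptor = {
    "ATM": {"precision": 0.30, "recall": 0.20, "f_measure": 0.24},
    "MeSH": {"precision": 0.60, "recall": 0.30, "f_measure": 0.40},
    "UMLS": {"precision": 0.45, "recall": 0.50, "f_measure": 0.47},
    "CISMeF": {"precision": 0.50, "recall": 0.45, "f_measure": 0.46},
}

best = {
    metric: best_strategy(results_for_descriptor, metric)
    for metric in ("precision", "recall", "f_measure")
}
print(best)
```

A user who cares most about precision and one who cares most about recall may therefore be offered different expansions for the same descriptor.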

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7303830
DOI: http://dx.doi.org/10.2196/12799

Similar Publications

Expansion of the SyllabO+ corpus and database: Words, lemmas, and morphology.

Behav Res Methods

January 2025

Centre de recherche CERVO, Québec City, QC, Canada.

Noémie Auclair-Ouellet, Alexandra Lavoie, Pascale Bédard, Alexandra Barbeau-Morrison, Patrick Drouin

Having a detailed description of the psycholinguistic properties of a language is essential for conducting well-controlled language experiments. However, there is a paucity of databases for some languages and regional varieties, including Québec French. The SyllabO+ corpus was created to provide a complete phonological and syllabic analysis of a corpus of spoken Québec French.

A multi-dimensional semantic pseudo-relevance feedback framework for information retrieval.

Sci Rep

December 2024

Information Retrieval and Knowledge Management Research Lab, School of Information Technology, York University, Toronto, Canada.

Min Pan, Yu Liu, Jinguang Chen, Ellen Anne Huang, Jimmy X Huang

Pre-trained models have garnered significant attention in the field of information retrieval, particularly for improving document ranking. Typically, an initial retrieval step using sparse methods such as BM25 is employed to obtain a set of pseudo-relevant documents, followed by re-ranking with a pre-trained model. However, the semantic information captured by pre-trained models from sentences or passages is usually only applied to document ranking, with limited use in query expansion.

Efficient dataset extension using generative networks for assessing degree of coating degradation around scribe.

Front Artif Intell

December 2024

Faculty of Electrical Engineering and Informatics, University of Pardubice, Pardubice, Czechia.

Dominik Stursa, Pavel Rozsival, Petr Dolezel

A novel methodology for dataset augmentation in the semantic segmentation of coil-coated surface degradation is presented in this study. Deep convolutional generative adversarial networks (DCGAN) are employed to generate synthetic input-target pairs, which closely resemble real-world data, with the goal of expanding an existing dataset. These augmented datasets are used to train two state-of-the-art models, U-net, and DeepLabV3, for the precise detection of degradation areas around scribes.

Hierarchical existential prior based on expanded pseudo-label for crack detection.

Rev Sci Instrum

December 2024

School of Electrical and Control Engineering, Shaanxi University of Science and Technology, Xi'an 710021, Shaanxi, People's Republic of China.

Nan Wang, Jie Fang, Jianfu Yin, Xiaoqian Cao

Road crack detection approaches based on the image processing technique have attracted much attention during the past decade due to their convenience and efficiency, but most of them cannot achieve the expected performances due to the complex background interference and severe category imbalance of road images. This paper presents a hierarchical existential prior based on an expanded pseudo-label for crack detection. In particular, the framework contains three variants of U-Net, and each sub-network is trained by pseudo-labels generated by transforming semantic categories of non-crack pixels distributed in the neighborhoods of crack ones.

MicroGlycoDB: A database of microbial glycans using Semantic Web technologies.

BBA Adv

November 2024

Glycan and Life Systems Integration Center (GaLSIC), Soka University, Hachioji, Tokyo, Japan.

Sunmyoung Lee, Louis-David Leclercq, Yann Guerardel, Christine M Szymanski, Thomas Hurtaux

Glycoconjugates are present on microbial surfaces and play critical roles in modulating interactions with the environment and the host. Extensive research on microbial glycans, including elucidating the structural diversity of the glycan moieties of glycoconjugates and polysaccharides, has been carried out to investigate the function of glycans in modulating the interactions between the host and microbes, to explore their potential applications in the therapeutic targeting of pathogenic species, and in the use as probiotics in gut microbiomes. However, glycan-related information is dispersed across numerous databases and a vast amount of literature, which makes it laborious and time-consuming to identify and gather the relevant information about microbial glycosylation.
