Background: With the continuous expansion of available biomedical data, efficient and effective information retrieval has become of utmost importance. Semantic expansion of queries using synonyms may improve information retrieval.

Objective: The aim of this study was to automatically construct and evaluate expanded PubMed queries of the form "preferred term"[MH] OR "preferred term"[TIAB] OR "synonym 1"[TIAB] OR "synonym 2"[TIAB] OR …, for each of the 28,313 Medical Subject Headings (MeSH) descriptors, by using different semantic expansion strategies. We also sought to propose an innovative method for evaluating these strategies automatically, based on the three main metrics used in information science (precision, recall, and F-measure).
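
The query form described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `quote_term` and `build_expanded_query` are hypothetical, and skipping synonyms that duplicate the preferred term (case-insensitively) is an assumption the abstract does not state.

```python
from typing import Iterable


def quote_term(term: str) -> str:
    """Wrap a term in double quotes, dropping any embedded quotes."""
    return '"' + term.replace('"', "").strip() + '"'


def build_expanded_query(preferred_term: str, synonyms: Iterable[str]) -> str:
    """Build: "preferred term"[MH] OR "preferred term"[TIAB] OR "synonym"[TIAB] ...

    Assumption (not from the source): synonyms that repeat the preferred
    term or each other, ignoring case, are emitted only once.
    """
    clauses = [
        f"{quote_term(preferred_term)}[MH]",
        f"{quote_term(preferred_term)}[TIAB]",
    ]
    seen = {preferred_term.strip().lower()}
    for synonym in synonyms:
        key = synonym.strip().lower()
        if not key or key in seen:
            continue
        seen.add(key)
        clauses.append(f"{quote_term(synonym)}[TIAB]")
    return " OR ".join(clauses)


if __name__ == "__main__":
    # Illustrative descriptor and synonyms, chosen only for the example.
    print(build_expanded_query("Neoplasms", ["Cancer", "Tumor", "cancer"]))
```

Under this sketch, the three strategies would differ only in the synonym list passed to the function for a given descriptor.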

Methods: Three semantic expansion strategies were assessed. They differed in the synonyms used to build the queries: MeSH synonyms, Unified Medical Language System (UMLS) mappings, and custom mappings (Catalogue et Index des Sites Médicaux de langue Française [CISMeF]). Precision, recall, and F-measure were computed automatically for the three strategies and for the standard automatic term mapping (ATM) of PubMed. Computing the metrics required three counts: (A) the number of all relevant citations, using National Library of Medicine indexing as the gold standard ("preferred term"[MH]); (B) the number of citations retrieved by the added terms ("synonym 1"[TIAB] OR "synonym 2"[TIAB] OR …); and (C) the number of relevant citations retrieved by the added terms (the previous two queries combined with an "AND" operator). The metrics could thus be computed programmatically for each strategy, using each of the 28,313 MeSH descriptors as a "preferred term," which corresponded to 239,724 different queries built and sent to the PubMed application programming interface. The four search strategies were ranked and compared for each metric.
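
A minimal Python sketch of how the three counts could yield the metrics is given below. It assumes the standard information-retrieval definitions (precision = C/B, recall = C/A, F-measure as the harmonic mean of the two), which the abstract names but does not spell out. The names `Metrics`, `counting_queries`, and `compute_metrics` are hypothetical, the counts are supplied by the caller rather than fetched from PubMed, and returning 0 when a denominator is 0 is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Metrics:
    precision: float
    recall: float
    f_measure: float


def counting_queries(preferred_term: str, added_terms: List[str]) -> Tuple[str, str, str]:
    """Return the three query strings whose hit counts give A, B, and C."""
    relevant = f'"{preferred_term}"[MH]'                         # count -> A
    added = " OR ".join(f'"{term}"[TIAB]' for term in added_terms)  # count -> B
    both = f"({relevant}) AND ({added})"                         # count -> C
    return relevant, added, both


def compute_metrics(a: int, b: int, c: int) -> Metrics:
    """Compute precision, recall, and F-measure from the counts A, B, and C.

    a: all relevant citations, b: citations retrieved by the added terms,
    c: relevant citations retrieved by the added terms.
    """
    if min(a, b, c) < 0 or c > a or c > b:
        raise ValueError("expected 0 <= C <= min(A, B)")
    # Assumption: a metric with a zero denominator is reported as 0.
    precision = c / b if b else 0.0
    recall = c / a if a else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return Metrics(precision, recall, f_measure)


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(counting_queries("Asthma", ["Bronchial asthma"]))
    print(compute_metrics(a=200, b=100, c=50))
```

Because only hit counts are needed, no citation lists have to be downloaded or compared, which is what makes the evaluation feasible over all descriptors.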

Results: ATM had the worst performance for all three metrics among the four strategies. The MeSH strategy had the best mean precision (51%, SD 23%). The UMLS strategy had the best recall and F-measure (41%, SD 31% and 36%, SD 24%, respectively). CISMeF had the second-best recall and F-measure (40%, SD 31% and 35%, SD 24%, respectively). However, considering a cutoff of 5%, CISMeF had better precision than UMLS for 1180 descriptors, better recall for 793 descriptors, and better F-measure for 678 descriptors.

Conclusions: This study highlights the importance of using semantic expansion strategies to improve information retrieval. However, the performance of a given strategy relative to another varied greatly depending on the MeSH descriptor. These results confirm that there is no ideal search strategy for all descriptors. Different semantic expansions should be used depending on the descriptor and the user's objectives. We therefore developed an interface that allows users to input a descriptor and then proposes the best semantic expansion to maximize each of the three main metrics (precision, recall, and F-measure).

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7303830
DOI: http://dx.doi.org/10.2196/12799
