Background: As the volume of available biomedical data continues to grow, efficient and effective information retrieval has become essential. Expanding queries semantically with synonyms may improve retrieval.
Objective: The aim of this study was to automatically construct and evaluate expanded PubMed queries of the form "preferred term"[MH] OR "preferred term"[TIAB] OR "synonym 1"[TIAB] OR "synonym 2"[TIAB] OR …, for each of the 28,313 Medical Subject Heading (MeSH) descriptors, using different semantic expansion strategies. We also sought to propose an innovative method for evaluating these strategies automatically, based on the three main metrics used in information science: precision, recall, and F-measure.
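The query template above can be generated mechanically from a preferred term and a list of synonyms. The Python sketch below is illustrative only: the function name `build_expanded_query`, the case-insensitive de-duplication of synonyms, and the removal of embedded double quotes are assumptions rather than steps reported by the study, and the example synonyms are hypothetical inputs.

```python
def build_expanded_query(preferred_term: str, synonyms: list[str]) -> str:
    """Return '"term"[MH] OR "term"[TIAB] OR "synonym 1"[TIAB] OR ...'.

    Hypothetical helper: de-duplication and quote stripping are
    assumptions, not steps described in the study.
    """

    def phrase(term: str, tag: str) -> str:
        # Drop embedded double quotes so the quoted phrase stays well-formed.
        return '"' + term.replace('"', "") + '"[' + tag + "]"

    parts = [phrase(preferred_term, "MH"), phrase(preferred_term, "TIAB")]
    seen = {preferred_term.casefold()}
    for synonym in synonyms:
        key = synonym.casefold()
        if key in seen:
            continue  # skip a synonym that repeats an earlier term (case-insensitive)
        seen.add(key)
        parts.append(phrase(synonym, "TIAB"))
    return " OR ".join(parts)


print(build_expanded_query("Myocardial Infarction", ["Heart Attack", "myocardial infarction"]))
# "Myocardial Infarction"[MH] OR "Myocardial Infarction"[TIAB] OR "Heart Attack"[TIAB]
```

Each strategy (MeSH, UMLS, CISMeF) would differ only in the `synonyms` list passed in; the template itself stays fixed.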
Methods: Three semantic expansion strategies were assessed. They differed in the source of the synonyms used to build the queries: MeSH synonyms, Unified Medical Language System (UMLS) mappings, and custom mappings (Catalogue et Index des Sites Médicaux de langue Française [CISMeF]). Precision, recall, and F-measure were computed automatically for these three strategies and for PubMed's standard automatic term mapping (ATM). Computing the metrics required three counts: the number of all relevant citations (A), using National Library of Medicine indexing ("preferred term"[MH]) as the gold standard; the number of citations retrieved by the added terms ("synonym 1"[TIAB] OR "synonym 2"[TIAB] OR …) (B); and the number of relevant citations retrieved by the added terms, obtained by combining the previous two queries with an "AND" operator (C). Using each of the 28,313 MeSH descriptors as a "preferred term," the metrics were computed programmatically for each strategy, which required building 239,724 distinct queries and sending them to the PubMed application programming interface. The four search strategies were then ranked and compared on each metric.
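Given the three counts, the metrics follow from their standard definitions: precision = C/B (the share of citations retrieved by the added terms that are indexed with the descriptor), recall = C/A (the share of indexed citations that the added terms retrieve), and F-measure = 2 × precision × recall / (precision + recall). The sketch below builds the three count queries and computes the metrics. It assumes the NCBI E-utilities ESearch endpoint (`esearch.fcgi` with `rettype=count`) as a stand-in for the PubMed API the authors used; the function names are hypothetical, and the network call is defined but not exercised in the example.

```python
import json
import urllib.parse
import urllib.request

# Assumed stand-in for "the PubMed API": NCBI E-utilities ESearch.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def count_queries(preferred_term: str, synonyms: list[str]) -> dict[str, str]:
    """Build the queries whose hit counts give A, B and C."""
    if not synonyms:
        raise ValueError("at least one synonym is needed to build query B")
    a = f'"{preferred_term}"[MH]'  # gold standard: NLM indexing
    b = "(" + " OR ".join(f'"{s}"[TIAB]' for s in synonyms) + ")"  # added terms only
    c = f"({a}) AND {b}"  # relevant citations retrieved by the added terms
    return {"A": a, "B": b, "C": c}


def pubmed_count(query: str) -> int:
    """Return PubMed's hit count for `query` via ESearch (assumed endpoint)."""
    params = urllib.parse.urlencode(
        {"db": "pubmed", "term": query, "rettype": "count", "retmode": "json"}
    )
    with urllib.request.urlopen(f"{ESEARCH_URL}?{params}") as response:
        payload = json.load(response)
    return int(payload["esearchresult"]["count"])


def metrics(a: int, b: int, c: int) -> tuple[float, float, float]:
    """Precision = C/B, recall = C/A, F-measure = harmonic mean of the two."""
    precision = c / b if b else 0.0
    recall = c / a if a else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only.
print(metrics(a=200, b=100, c=50))  # (0.5, 0.25, 0.3333333333333333)
```

At the scale described, a real run would also have to respect NCBI's E-utilities rate limits (3 requests per second without an API key, 10 with one), so the queries would typically be throttled or batched.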
Results: ATM performed worst of the four strategies on all three metrics. The MeSH strategy had the best mean precision (51%, SD 23%). The UMLS strategy had the best mean recall (41%, SD 31%) and F-measure (36%, SD 24%). CISMeF had the second-best recall (40%, SD 31%) and F-measure (35%, SD 24%). However, with a cutoff of 5%, CISMeF outperformed UMLS in precision for 1180 descriptors, in recall for 793 descriptors, and in F-measure for 678 descriptors.
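Both kinds of result above (mean and SD across descriptors, and per-descriptor head-to-head counts) can be derived from a table of scores keyed by descriptor. The sketch below reads "a cutoff of 5%" as a difference of more than 5 percentage points in favor of the first strategy; that reading, the function names, and the use of the sample standard deviation are assumptions, and the scores are made-up values.

```python
import statistics


def summarize(scores: dict[str, float]) -> tuple[float, float]:
    """Mean and sample SD of one metric across descriptors (needs >= 2 values)."""
    values = list(scores.values())
    return statistics.mean(values), statistics.stdev(values)


def count_wins(x: dict[str, float], y: dict[str, float], cutoff: float = 0.05) -> int:
    """Count descriptors where strategy x beats strategy y by more than `cutoff`."""
    shared = x.keys() & y.keys()  # only descriptors scored under both strategies
    return sum(1 for d in shared if x[d] - y[d] > cutoff)


# Made-up precision scores keyed by descriptor, for illustration only.
cismef = {"D1": 0.60, "D2": 0.40, "D3": 0.50}
umls = {"D1": 0.50, "D2": 0.42, "D3": 0.44}
print(count_wins(cismef, umls))  # 2
print(summarize(cismef))
```

Counting wins per descriptor, rather than comparing only the means, is what exposes the pattern reported above: a strategy can trail on average yet clearly win for many individual descriptors.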
Conclusions: This study highlights the importance of using semantic expansion strategies to improve information retrieval. However, the performance of a given strategy relative to another varied greatly depending on the MeSH descriptor. These results confirm that no single search strategy is ideal for all descriptors. Different semantic expansions should be used depending on the descriptor and the user's objectives. We therefore developed an interface that lets users enter a descriptor and then proposes the best semantic expansion to maximize the three main metrics (precision, recall, and F-measure).
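A recommendation step like the one the interface performs amounts to picking, for the entered descriptor, the strategy with the highest score on the metric the user cares about. The sketch below is a hypothetical illustration, not the authors' interface: the nested-dictionary layout, the metric keys, and tie-breaking by insertion order are assumptions, and the scores are invented.

```python
def best_strategy(
    results: dict[str, dict[str, dict[str, float]]], descriptor: str, metric: str
) -> str:
    """Return the strategy with the highest `metric` for `descriptor`.

    `results` maps strategy -> descriptor -> metric name -> score.
    Ties go to the strategy listed first, since max() keeps the first maximum.
    """
    candidates = {
        strategy: per_descriptor[descriptor][metric]
        for strategy, per_descriptor in results.items()
        if descriptor in per_descriptor
    }
    if not candidates:
        raise KeyError(f"no scores recorded for {descriptor!r}")
    return max(candidates, key=candidates.__getitem__)


# Invented scores for one descriptor, for illustration only.
results = {
    "ATM": {"Asthma": {"precision": 0.30, "recall": 0.20, "f_measure": 0.24}},
    "MeSH": {"Asthma": {"precision": 0.55, "recall": 0.35, "f_measure": 0.43}},
    "UMLS": {"Asthma": {"precision": 0.45, "recall": 0.45, "f_measure": 0.45}},
    "CISMeF": {"Asthma": {"precision": 0.50, "recall": 0.40, "f_measure": 0.44}},
}
for metric in ("precision", "recall", "f_measure"):
    print(metric, best_strategy(results, "Asthma", metric))
# precision MeSH / recall UMLS / f_measure UMLS
```

Because the best strategy can differ per metric for the same descriptor, the user's objective (favoring precision, recall, or a balance) determines which expansion is proposed.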
Link | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7303830 | PMC |
http://dx.doi.org/10.2196/12799 | DOI |