A multi-dimensional semantic pseudo-relevance feedback framework for information retrieval.

Min Pan, Yu Liu, Jinguang Chen, Ellen Anne Huang, Jimmy X Huang

Sci Rep

Information Retrieval and Knowledge Management Research Lab, School of Information Technology, York University, Toronto, Canada.

Published: December 2024

Pre-trained models have garnered significant attention in the field of information retrieval, particularly for improving document ranking. Typically, an initial retrieval step using sparse methods such as BM25 is employed to obtain a set of pseudo-relevant documents, followed by re-ranking with a pre-trained model. However, the semantic information captured by pre-trained models from sentences or passages is usually only applied to document ranking, with limited use in query expansion. In fact, the semantic information within pseudo-relevant documents plays a critical role in selecting appropriate query expansion terms. Therefore, this paper proposes a novel approach that leverages pre-trained models to extract multi-dimensional semantic information from pseudo-relevant documents, offering more possibilities for query expansion. First, traditional sparse retrieval methods are used in the initial retrieval stage to ensure efficiency, and term-level weights are calculated based on statistical information. Then, the pre-trained model encodes both the query and the sentences and passages from the documents, extracting sentence-level and passage-level semantic similarities to the query. Finally, these semantic weights are combined with the term-level weights to generate an improved query for the second retrieval round. We conducted experiments on five TREC datasets and a medical dataset, showing improvements in official metrics such as MAP and P@10. The results demonstrate the effectiveness of utilizing multi-dimensional semantic information from pseudo-relevant documents to optimize query expansion. This study offers new insights into how the semantic information of pseudo-relevant documents can be effectively harnessed to enhance retrieval performance.
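The three-stage pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several pieces are assumptions made only for illustration: `embed` is a bag-of-words stand-in for a pre-trained encoder, term-level weights are plain normalised term counts, a sentence or passage passes its query similarity on to every term it contains, and the three weights are merged by linear interpolation with hypothetical parameters `alpha` and `beta`.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase alphabetic tokens; no stop-word removal in this toy version."""
    return re.findall(r"[a-z]+", text.lower())


def embed(text):
    # Assumption: stand-in for a pre-trained encoder (bag-of-words counts).
    return Counter(tokens(text))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def normalise(weights):
    """Scale weights so the largest value becomes 1."""
    top = max(weights.values(), default=0.0)
    return {t: (w / top if top else 0.0) for t, w in weights.items()}


def term_level(passages):
    # Assumption: the statistical term weight is the normalised term count
    # over the pseudo-relevant passages.
    return normalise(Counter(t for p in passages for t in tokens(p)))


def semantic_level(query, units):
    # Each unit (sentence or passage) credits its similarity to the query
    # to every distinct term it contains.
    q = embed(query)
    scores = Counter()
    for unit in units:
        sim = cosine(q, embed(unit))
        for t in set(tokens(unit)):
            scores[t] += sim
    return normalise(scores)


def expand_query(query, passages, k=3, alpha=0.3, beta=0.3):
    """Return the top-k (term, weight) expansion candidates."""
    sentences = [
        s for p in passages
        for s in re.split(r"(?<=[.!?])\s+", p) if s.strip()
    ]
    w_term = term_level(passages)
    w_sent = semantic_level(query, sentences)
    w_pass = semantic_level(query, passages)
    original = set(tokens(query))
    # Assumption: linear interpolation of the three weights.
    combined = {
        t: (1 - alpha - beta) * w_term[t]
        + alpha * w_sent.get(t, 0.0)
        + beta * w_pass.get(t, 0.0)
        for t in w_term if t not in original
    }
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


if __name__ == "__main__":
    feedback = [
        "Query expansion adds related terms. Feedback documents supply the terms.",
        "Cooking pasta requires boiling water.",
    ]
    for term, weight in expand_query("query expansion", feedback):
        print(f"{term}\t{weight:.3f}")
```

In this sketch, terms that occur only in passages unrelated to the query receive no semantic weight, so they rank below terms drawn from sentences and passages that resemble the query. Swapping `embed` and `cosine` for a real encoder and its similarity function would leave the rest of the structure unchanged.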

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11686017
DOI: http://dx.doi.org/10.1038/s41598-024-82871-0

Publication Analysis

Top Keywords

pseudo-relevant documents

query expansion

semantic pseudo-relevant

multi-dimensional semantic

pre-trained models

document ranking

initial retrieval

pre-trained model

sentences passages

term-level weights

