Creation of reliable relevance judgments in information retrieval systems evaluation experimentation through crowdsourcing: a review.

ScientificWorldJournal

Department of Information Systems, Faculty of Computer Science and Information Technology, University of Malaya, 50603 Kuala Lumpur, Malaysia.

Published: June 2015

Test collections are used to evaluate information retrieval (IR) systems in laboratory-based evaluation experiments. In the classic setting, generating relevance judgments relies on human assessors and is a costly, time-consuming task, and researchers and practitioners are still challenged to evaluate retrieval systems reliably and at low cost. Crowdsourcing, a novel method of data acquisition, is widely used in many research fields, and it has been shown to be an inexpensive and quick solution as well as a reliable alternative for creating relevance judgments. One application of crowdsourcing in IR is having workers judge the relevance of query-document pairs. For a crowdsourcing experiment to succeed, the relevance judgment tasks must be designed carefully, with an emphasis on quality control. This paper explores the factors that influence the accuracy of relevance judgments made by workers and how to increase the reliability of judgments in crowdsourcing experiments.
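
Two widely used quality-control techniques for crowdsourced relevance judgments are gold (known-answer) questions, which screen out unreliable workers, and redundant labelling aggregated by majority vote. The sketch below combines both. It is an illustration under stated assumptions, not the review's own method: the record format `(worker_id, query_id, doc_id, label)` with binary labels, the function names `gold_accuracy` and `aggregate`, the 0.7 accuracy threshold, and the sample data are all hypothetical.

```python
from collections import defaultdict

# Hypothetical record format: (worker_id, query_id, doc_id, label),
# where label is 1 (relevant) or 0 (non-relevant).


def gold_accuracy(judgments, gold):
    """Share of gold (known-answer) query-document pairs each worker labelled correctly."""
    correct = defaultdict(int)
    seen = defaultdict(int)
    for worker, query, doc, label in judgments:
        expected = gold.get((query, doc))
        if expected is not None:
            seen[worker] += 1
            correct[worker] += int(label == expected)
    return {w: correct[w] / seen[w] for w in seen}


def aggregate(judgments, gold, min_accuracy=0.7):
    """Majority vote over non-gold pairs, counting only workers who pass the gold check.

    The 0.7 threshold is an arbitrary, hypothetical choice.
    """
    accuracy = gold_accuracy(judgments, gold)
    # Workers who saw no gold pairs have no accuracy estimate and are excluded.
    trusted = {w for w, acc in accuracy.items() if acc >= min_accuracy}
    votes = defaultdict(list)
    for worker, query, doc, label in judgments:
        if worker in trusted and (query, doc) not in gold:
            votes[(query, doc)].append(label)
    # Relevant only on a strict majority of "relevant" votes; ties resolve to 0.
    return {pair: int(2 * sum(labels) > len(labels)) for pair, labels in votes.items()}


# Hypothetical sample: one gold pair and two pairs to be judged by four workers.
gold = {("q1", "d_gold"): 1}
judgments = [
    ("w1", "q1", "d_gold", 1), ("w1", "q1", "d1", 1), ("w1", "q1", "d2", 0),
    ("w2", "q1", "d_gold", 1), ("w2", "q1", "d1", 1), ("w2", "q1", "d2", 1),
    ("w3", "q1", "d_gold", 0), ("w3", "q1", "d1", 0), ("w3", "q1", "d2", 1),
    ("w4", "q1", "d_gold", 1), ("w4", "q1", "d1", 0), ("w4", "q1", "d2", 0),
]

print(aggregate(judgments, gold))  # → {('q1', 'd1'): 1, ('q1', 'd2'): 0}
```

In this sample, worker `w3` misses the gold question and is excluded. Without that filter, `d1` would receive a 2-2 tie and resolve to non-relevant, which shows how a single unreliable worker can change an aggregated judgment when few labels are collected per pair.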

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4055211
DOI: http://dx.doi.org/10.1155/2014/135641

Publication Analysis

Top Keywords (occurrences)

relevance judgments: 16
retrieval systems: 12
evaluation experimentation: 8
judgments crowdsourcing: 8
crowdsourcing experiment: 8
crowdsourcing: 6
relevance: 5
judgments: 5
creation reliable: 4
reliable relevance: 4

Similar Publications

Ureteropelvic junction obstruction (UPJO) is a common pediatric condition often treated with pyeloplasty. Despite the surgical intervention, postoperative urinary tract infections (UTIs) occur in over 30% of cases within six months, adversely affecting recovery and increasing both clinical and economic burdens. Current prediction methods for postoperative UTIs rely on empirical judgment and limited clinical parameters, underscoring the need for a robust, multifactorial predictive model.

To explore the practice and application of learning curve theory in improving prescription review skills in standardized training for pharmacists in medical institutions, and to provide a reference for enhancing the effectiveness of such training. A retrospective analysis was conducted on data from 20 students who participated in our hospital's standardized pharmacist training in 2022 and 2023, covering their prescription review practice. The prescription review practice was divided into 10 stages, with 100 prescriptions in each stage.

Background: Legitimate androgen use, such as testosterone replacement therapy, requires a legal prescription. Off-label use for reasons such as wellness and aesthetics continues to grow. Recent regulatory changes in Australia aim to curb non-prescribed androgen use, potentially intensifying stigma; however, seeking prescriptions through legal channels persists.

Working memory capacity modulates serial dependence in facial identity: Evidence from behavioral and EEG data.

Vision Res

January 2025

Department of Psychology, Lund University, Allhelgona kyrkogata 16A, 223 50 Lund, Sweden. Electronic address:

Serial dependence (SD) is said to occur when the judgment of a current stimulus is drawn toward a no longer relevant stimulus from the recent past. Working memory (WM) contributes to the ability to discriminate between irrelevant and relevant sensory impressions. How WM contributes to SD in facial identity remains to be fully understood.

The Spatial-Numerical Association of Response Codes (SNARC) effect refers to the phenomenon of faster left-hand responses to smaller numbers and faster right-hand responses to larger ones. The current study examined the possible long-lasting effects of magnitude-relevant stimulus-response compatibility (SRC) practices on the SNARC effect in a transfer paradigm. Participants performed a magnitude classification task including either SNARC-compatible or SNARC-incompatible trials as practice.
