
The Effect of Assessor Errors on IR System Evaluation

Published

Author(s)

Ben Carterette, Ian Soboroff

Abstract

Recent efforts in test collection building have focused on scaling back the number of necessary relevance judgments and then scaling up the number of search topics. Since the largest source of variation in a Cranfield-style experiment comes from the topics, this is a reasonable approach. However, as topic set sizes grow, and researchers look to crowdsourcing and Amazon's Mechanical Turk to collect relevance judgments, we are faced with issues of quality control. This paper examines the robustness of the TREC Million Query track methods when some assessors make significant and systematic errors. We find that while averages are robust, assessor errors can have a large effect on system rankings.
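The following is a minimal illustrative sketch, not the authors' actual methodology (the Million Query track uses its own estimators such as statMAP and MTC), of the general kind of robustness experiment the abstract describes: inject systematic assessor errors into a set of relevance judgments by flipping labels on some topics, rescore the systems, and compare the resulting system ranking to the original one. The evaluation measure (precision@10), the flip-rate parameter, and all function names here are assumptions chosen for illustration.

```python
import random
from itertools import combinations

def precision_at_k(run, qrels, topic, k=10):
    """Fraction of the top-k documents retrieved for a topic that are judged relevant."""
    docs = run[topic][:k]
    return sum(qrels.get((topic, d), 0) for d in docs) / k

def rank_systems(runs, qrels, topics):
    """Rank systems by mean precision@10 over the topic set (illustrative measure only)."""
    scores = {
        name: sum(precision_at_k(run, qrels, t) for t in topics) / len(topics)
        for name, run in runs.items()
    }
    return sorted(scores, key=scores.get, reverse=True), scores

def corrupt_qrels(qrels, noisy_topics, flip_rate, rng):
    """Simulate systematic assessor error: flip binary labels on the 'noisy' topics."""
    return {
        (t, d): (1 - r if t in noisy_topics and rng.random() < flip_rate else r)
        for (t, d), r in qrels.items()
    }

def kendall_tau(ranking_a, ranking_b):
    """Kendall's tau correlation between two orderings of the same systems."""
    pos_a = {s: i for i, s in enumerate(ranking_a)}
    pos_b = {s: i for i, s in enumerate(ranking_b)}
    concordant = discordant = 0
    for s1, s2 in combinations(ranking_a, 2):
        if (pos_a[s1] - pos_a[s2]) * (pos_b[s1] - pos_b[s2]) > 0:
            concordant += 1
        else:
            discordant += 1
    n = len(ranking_a)
    return (concordant - discordant) / (n * (n - 1) / 2)

# Usage sketch: compare the system ranking under clean vs. corrupted judgments.
# runs  : {system_name: {topic: [doc_id, ...]}}  (ranked retrieval results)
# qrels : {(topic, doc_id): 0 or 1}              (binary relevance judgments)
def robustness_check(runs, qrels, topics, noisy_topics, flip_rate=0.3, seed=42):
    rng = random.Random(seed)
    clean_ranking, _ = rank_systems(runs, qrels, topics)
    noisy_ranking, _ = rank_systems(runs, corrupt_qrels(qrels, noisy_topics, flip_rate, rng), topics)
    return kendall_tau(clean_ranking, noisy_ranking)
```

A high Kendall's tau between the clean and corrupted rankings would suggest the evaluation is robust to that error pattern; a low value would indicate, as the abstract reports, that assessor errors can substantially reorder systems even when average scores change little.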
Proceedings Title
Proceedings of the 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Conference Dates
July 19-23, 2010
Conference Location
Geneva, CH

Keywords

information retrieval, test collections

Citation

Carterette, B. and Soboroff, I. (2010), The Effect of Assessor Errors on IR System Evaluation, Proceedings of the 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Geneva, CH (Accessed October 31, 2024)

Issues

If you have any questions about this publication or are having problems accessing it, please contact reflib@nist.gov.

Created July 18, 2010, Updated October 12, 2021