The Effect of Topic Set Size on Retrieval Experiment Error

Author(s)

Ellen M. Voorhees, C. E. Buckley

Abstract

Retrieval mechanisms are frequently compared by computing the respective average scores for some effectiveness metric across a common set of information needs, or topics. Since retrieval system behavior is known to be highly variable across topics, good experimental design requires that a sufficient number of topics be used in the test. This paper uses TREC results to empirically derive error rates based on the number of topics used in a test and the observed difference in the average scores. The error rates quantify the likelihood that a different set of topics of the same size would lead to a different conclusion. We directly compute error rates for topic sets of up to 25 topics, and extrapolate those rates to larger topic set sizes. The error rates found are larger than anticipated, indicating that researchers need to take care when concluding that one method is better than another, especially if few topics are used.
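The abstract does not spell out the sampling procedure, so the following is only a minimal sketch under stated assumptions. It assumes that an "error" is a swap: two disjoint topic subsets of the same size disagree on which of two systems has the higher mean score, and that swaps are tallied per bin of the observed mean difference. It also assumes, purely for illustration, that extrapolation to larger set sizes fits an exponential decay in set size. The function names `swap_rates` and `extrapolate_error` and the parameters `trials`, `bin_width`, and `seed` are hypothetical; per-topic scores (for example, average precision) must be supplied by the caller, and no TREC data is included.

```python
import math
import random
from collections import defaultdict


def swap_rates(scores_a, scores_b, set_size, trials=1000, bin_width=0.01, seed=0):
    """Estimate, per difference bin, how often two disjoint topic subsets of
    size `set_size` disagree on which of two systems is better.

    scores_a, scores_b: dicts mapping topic id -> per-topic metric score.
    Returns {bin index: swap rate}; bin i covers absolute mean differences
    in [i * bin_width, (i + 1) * bin_width) measured on the first subset.
    """
    topics = list(scores_a)
    if 2 * set_size > len(topics):
        raise ValueError("need at least 2 * set_size topics")
    rng = random.Random(seed)
    counts = defaultdict(lambda: [0, 0])  # bin -> [swaps, trials]
    for _ in range(trials):
        sample = rng.sample(topics, 2 * set_size)
        first, second = sample[:set_size], sample[set_size:]
        # Mean difference (A minus B) on each of the two disjoint subsets.
        d1 = sum(scores_a[t] - scores_b[t] for t in first) / set_size
        d2 = sum(scores_a[t] - scores_b[t] for t in second) / set_size
        b = int(abs(d1) / bin_width)
        counts[b][1] += 1
        if d1 * d2 < 0:  # the subsets rank the two systems in opposite order
            counts[b][0] += 1
    return {b: swaps / total for b, (swaps, total) in sorted(counts.items())}


def extrapolate_error(sizes, rates, target_size):
    """Fit log(rate) = alpha + beta * size by least squares (an assumed
    exponential model) and predict the error rate at `target_size`."""
    pts = [(n, math.log(r)) for n, r in zip(sizes, rates) if r > 0]
    if len(pts) < 2:
        raise ValueError("need at least two positive error rates")
    mean_x = sum(x for x, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    if sxx == 0:
        raise ValueError("need at least two distinct set sizes")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return math.exp(alpha + beta * target_size)
```

In use, one would call `swap_rates` for each set size from small values up to half the available topics, collect the rates for a single difference bin across those sizes, and pass them to `extrapolate_error` to estimate that bin's rate at a larger size such as 50 topics. Requiring the two subsets to be disjoint keeps them independent samples of topics, which is what makes a disagreement between them informative about how a different topic set could change the conclusion.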
Proceedings Title

SIGIR Conference

Conference Location

CA

Keywords

evaluation, information retrieval, TREC

Citation

Voorhees, E. and Buckley, C. (2002), The Effect of Topic Set Size on Retrieval Experiment Error, SIGIR Conference, CA (Accessed December 15, 2024)

Created August 1, 2002, Updated February 17, 2017