"Evaluation as a service" (EaaS) is a new methodology that enables community-wide evaluations and the construction of test collections on documents that cannot be distributed. The basic idea is that evaluation organizers provide a service API through which the evaluation task can be completed. This concept, however, violates some of the premises of traditional pool-based collection building, and, as a result, the quality of the resulting test collection may be compromised. In particular, the service API might restrict the diversity of runs that contribute to the pool: not only may this hamper innovation by researchers, but the lack of diversity might lead to incomplete judgment pools that affect the reusability of the collection. This paper shows that the distinctiveness of the retrieval runs used to construct the first test collection built using EaaS, the TREC 2013 Microblog collection, is not substantially different from that of the TREC-8 ad hoc collection, a high-quality collection built using traditional pooling. An additional test of collection reusability, the `leave out uniques' test, suggests the Microblog 2013 collection's pools are less complete than the TREC-8 collection, though both collections strongly benefit from the presence of a set of distinctive and effective manual runs. Although we cannot yet generalize to all EaaS evaluations, our analyses reveal no obvious flaws in the test collection built using the methodology in the TREC 2013 Microblog track.
Voorhees, E. (2014), On Run Diversity in "Evaluation as a Service", Proceedings of SIGIR 2014, Gold Coast, [online], https://doi.org/10.1145/2600428.2609484 (Accessed November 21, 2024)