This project uses community evaluations to build the infrastructure required to evaluate system effectiveness on information access tasks. In particular, the project includes the annual TREC, TRECVID, and TAC evaluations, which foster research on search technologies for different media types (e.g., text, video) and genres (e.g., broadcast news, blogs, corporate repositories), as well as on language processing technologies such as summarization and information extraction.
The goal of the project is to support the development of automatic systems that can provide content-based access to information that has not been explicitly structured for machine consumption. This support takes the form of evaluation infrastructure, since, to quote Lord Kelvin, "If you cannot measure it, you cannot improve it."
The infrastructure is created through community participation in three evaluation conferences, the Text REtrieval Conference (TREC), the TREC Video Retrieval Evaluation (TRECVID), and the Text Analysis Conference (TAC). Within each conference, several focus areas are selected in collaboration with participants, outside sponsors, and other stakeholders. For each focus area, the conference provides guidelines defining the evaluation task and a corresponding data set. Participants perform the task and submit their results to NIST. For most tasks, the union of the submitted results is annotated for correctness, and these annotations form the basis for scoring individual submissions. Finally, participants gather at a workshop to discuss their results.
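To make the pooling-and-scoring pattern described above concrete, the following is a minimal sketch in Python. It assumes each participating system submits a ranked list of document identifiers per topic, that assessors judge only the pooled documents, and that submissions are scored with a simple precision-at-k measure; the function names, system names, and judgments are illustrative assumptions, not part of any conference's actual tooling or official metrics.

    # Sketch of pooling-based evaluation: build the pool from the union of
    # submitted rankings, obtain judgments over the pool, then score each
    # submission against those judgments.

    def build_pool(submissions, depth=100):
        """Union of the top-`depth` documents from every submitted ranking."""
        pool = set()
        for ranking in submissions.values():
            pool.update(ranking[:depth])
        return pool

    def precision_at_k(ranking, relevant, k=10):
        """Fraction of the top-k retrieved documents judged relevant."""
        return sum(1 for doc in ranking[:k] if doc in relevant) / k

    # Hypothetical submissions from three participating systems for one topic.
    submissions = {
        "system_A": ["d3", "d7", "d1", "d9", "d2"],
        "system_B": ["d7", "d4", "d3", "d8", "d5"],
        "system_C": ["d1", "d3", "d6", "d7", "d0"],
    }

    pool = build_pool(submissions, depth=5)   # documents sent to assessors
    relevant = {"d3", "d7", "d9"}             # assessor judgments over the pool

    for name, ranking in submissions.items():
        print(name, precision_at_k(ranking, relevant, k=5))

Because only pooled documents are judged, the resulting relevance judgments are a reusable by-product of the evaluation: later systems can be scored against the same judgments, which is what makes the data sets valuable beyond the original workshop cycle.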
Open evaluations with carefully selected test data have proved to be a powerful technique for advancing technology and measuring progress in a field. In addition to improving the state of the art, a common focus on an evaluation task forms or solidifies a research community, establishes the research methodology for the field, facilitates technology transfer, and amortizes the costs associated with building the necessary infrastructure.
The evaluation conferences have created publicly available data sets and appropriate evaluation methodology that enable individual research groups to measure the quality of their own access methods. These resources have been acknowledged as instrumental in the development of information access technologies, and they are routinely used and cited in work reported in the open literature.