Text Retrieval Conference 2012 Seeks Information Retrieval Experts for Data Digging

From NIST Tech Beat: January 10, 2012

Contact: Evelyn Brown
301-975-5661

The National Institute of Standards and Technology (NIST) is conducting the 21st annual Text Retrieval Conference (TREC), the premier experimental effort in the field, to encourage research in information retrieval and related applications. TREC is a rather unusual conference in that it starts months ahead of the actual meeting (Nov. 6-9, 2012) with the distribution of test data sets and challenges that TREC participants will use to develop and test advanced text retrieval techniques.

Finding valuable information rapidly is much more than a game for people with a high-tech phone. Text retrieval is a field of research that can save lives by helping medical researchers locate key patient information or aid lawyers seeking important data in large digital data collections—both modern-day examples of needles in haystacks.

TREC brings together scientists from academia and public and private-sector organizations to focus on improving information retrieval in specific areas. The groups develop algorithms to find information from large, challenging datasets often provided by NIST. They work throughout the year and come to NIST's headquarters in Gaithersburg, Md., to discuss their findings at the November meeting.

A recent economic impact study* prepared for NIST found that the NIST-led TREC project has significantly improved the ability to retrieve digital data. The report notes that TREC-related improvements are responsible for about one-third of the web-search advances between 1999 and 2009 and that the improvements may have saved up to 3 billion hours of web-search time.

TREC challenges are grouped into tracks, each targeting a difficult text-retrieval problem. The Medical Records track, for example, addresses a common problem in designing clinical trials: finding important information in patient records that generally sits in unstructured "comment" fields. In a vast patient data set, for instance, which patients take herbal products for osteoarthritis?
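To make the medical-records question above concrete, here is a minimal sketch of searching free-text comment fields for patients who mention both a herbal product and osteoarthritis. It is only an illustration, not the method used by any TREC participant: the `PatientRecord` type, the term lists, and the sample records are all hypothetical, and plain keyword matching stands in for the far more capable techniques the track is meant to encourage.

```python
import re
from dataclasses import dataclass


@dataclass
class PatientRecord:
    patient_id: str
    comment: str  # unstructured free-text field (hypothetical schema)


# Hypothetical term lists, chosen only for illustration.
HERBAL_TERMS = ["herbal", "turmeric", "ginger", "boswellia"]
CONDITION_TERMS = ["osteoarthritis"]


def mentions_any(text, terms):
    """Return True if any term appears as a whole word in text (case-insensitive)."""
    return any(
        re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE)
        for term in terms
    )


def find_patients(records):
    """Return IDs of records whose comment mentions a herbal term and the condition."""
    return [
        record.patient_id
        for record in records
        if mentions_any(record.comment, HERBAL_TERMS)
        and mentions_any(record.comment, CONDITION_TERMS)
    ]


# Made-up sample records, not real patient data.
records = [
    PatientRecord("p1", "Pt reports taking turmeric capsules for knee osteoarthritis."),
    PatientRecord("p2", "Osteoarthritis of the hip; managed with ibuprofen."),
    PatientRecord("p3", "Takes ginger tea daily for nausea."),
]

print(find_patients(records))  # prints ['p1']
```

A keyword filter like this would miss synonyms, misspellings, abbreviations and negations ("denies herbal supplements"), which is part of why retrieval from unstructured clinical text is treated as a research challenge.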

TREC 2012 adds two new tracks, "Contextual Suggestion" and "Knowledge Base Acceleration." In Contextual Suggestion, researchers will study methods for giving specific answers to vague queries by drawing on personal demographics and information from calendar and contacts apps. For example, according to TREC Conference Organizer Ellen Voorhees, "A person arrives in a city for a business trip and has a free evening, so asks the search system in their smart phone 'What should I do tonight?' To answer this well, the system will need to integrate information about the user's likes/dislikes with external information such as schedules of events in the city, availability of tickets, whether friends of the user are also in the area and other data."

The Knowledge Base Acceleration track looks to automate the process of keeping a knowledge base up to date. One example is having a system monitor a news feed to keep a Wikipedia page about an event, such as the Occupy Wall Street movement, current.

Six tracks will continue from TREC 2011: Crowdsourcing, Legal, Medical Records, Microblog, Session and Web.

Applications to participate in TREC 2012 are being accepted through February 22. For more information on TREC and participating, see http://trec.nist.gov.

* B.R. Rowe, D.W. Wood, A.N. Link and D.A. Simoni. Economic Impact Assessment of NIST's Text REtrieval Conference (TREC) Program. RTI Project Number 0211875, July 2010. Available online at: http://trec.nist.gov/pubs/2010.economic.impact.pdf.