
OpenASR Challenge

The goal of the OpenASR (Open Automatic Speech Recognition) Challenge is to assess the state of the art of ASR technologies for low-resource languages.

The OpenASR Challenge is an open challenge created out of the IARPA (Intelligence Advanced Research Projects Activity) MATERIAL (Machine Translation for English Retrieval of Information in Any Language) program, which encompasses a broader set of tasks, including CLIR (cross-language information retrieval), domain classification, and summarization. Also see NIST's MATERIAL page.

For every year of MATERIAL, NIST supports a simplified, smaller-scale evaluation open to all, focusing on a particular technology aspect of MATERIAL. CLIR technologies were the focus of the first open challenge in 2019, OpenCLIR. Since 2020, the focus has been on ASR. The capabilities tested in the open challenges are expected to ultimately support the MATERIAL task of effective triage and analysis of large volumes of text and audio content in a variety of less-studied languages.


OpenASR-Related Publications

Contact

Please email openasr_poc [at] nist [dot] gov with any questions or comments regarding the OpenASR Challenge.


OpenASR21 Challenge

The second OpenASR Challenge associated with MATERIAL, OpenASR21, opened for registration on August 9, 2021, with an evaluation period in November 2021. OpenASR21 features ASR evaluation opportunities for 15 low-resource languages:

  • All ten languages from OpenASR20
  • Five NEW languages for OpenASR21

For the languages from OpenASR20, the same evaluation datasets from 2020 will be used, consisting of conversational telephone speech (CTS) data. For the five new languages, the main evaluation dataset will also consist of CTS data. These datasets will be scored (where applicable) case-insensitively.

New for OpenASR21 is case-sensitive scoring for three of the new languages, as indicated below. Case-sensitive scoring will be performed on system output for separate evaluation datasets drawn from a mix of genres for these languages, in order to assess low-resource ASR performance specifically on proper nouns.
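
To illustrate the difference between the two scoring modes, the sketch below computes word error rate (WER) with optional case normalization. It is a minimal illustration, not the official NIST scoring pipeline (which applies additional text normalization); the function names and the simple whitespace tokenization are assumptions for this example.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (single-row DP)."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i  # prev holds dist(i-1, j-1)
        for j, h in enumerate(hyp, 1):
            # deletion, insertion, or substitution/match
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
    return d[-1]

def wer(reference, hypothesis, case_sensitive=False):
    """WER = edits / reference length; lowercase both sides unless
    case-sensitive scoring is requested."""
    if not case_sensitive:
        reference, hypothesis = reference.lower(), hypothesis.lower()
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)
```

Under case-insensitive scoring, `wer("Hello World", "hello world")` is 0.0; under case-sensitive scoring the same pair counts as two substitutions, which is how casing errors on proper nouns become visible in the metric.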

OpenASR21 languages:

  • Amharic
  • Cantonese
  • New: Farsi
  • New: Georgian
  • Guarani
  • Javanese
  • New: Kazakh (including additional evaluation dataset for case-sensitive scoring)
  • Kurmanji Kurdish
  • Mongolian
  • Pashto
  • Somali
  • New: Swahili (including additional evaluation dataset for case-sensitive scoring)
  • New: Tagalog (including additional evaluation dataset for case-sensitive scoring)
  • Tamil
  • Vietnamese

OpenASR21 will be implemented as a track of NIST’s OpenSAT (Open Speech Analytic Technologies) evaluation series, using the OpenSAT web server for registration, data access, submission, and scoring purposes.

For more details, please refer to the OpenASR21 Challenge Evaluation Plan in the Documentation and Resources section below.

Schedule

Evaluation plan release: July 2021
Registration period: August 9 – October 15, 2021
Development period: August 9 – November 2, 2021 (potentially longer, but excluding the evaluation period)
  - Build and Dev datasets release: August 9, 2021
  - Scoring server accepts submissions for Dev datasets: August 30 – November 2, 2021 (potentially longer, but excluding the evaluation period)
Registration closes: October 15, 2021
Evaluation period: November 3 – 10, 2021
  - Release of Eval datasets: November 3, 2021
  - Scoring server accepts submissions: November 4 – 10, 2021
  - System output due at NIST: November 10, 2021
System description due at NIST: November 19, 2021

Registration

Registration opened on August 9, 2021. Please register via the OpenSAT web server.

Documentation and Resources

Results

The OpenASR21 evaluation was conducted in November 2021. Please see the OpenASR21 Challenge Results page.

Low-Resource ASR Development Special Session at INTERSPEECH 2022

OpenASR21 participants, as well as others working in the low-resource ASR problem space, are strongly encouraged to submit their work to the preliminarily accepted Low-Resource ASR Development special session at INTERSPEECH 2022. Please see the Call for Papers.

OpenASR20 Challenge

The first OpenASR Challenge associated with MATERIAL, OpenASR20, was opened for registration in July 2020, with an evaluation period in November 2020. It featured ASR evaluation opportunities for these ten low-resource languages:

  • Amharic
  • Cantonese
  • Guarani
  • Javanese
  • Kurmanji Kurdish
  • Mongolian
  • Pashto
  • Somali
  • Tamil
  • Vietnamese

It was implemented as a track of NIST’s OpenSAT (Open Speech Analytic Technologies) evaluation series, using the OpenSAT web server for registration, data access, submission, and scoring purposes.

The evaluation plan posted in the Documentation and Resources section below describes the OpenASR20 Challenge in detail.

Registration

Registration for the OpenASR20 Challenge is now closed.

Documentation and Resources

Results

The OpenASR20 evaluation was conducted in November 2020. Please see the OpenASR20 Challenge Results page.

OpenASR20 and Low-Resource ASR Development Special Session at INTERSPEECH 2021

OpenASR20 participants were strongly encouraged to submit their work to the OpenASR20 and Low-Resource ASR Development special session at INTERSPEECH 2021; please see the Call for Papers. The special session also welcomed contributions from others working in the low-resource ASR problem space who did not participate in OpenASR20.

Created June 10, 2020, Updated December 11, 2024