
OpenASR Challenge

The goal of the OpenASR (Open Automatic Speech Recognition) Challenge is to assess the state of the art of ASR technologies for low-resource languages.

The OpenASR Challenge is an open challenge created out of the IARPA (Intelligence Advanced Research Projects Activity) MATERIAL (Machine Translation for English Retrieval of Information in Any Language) program, which encompasses a broader set of tasks, including CLIR (cross-language information retrieval), domain classification, and summarization. Also see NIST's MATERIAL page.

For every year of MATERIAL, NIST supports a simplified, smaller-scale evaluation open to all, focusing on a particular technology aspect of MATERIAL. CLIR technologies were the focus of the first open challenge in 2019, OpenCLIR. Since 2020, the focus has been on ASR. The capabilities tested in the open challenges are expected to ultimately support the MATERIAL task of effective triage and analysis of large volumes of text and audio content in a variety of less-studied languages.


OpenASR-Related Publications

Contact

Please email openasr_poc [at] nist [dot] gov with any questions or comments regarding the OpenASR Challenge.


OpenASR21 Challenge

The second OpenASR Challenge associated with MATERIAL, OpenASR21, opened for registration on August 9, 2021, with an evaluation period in November 2021. OpenASR21 features ASR evaluation opportunities for 15 low-resource languages:

  • All ten languages from OpenASR20
  • Five NEW languages for OpenASR21

For the languages from OpenASR20, the same evaluation datasets from 2020 will be used, consisting of conversational telephone speech (CTS) data. For the five new languages, the main evaluation dataset will also consist of CTS data. These datasets will be scored (where applicable) case-insensitively.

New for OpenASR21 is case-sensitive scoring for three of the new languages, as indicated below. Case-sensitive scoring will be performed on system output for separate evaluation datasets drawn from a mix of genres for these languages, in order to assess low-resource ASR performance specifically on proper nouns.
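
To illustrate the difference between the two scoring modes, the sketch below computes word error rate (WER) with optional case normalization. It is a minimal illustration, not the official NIST scoring pipeline (which applies additional text normalization); the function names and the simple whitespace tokenization are assumptions for this example.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (single-row DP)."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i  # prev holds dist(i-1, j-1)
        for j, h in enumerate(hyp, 1):
            # deletion, insertion, or substitution/match
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
    return d[-1]

def wer(reference, hypothesis, case_sensitive=False):
    """WER = edits / reference length; lowercase both sides unless
    case-sensitive scoring is requested."""
    if not case_sensitive:
        reference, hypothesis = reference.lower(), hypothesis.lower()
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)
```

Under case-insensitive scoring, `wer("Hello World", "hello world")` is 0.0; under case-sensitive scoring the same pair counts as two substitutions, which is how casing errors on proper nouns become visible in the metric.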

OpenASR21 languages:

  • Amharic
  • Cantonese
  • New: Farsi
  • New: Georgian
  • Guarani
  • Javanese
  • New: Kazakh (including additional evaluation dataset for case-sensitive scoring)
  • Kurmanji Kurdish
  • Mongolian
  • Pashto
  • Somali
  • New: Swahili (including additional evaluation dataset for case-sensitive scoring)
  • New: Tagalog (including additional evaluation dataset for case-sensitive scoring)
  • Tamil
  • Vietnamese

OpenASR21 will be implemented as a track of NIST’s OpenSAT (Open Speech Analytic Technologies) evaluation series, using the OpenSAT web server for registration, data access, submission, and scoring purposes.

For more details, please refer to the OpenASR21 Challenge Evaluation Plan in the Documentation and Resources section below.

Schedule

Evaluation plan release: July 2021
Registration period: August 9 – October 15, 2021
Development period: August 9 – November 2, 2021 (potentially longer, but excluding the evaluation period)
  - Build and Dev datasets release: August 9, 2021
  - Scoring server accepts submissions for Dev datasets: August 30 – November 2, 2021 (potentially longer, but excluding the evaluation period)
Registration closes: October 15, 2021
Evaluation period: November 3 – 10, 2021
  - Release of Eval datasets: November 3, 2021
  - Scoring server accepts submissions: November 4 – 10, 2021
  - System output due at NIST: November 10, 2021
System description due at NIST: November 19, 2021

Registration

Registration opened on August 9, 2021. Please register via the OpenSAT web server.

Documentation and Resources

Results

The OpenASR21 evaluation was conducted in November 2021. Please see the OpenASR21 Challenge Results page.

Low-Resource ASR Development Special Session at INTERSPEECH 2022

OpenASR21 participants, as well as others working in the low-resource ASR problem space, are strongly encouraged to submit their work to the preliminarily accepted Low-Resource ASR Development special session at INTERSPEECH 2022. Please see the Call for Papers.

OpenASR20 Challenge

The first OpenASR Challenge associated with MATERIAL, OpenASR20, was opened for registration in July 2020, with an evaluation period in November 2020. It featured ASR evaluation opportunities for these ten low-resource languages:

  • Amharic
  • Cantonese
  • Guarani
  • Javanese
  • Kurmanji Kurdish
  • Mongolian
  • Pashto
  • Somali
  • Tamil
  • Vietnamese

It was implemented as a track of NIST’s OpenSAT (Open Speech Analytic Technologies) evaluation series, using the OpenSAT web server for registration, data access, submission, and scoring purposes.

The evaluation plan posted in the Documentation and Resources section below describes the OpenASR20 Challenge in detail.

Registration

Registration for the OpenASR20 Challenge is now closed.

Documentation and Resources

Results

The OpenASR20 evaluation was conducted in November 2020. Please see the OpenASR20 Challenge Results page.

OpenASR20 and Low-Resource ASR Development Special Session at INTERSPEECH 2021

OpenASR20 participants were strongly encouraged to submit their work to the OpenASR20 and Low-Resource ASR Development special session at INTERSPEECH 2021; please see the Call for Papers. The special session also welcomed contributions from others working in the low-resource ASR problem space who did not participate in OpenASR20.

Created June 10, 2020, Updated December 11, 2024