The OpenASR (Open Automatic Speech Recognition) 2020 Challenge was the second open challenge associated with the IARPA MATERIAL program, after the OpenCLIR (Open Cross-Language Information Retrieval) 2019 Challenge. Capabilities tested in these open challenges are expected to ultimately support the MATERIAL task of effective triage and analysis of large volumes of text and audio content in a variety of less-studied languages. OpenASR20 was implemented as a track of NIST’s OpenSAT (Open Speech Analytic Technologies) evaluation series.
The goal of the OpenASR20 challenge was to assess the state of the art of automatic speech recognition (ASR) technologies for low-resource languages. Systems performed ASR on the provided speech datasets and produced written text transcriptions as output.
Please refer to the OpenASR20 Challenge Evaluation Plan for a full description of the challenge and its rules and procedures.
OpenASR20 was offered for ten low-resource languages, of which participants could attempt as many as they wished: Amharic, Cantonese, Guarani, Javanese, Kurmanji-Kurdish, Mongolian, Pashto, Somali, Tamil, and Vietnamese.
The data for the challenge consisted of conversational telephone speech stemming from the IARPA Babel program, with the exception of Somali, which stemmed from the MATERIAL program. More technical details about the data can be found in section 3 of the IARPA Babel Data Specifications for Performers. For each language, separate training, development, and evaluation datasets were provided.
The challenge offered two training conditions: Constrained, which restricted systems to the provided training data, and Unconstrained, which permitted the use of additional data.
The key milestones of the challenge schedule were as follows:
Twenty-eight teams from twelve countries registered to participate, of which nine fully completed the challenge (i.e., submitted valid output for at least one language under the Constrained training condition). Table 1 lists the fully participating teams.
Organization | Team | AMH | CAN | GUA | JAV | KUR | MON | PAS | SOM | TAM | VIE |
Catskills Research Co., USA | Catskills |  |  |  |  |  |  |  | x |  |  |
Centre de Recherche Informatique de Montréal, Canada | CRIM |  |  |  |  | x |  |  |  |  |  |
Tencent, China | MMT |  |  |  |  |  | x |  |  |  |  |
National Sun Yat-sen University, Taiwan | NSYSU-MITLab |  | x |  |  |  |  |  |  |  | x |
Speechlab, Shanghai Jiao Tong University, China | Speechlab_SJTU | x | x | x | x | x | x |  | x | x |  |
Tallinn University of Technology, Estonia | TalTech | x | x | x | x | x | x | x | x | x | x |
Tsinghua University, China | THUEE | x | x | x | x | x | x | x | x | x | x |
Tencent & Tsinghua University, China | TNT |  | x |  |  |  | x |  |  |  |  |
Tal, China | upteam | x | x | x | x | x | x | x | x | x | x |
Table 1: OpenASR20 Participants
Table 2 lists the best WER result achieved by each team, ordered by language, training condition, and WER score. Late submissions are marked with † and listed at the bottom of the table. Self-reported time and memory resources are not included in this overview of results.
On-time Submissions
Language | Training Condition | Team | WER | CER |
Amharic | Constrained | TalTech | 0.4505 | 0.3430 |
Amharic | Constrained | THUEE | 0.4582 | 0.3528 |
Amharic | Constrained | Speechlab_SJTU | 1.0162 | 0.8897 |
Amharic | Constrained | upteam | 1.3841 | 1.3621 |
Amharic | Unconstrained | Speechlab_SJTU | 1.0162 | 0.8897 |
Cantonese | Constrained | TNT | 0.4024 | 0.3511 |
Cantonese | Constrained | THUEE | 0.4362 | 0.3798 |
Cantonese | Constrained | TalTech | 0.4540 | 0.4005 |
Cantonese | Constrained | NSYSU-MITLab | 0.6145 | 0.5588 |
Cantonese | Constrained | Speechlab_SJTU | 0.7586 | 0.7040 |
Cantonese | Constrained | upteam | 1.3133 | 1.3301 |
Cantonese | Unconstrained | TNT | 0.3200 | 0.2643 |
Cantonese | Unconstrained | Speechlab_SJTU | 0.7586 | 0.7040 |
Guarani | Constrained | THUEE | 0.4609 | 0.4216 |
Guarani | Constrained | TalTech | 0.4664 | 0.4314 |
Guarani | Constrained | Speechlab_SJTU | 0.9909 | 0.9611 |
Guarani | Constrained | upteam | 1.2143 | 1.2127 |
Guarani | Unconstrained | Speechlab_SJTU | 0.9909 | 0.9611 |
Javanese | Constrained | THUEE | 0.5210 | 0.5216 |
Javanese | Constrained | TalTech | 0.5376 | 0.5384 |
Javanese | Constrained | Speechlab_SJTU | 0.9443 | 0.9447 |
Javanese | Constrained | upteam | 1.3490 | 1.3490 |
Javanese | Unconstrained | Speechlab_SJTU | 0.9443 | 0.9447 |
Kurmanji-Kurdish | Constrained | TalTech | 0.6529 | 0.6107 |
Kurmanji-Kurdish | Constrained | THUEE | 0.6686 | 0.6236 |
Kurmanji-Kurdish | Constrained | CRIM | 0.7529 | 0.7091 |
Kurmanji-Kurdish | Constrained | upteam | 1.0905 | 1.0810 |
Kurmanji-Kurdish | Constrained | Speechlab_SJTU | 1.1198 | 1.0500 |
Kurmanji-Kurdish | Unconstrained | Speechlab_SJTU | 1.1198 | 1.0500 |
Mongolian | Constrained | THUEE | 0.4540 | 0.3297 |
Mongolian | Constrained | MMT | 0.4546 | 0.3310 |
Mongolian | Constrained | TalTech | 0.4729 | 0.3452 |
Mongolian | Constrained | Speechlab_SJTU | 0.9717 | 0.8045 |
Mongolian | Constrained | upteam | 1.0289 | 1.0042 |
Mongolian | Unconstrained | MMT | 0.4064 | 0.2998 |
Mongolian | Unconstrained | TNT | 0.4554 | 0.3369 |
Mongolian | Unconstrained | Speechlab_SJTU | 0.9717 | 0.8045 |
Pashto | Constrained | TalTech | 0.4568 | 0.3163 |
Pashto | Constrained | THUEE | 0.4859 | 0.3391 |
Pashto | Constrained | upteam | 1.3732 | 1.3488 |
Somali | Constrained | TalTech | 0.5914 | 0.5926 |
Somali | Constrained | THUEE | 0.5958 | 0.5967 |
Somali | Constrained | Speechlab_SJTU | 1.0444 | 1.0449 |
Somali | Constrained | Catskills | 1.1385 | 1.1390 |
Somali | Constrained | upteam | 1.2301 | 1.2301 |
Somali | Unconstrained | Speechlab_SJTU | 1.0444 | 1.0449 |
Tamil | Constrained | TalTech | 0.6511 | 0.4165 |
Tamil | Constrained | THUEE | 0.6605 | 0.4426 |
Tamil | Constrained | Speechlab_SJTU | 1.0555 | 0.8027 |
Tamil | Constrained | upteam | 1.3513 | 1.3207 |
Tamil | Unconstrained | Speechlab_SJTU | 1.0555 | 0.8027 |
Vietnamese | Constrained | TalTech | 0.4514 | 0.4069 |
Vietnamese | Constrained | THUEE | 0.4605 | 0.4125 |
Vietnamese | Constrained | NSYSU-MITLab | 0.7461 | 0.7023 |
Vietnamese | Constrained | upteam | 1.4107 | 1.4121 |
Late Submissions
Language | Training Condition | Team | WER | CER |
Mongolian | Constrained | TNT† | 0.4500† | 0.3515† |
Table 2: OpenASR20 Results. † = late submission.
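For readers who want to sanity-check reported numbers against their own scoring, the sketch below shows a minimal word error rate (WER) and character error rate (CER) computation based on Levenshtein edit distance. It is an illustration only, not the official scoring pipeline; the example strings and normalization choices (tokenization, treatment of spaces in CER) are assumptions, and the scoring conventions defined in the OpenASR20 Challenge Evaluation Plan remain authoritative.

# Minimal WER/CER sketch (illustrative only; not the official NIST scoring tool).
# Assumes reference and hypothesis transcripts are plain strings that have
# already been normalized the same way (casing, punctuation, segmentation).

def edit_distance(ref_tokens, hyp_tokens):
    """Levenshtein distance (substitutions + insertions + deletions)."""
    d = list(range(len(hyp_tokens) + 1))
    for i, r in enumerate(ref_tokens, start=1):
        prev_diag, d[0] = d[0], i
        for j, h in enumerate(hyp_tokens, start=1):
            cur = min(
                d[j] + 1,              # deletion (reference token missing)
                d[j - 1] + 1,          # insertion (extra hypothesis token)
                prev_diag + (r != h),  # substitution, or match at no cost
            )
            prev_diag, d[j] = d[j], cur
    return d[-1]

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    ref, hyp = list(reference), list(hypothesis)
    return edit_distance(ref, hyp) / len(ref)

if __name__ == "__main__":
    ref = "the cat sat on the mat"   # hypothetical example strings
    hyp = "the cat sat on mat"
    print(f"WER = {wer(ref, hyp):.4f}")  # 1 deletion / 6 words ≈ 0.1667
    print(f"CER = {cer(ref, hyp):.4f}")

In practice, corpus-level scores are typically computed by summing edit distances and reference lengths over all segments before dividing, rather than averaging per-utterance rates, which is why a single bad segment can noticeably move a language-level WER.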
As part of the evaluation submission, participants were required to include a paper describing their systems. Participants were also encouraged to submit their work for inclusion in the OpenASR and Low-Resource ASR Development Special Session at INTERSPEECH 2021. The system descriptions whose authors consented to public release are provided below:
NIST serves to coordinate the evaluations in order to support research and to help advance the state of the art. NIST evaluations are not viewed as a competition, and results reported by NIST are not to be construed or represented as endorsements of any participant’s system, or as official findings on the part of NIST or the U.S. Government.
Please email openasr_poc [at] nist.gov with any questions or comments regarding the OpenASR Challenge.