Date of release: Tue Oct 27 15:48:58 2009
Version: mt09_public_v1
The NIST 2009 Open Machine Translation Evaluation (MT09) is part of an ongoing series of evaluations of human language translation technology. NIST conducts these evaluations in order to support machine translation (MT) research and help advance the state-of-the-art in machine translation technology. These evaluations provide an important contribution to the direction of research efforts and the calibration of technical capabilities. The evaluation was administered as outlined in the official MT09 evaluation plan.
These results are not to be construed, or represented, as endorsements of any participant's system or commercial product, or as official findings on the part of NIST or the U.S. Government. Note that the results submitted by developers of commercial MT products were generally from research systems, not commercially available products. Since MT09 was an evaluation of research algorithms, the MT09 test design required local implementation by each participant. As such, participants were only required to submit their translation system output to NIST for uniform scoring and analysis. The systems themselves were not independently evaluated by NIST.
Certain commercial equipment, instruments, software, or materials are identified in this paper in order to specify the experimental procedure adequately. Such identification is not intended to imply recommendation or endorsement by NIST, nor is it intended to imply that the equipment, instruments, software or materials are necessarily the best available for the purpose.
There is ongoing discussion within the MT research community regarding the most informative metrics for machine translation. The design and implementation of these metrics are themselves very much part of the research. At the present time, there is no single metric that has been deemed to be completely indicative of all aspects of system performance.
The data, protocols, and metrics employed in this evaluation were chosen to support MT research and should not be construed as indicating how well these systems would perform in applications. While changes in the data domain, or changes in the amount of data used to build a system, can greatly influence system performance, changing the task protocols could indicate different performance strengths and weaknesses for these same systems.
For these reasons, this evaluation should not be interpreted as a product testing exercise, and the results should not be used to draw conclusions about which commercial products are best for a particular application.
MT09 was a test of text-to-text MT technology. The evaluation consisted of three tasks, differing only by the source language processed: Arabic-to-English, Chinese-to-English, and Urdu-to-English.
MT research and development requires language data resources. System performance is strongly affected by the type and amount of resources used. Therefore, two different resource categories were defined as conditions of evaluation. The categories differ solely by the amount of data that was available for use in the training and development of the core MT engine. These evaluation conditions were called "Constrained Training" and "Unconstrained Training". See the evaluation specification document for a complete description of allowable resources for each.
In recent years, performance improvements have been demonstrated through the use of system combination techniques. For MT09, two evaluation tracks were supported: the "Single System Track" and the "System Combination Track". Results are reported separately for each track. As the track names imply, translations entered in the Single System Track are produced primarily by one algorithmic approach, while translations in the System Combination Track result from a combination technique that draws on two or more core algorithmic approaches.
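The combination methods themselves are site-specific and not described here. Purely as an illustration of the concept, the sketch below picks, for each source segment, the hypothesis that shares the most bigrams with the other systems' outputs; this is a toy consensus heuristic, not the method of any MT09 participant.

```python
from collections import Counter

def ngrams(tokens, n):
    """All n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def consensus_pick(hypotheses, n=2):
    """Pick the hypothesis with the highest n-gram overlap with the others.

    `hypotheses` holds one translation string per system for a single
    source segment. Toy consensus heuristic for illustration only.
    """
    best, best_score = None, -1
    for i, hyp in enumerate(hypotheses):
        hyp_counts = Counter(ngrams(hyp.split(), n))
        score = 0
        for j, other in enumerate(hypotheses):
            if i == j:
                continue
            other_counts = Counter(ngrams(other.split(), n))
            # Count n-grams shared with this other system's output.
            score += sum((hyp_counts & other_counts).values())
        if score > best_score:
            best, best_score = hyp, score
    return best

# The middle-ground hypothesis wins because it agrees most with the others.
print(consensus_pick([
    "the president met the delegation",
    "president met with the delegation",
    "the president met with the delegation",
]))
```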
The following table gives the approximate source word count for each language pair and data genre, separately for the Current Test Set and the Progress Test Set. For the Chinese-to-English pair, word counts assume that a Chinese word averages 1.5 characters.
Language Pair | Data Genre | Current Test Set | Progress Test Set
---|---|---|---
Arabic-to-English | Newswire | 16K words (68 documents) | 20K words (81 documents)
Arabic-to-English | Web | 15K words (67 documents) | 15K words (51 documents)
Chinese-to-English | Newswire | - | 20K words (82 documents)
Chinese-to-English | Web | - | 15K words (40 documents)
Urdu-to-English | Newswire | 24K words (72 documents) | -
Urdu-to-English | Web | 21K words (166 documents) | -
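For illustration, the character-to-word convention above can be applied directly. A minimal sketch follows; the 1.5 characters-per-word ratio is the only figure taken from this page, and the sample string is synthetic.

```python
# MT09 convention: one Chinese word averages 1.5 characters.
CHARS_PER_WORD = 1.5

def approx_chinese_words(text):
    """Estimate the word count of unsegmented Chinese text from its
    character count, using the 1.5 characters-per-word convention."""
    chars = len(text.replace(" ", "").replace("\n", ""))
    return round(chars / CHARS_PER_WORD)

# A 30,000-character collection comes out to about 20,000 words,
# the size of the Chinese newswire Progress Test set above.
print(approx_chinese_words("中" * 30000))  # -> 20000
```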
Scores for the five reported metrics were computed with the following commands:

BLEU-4 (mteval-v13a):
perl mteval-v13a.pl -r REFERENCE_FILE -s SOURCE_FILE -t CANDIDATE_FILE -c -b
  -c : case-sensitive scoring
  -b : BLEU score only

IBM BLEU (bleu-1.04):
perl bleu-1.04.pl -r REFERENCE_FILE -t CANDIDATE_FILE
(The NormalizeText method was updated to properly un-escape XML entities.)

NIST (mteval-v13a):
perl mteval-v13a.pl -r REFERENCE_FILE -s SOURCE_FILE -t CANDIDATE_FILE -c -n
  -c : case-sensitive scoring
  -n : NIST score only

TER (tercom-0.7.25):
java -jar tercom.7.25.jar -r REFERENCE_FILE -h CANDIDATE_FILE -N -s
  -N : enables normalization
  -s : case-sensitive scoring

METEOR (meteor-0.7):
perl meteor.pl -s SYSTEM_ID -r REFERENCE_FILE -t CANDIDATE_FILE --modules "exact porter_stem wn_stem wn_synonymy"
  --modules "exact porter_stem wn_stem wn_synonymy" : uses all four METEOR matching modules, in that order
The following table lists the organizations participating in MT09 and the test sets they registered to process.
Site ID | Organization | Location | Current Test Set: Arabic-to-English | Current Test Set: Urdu-to-English | Progress Test Set: Arabic-to-English | Progress Test Set: Chinese-to-English
---|---|---|---|---|---|---
afrl | Air Force Research Laboratory | USA | - | Yes | - | - |
amsterdam | University of Amsterdam | Netherlands | Yes | Yes | Yes | Yes |
apptek | AppTek | USA | Yes | - | Yes | Yes |
bbn | BBN Technologies | USA | Yes | - | Yes | Yes |
buaa | Beihang University, Institute of Intelligent Information Processing, School of Computer Science and Engineering | China | - | - | - | Yes |
cas-ia | Chinese Academy of Sciences, Institute of Automation | China | - | - | - | Yes |
cas-ict | Chinese Academy of Sciences, Institute of Computing Technology | China | - | - | - | Yes |
ccid | China Center for Information Industry Development | China | - | - | - | Yes |
cmu-ebmt | Carnegie Mellon EBMT | USA | - | Yes | - | - |
cmu-smt | Carnegie Mellon LTI interACT | USA | Yes | - | Yes | Yes |
cmu-statxfer | Carnegie Mellon StatXfer | USA | Yes | Yes | Yes | - |
columbia | Columbia University | USA | Yes | - | - | - |
cued | Cambridge University Engineering Department | UK | Yes | - | - | - |
dcu | Dublin City University | Ireland | - | - | - | Yes |
dfki | DFKI GmbH | Germany | - | - | - | Yes |
edinburgh | University of Edinburgh | UK | Yes | - | Yes | withdrew |
fbk | Fondazione Bruno Kessler | Italy | Yes | - | Yes | - |
frdc | Fujitsu Research & Development Center Co., Ltd. | China | - | - | - | Yes |
hit-ltrc | Harbin Institute of Technology, Language Technology Research Center | China | - | - | - | Yes |
hongkong | City University of Hong Kong | China | withdrew | Yes | - | - |
ibm | IBM | USA | Yes | - | withdrew | - |
jhu | Johns Hopkins University | USA | Yes | Yes | - | withdrew |
kcsl | KCSL Inc. | Canada | Yes | - | - | - |
limsi | Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur - CNRS | France | Yes | - | - | - |
lium | Université du Maine (Le Mans) | France | - | - | - | Yes |
nju-nlp | Nanjing University NLP | China | - | - | - | Yes |
nrc | National Research Council Canada | Canada | - | - | - | Yes |
nthu | National Tsing Hua University, Department of Computer Science | Taiwan | - | - | - | Yes |
rwth | RWTH-Aachen University, Chair of Computer Sciences | Germany | Yes | - | Yes | Yes |
sakhr | Sakhr Software | Egypt | Yes | - | Yes | - |
sri | SRI International | USA | Yes | - | Yes | Yes |
stanford | Stanford University | USA | Yes | - | withdrew | - |
systran | SYSTRAN Software Inc. | USA | - | Yes | - | - |
telaviv | Tel Aviv University | Israel | Yes | - | - | - |
tubitak-uekae | TUBITAK-UEKAE | Turkey | Yes | - | Yes | - |
umd | University of Maryland | USA | Yes | Yes | withdrew | Yes |
upc-lsi | UPC-LSI (Universitat Politècnica de Catalunya, Llenguatges i Sistemes Informàtics) | Spain | Yes | Yes | Yes | - |
Total (Individual Only) | | | 21 | 9 | 12 | 19
Collaborations | | | | | |
isi-lw | University of Southern California / Language Weaver Inc. | USA | Yes | Yes | Yes | Yes
lium-systran | Université du Maine (Le Mans) / SYSTRAN | . | Yes | - | Yes | -
systran-lium | SYSTRAN / Université du Maine (Le Mans) | . | - | - | - | Yes |
systran-nrc | SYSTRAN / National Research Council Canada | . | - | - | - | Yes |
Total (Individual + Collaboration) | | | 23 | 10 | 14 | 22
Notes | | | | | |
fsc | Fitchburg State College | USA | Submission not scored. | | |
[ Current Test Set, Arabic-to-English Results ] [ Current Test Set, Urdu-to-English Results ] [ Progress Test Set Results ] [ Informal System Combination Results ]
This release page is limited to the Current Test for the Arabic-to-English track.
Scores reported are limited to primary, on-time, non-debugged submissions.
Scores are ordered by BLEU-4 score on the Overall test set.
Results for submissions from GALE participants, who had prior access to the test data, are reported separately from results submitted by sites that are not part of the GALE program.
The following table lists the submissions received from the sites participating in the Arabic-to-English Current Test.
Site ID | Organization | Location | Single System Track | System Combination Track
---|---|---|---|---
amsterdam | University of Amsterdam | Netherlands | Yes(1) | - |
apptek | AppTek | USA | Yes | - |
bbn | BBN Technologies | USA | Yes | - |
cmu-smt | Carnegie Mellon LTI interACT | USA | Yes | - |
cmu-statxfer | Carnegie Mellon StatXfer | USA | Yes | - |
columbia | Columbia University | USA | Yes | - |
cued | Cambridge University Engineering Department | UK | Yes | - |
edinburgh | University of Edinburgh | UK | Yes | - |
fbk | Fondazione Bruno Kessler | Italy | Yes | - |
ibm | IBM | USA | Yes | Yes |
isi-lw | University of Southern California / Language Weaver Inc. | USA | Yes | Yes |
jhu | Johns Hopkins University | USA | Late and/or debugged submission | - |
kcsl | KCSL Inc. | Canada | Yes | - |
limsi | Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur - CNRS | France | Yes | - |
lium-systran | Université du Maine (Le Mans) / SYSTRAN | . | Yes(1) | - |
rwth | RWTH-Aachen University, Chair of Computer Sciences | Germany | Yes | - |
sakhr | Sakhr Software | Egypt | Yes | - |
sri | SRI International | USA | Yes | Yes |
stanford | Stanford University | USA | Yes | - |
telaviv | Tel Aviv University | Israel | Yes | - |
tubitak-uekae | TUBITAK-UEKAE | Turkey | Yes | - |
umd | University of Maryland | USA | Yes | - |
upc-lsi | UPC-LSI (Universitat Politècnica de Catalunya, Llenguatges i Sistemes Informàtics) | Spain | Yes | - |
(1)A late and/or debugged system was also submitted, not reported here.
Single System Track, BLEU-4 (mteval-v13a) scores:

Site ID | System | Overall | Newswire | Web
---|---|---|---|---
Constrained Data Track | | | |
cued | CUED_a2e_cn_primary | 0.4834 | 0.5641 | 0.3960
stanford | stanford_a2e_cn_primary | 0.4781 | 0.5673 | 0.3843
isi-lw | isi-lw_a2e_cn_primary | 0.4763 | 0.5590 | 0.3810
ibm | IBM_a2e_constrained_primary | 0.4708 | 0.5547 | 0.3833
bbn | BBN_a2e_cn_primary | 0.4680 | 0.5566 | 0.3783
rwth | RWTH_a2e_cn_primary | 0.4534 | 0.5402 | 0.3538
sri | SRI_a2e_cn_primary | 0.4527 | 0.5366 | 0.3634
edinburgh | Edinburgh_a2e_cn_primary | 0.4479 | 0.5240 | 0.3605
umd | UMD_a2e_cn_primary | 0.4409 | 0.5340 | 0.3415
cmu-smt | CMU-SMT_a2e_cn_primary | 0.4304 | 0.5055 | 0.3473
columbia | columbia_a2e_cn_primary | 0.4157 | 0.4932 | 0.3331
cmu-statxfer | CMU-Stat-Xfer_a2e_cn_primary | 0.3774 | 0.4448 | 0.2986
sakhr | SAKHR_a2e_cn_primary | 0.3681 | 0.4185 | 0.3147
UnConstrained Data Track | | | |
ibm | IBM_arabic_un_primary | 0.4981 | 0.5713 | 0.4214
apptek | AppTek_a2e_un_primary | 0.4790 | 0.5165 | 0.4352
Single System Track, BLEU-4 (mteval-v13a) scores:

Site ID | System | Overall | Newswire | Web
---|---|---|---|---
Constrained Data Track | | | |
lium-systran | LIUM-SYSTRAN_a2e_cn_primary(1) | 0.4773 | 0.5629 | 0.3800
fbk | FBK_a2e_cn_primary | 0.4567 | 0.5418 | 0.3615
limsi | LIMSI_Moses_a2e_cn_primary | 0.4384 | 0.5242 | 0.3471
tubitak-uekae | TUBITAK_a2e_cn_primary | 0.4112 | 0.4826 | 0.3310
upc-lsi | UPC.LSI_a2e_cn_primary | 0.3588 | 0.4344 | 0.2778
amsterdam | UvA_a2e_cn_primary(1) | 0.3221 | 0.3820 | 0.2565
kcsl | KCSL_a2e_cn_primary | 0.1422 | 0.1670 | 0.1161
telaviv | TLVEBMT_a2e_cn_primary | 0.0703 | 0.0872 | 0.0527
(1)A late and/or debugged system was also submitted, not reported here.
System Combination Track, BLEU-4 (mteval-v13a) scores:

Site ID | System | Overall | Newswire | Web
---|---|---|---|---
Constrained Data Track | | | |
isi-lw | isi-lw_a2e_cn_combo1 | 0.4802 | 0.5600 | 0.3914
ibm | IBM_a2e_cn_combo0 | 0.4775 | 0.5636 | 0.3871
sri | SRI_a2e_cn_combo1 | 0.4631 | 0.5472 | 0.3731
UnConstrained Data Track | | | |
ibm | IBM_a2e_un_combo0 | 0.5096 | 0.5913 | 0.4241
All-metric scores: BLEU-4 (mteval-v13a), IBM BLEU (bleu-1.04), NIST (mteval-v13a), TER (tercom-0.7.25), and METEOR (meteor-0.7). Lower is better for TER; higher is better for the other metrics.

Site ID | System | BLEU-4 Overall | BLEU-4 Newswire | BLEU-4 Web | IBM BLEU Overall | IBM BLEU Newswire | IBM BLEU Web | NIST Overall | NIST Newswire | NIST Web | TER Overall | TER Newswire | TER Web | METEOR Overall | METEOR Newswire | METEOR Web
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Constrained Data Track | | | | | | | | | | | | | | | |
cued | CUED_a2e_cn_primary | 0.4834 | 0.5641 | 0.3960 | 0.4833 | 0.5640 | 0.3959 | 11.01 | 11.31 | 9.603 | 0.4489 | 0.3861 | 0.5093 | 0.6570 | 0.7152 | 0.5999 |
isi-lw | isi-lw_a2e_cn_combo1 | 0.4802 | 0.5600 | 0.3914 | 0.4801 | 0.5598 | 0.3913 | 10.85 | 11.24 | 9.396 | 0.4643 | 0.3887 | 0.5371 | 0.6642 | 0.7319 | 0.5969 |
stanford | stanford_a2e_cn_primary | 0.4781 | 0.5673 | 0.3843 | 0.4777 | 0.5668 | 0.3840 | 10.97 | 11.44 | 9.392 | 0.4399 | 0.3709 | 0.5065 | 0.6514 | 0.7153 | 0.5882 |
ibm | IBM_a2e_cn_combo0 | 0.4775 | 0.5636 | 0.3871 | 0.4773 | 0.5634 | 0.3870 | 11.03 | 11.54 | 9.466 | 0.4394 | 0.3713 | 0.5051 | 0.6526 | 0.7142 | 0.5924 |
isi-lw | isi-lw_a2e_cn_primary | 0.4763 | 0.5590 | 0.3810 | 0.4760 | 0.5588 | 0.3808 | 10.85 | 11.30 | 9.292 | 0.4590 | 0.3826 | 0.5328 | 0.6544 | 0.7233 | 0.5864 |
ibm | IBM_a2e_constrained_primary | 0.4708 | 0.5547 | 0.3833 | 0.4707 | 0.5545 | 0.3831 | 10.97 | 11.46 | 9.406 | 0.4424 | 0.3773 | 0.5051 | 0.6478 | 0.7091 | 0.5876 |
bbn | BBN_a2e_cn_primary | 0.4680 | 0.5566 | 0.3783 | 0.4678 | 0.5564 | 0.3781 | 10.85 | 11.46 | 9.304 | 0.4603 | 0.3800 | 0.5378 | 0.6561 | 0.7125 | 0.6015 |
sri | SRI_a2e_cn_combo1 | 0.4631 | 0.5472 | 0.3731 | 0.4629 | 0.5470 | 0.3730 | 10.86 | 11.25 | 9.306 | 0.4592 | 0.3974 | 0.5189 | 0.6461 | 0.7063 | 0.5866 |
rwth | RWTH_a2e_cn_primary | 0.4534 | 0.5402 | 0.3538 | 0.4533 | 0.5400 | 0.3537 | 10.65 | 11.12 | 9.028 | 0.4666 | 0.3980 | 0.5328 | 0.6532 | 0.7154 | 0.5918 |
sri | SRI_a2e_cn_primary | 0.4527 | 0.5366 | 0.3634 | 0.4526 | 0.5365 | 0.3633 | 10.71 | 11.11 | 9.251 | 0.4689 | 0.4003 | 0.5351 | 0.6459 | 0.7071 | 0.5859 |
edinburgh | Edinburgh_a2e_cn_primary | 0.4479 | 0.5240 | 0.3605 | 0.4478 | 0.5238 | 0.3604 | 10.58 | 10.92 | 9.174 | 0.4857 | 0.4209 | 0.5482 | 0.6454 | 0.7069 | 0.5847 |
umd | UMD_a2e_cn_primary | 0.4409 | 0.5340 | 0.3415 | 0.4408 | 0.5338 | 0.3413 | 10.53 | 11.25 | 8.590 | 0.4591 | 0.3909 | 0.5248 | 0.6287 | 0.6982 | 0.5595 |
cmu-smt | CMU-SMT_a2e_cn_primary | 0.4304 | 0.5055 | 0.3473 | 0.4302 | 0.5053 | 0.3469 | 10.33 | 10.72 | 8.896 | 0.4823 | 0.4182 | 0.5442 | 0.6365 | 0.6988 | 0.5748 |
columbia | columbia_a2e_cn_primary | 0.4157 | 0.4932 | 0.3331 | 0.4156 | 0.4931 | 0.3330 | 10.27 | 10.73 | 8.785 | 0.4747 | 0.4160 | 0.5313 | 0.6269 | 0.6847 | 0.5698 |
cmu-statxfer | CMU-Stat-Xfer_a2e_cn_primary | 0.3774 | 0.4448 | 0.2986 | 0.3772 | 0.4447 | 0.2984 | 9.731 | 10.07 | 8.399 | 0.5158 | 0.4628 | 0.5669 | 0.6082 | 0.6684 | 0.5484 |
sakhr | SAKHR_a2e_cn_primary | 0.3681 | 0.4185 | 0.3147 | 0.3680 | 0.4184 | 0.3147 | 9.867 | 9.982 | 8.887 | 0.5075 | 0.4626 | 0.5508 | 0.6320 | 0.6737 | 0.5913 |
UnConstrained Data Track | | | | | | | | | | | | | | | |
ibm | IBM_a2e_un_combo0 | 0.5096 | 0.5913 | 0.4241 | 0.5095 | 0.5912 | 0.4240 | 11.54 | 11.92 | 10.02 | 0.4168 | 0.3507 | 0.4804 | 0.6768 | 0.7366 | 0.6181 |
ibm | IBM_arabic_un_primary | 0.4981 | 0.5713 | 0.4214 | 0.4979 | 0.5711 | 0.4214 | 11.41 | 11.70 | 9.973 | 0.4253 | 0.3653 | 0.4833 | 0.6700 | 0.7293 | 0.6117 |
apptek | AppTek_a2e_un_primary | 0.4790 | 0.5165 | 0.4352 | 0.4787 | 0.5162 | 0.4348 | 11.21 | 10.98 | 10.32 | 0.4342 | 0.3969 | 0.4702 | 0.6818 | 0.7207 | 0.6433 |
All-metric scores, same metrics and column layout as the previous table:

Site ID | System | BLEU-4 Overall | BLEU-4 Newswire | BLEU-4 Web | IBM BLEU Overall | IBM BLEU Newswire | IBM BLEU Web | NIST Overall | NIST Newswire | NIST Web | TER Overall | TER Newswire | TER Web | METEOR Overall | METEOR Newswire | METEOR Web
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Constrained Data Track | | | | | | | | | | | | | | | |
lium-systran | LIUM-SYSTRAN_a2e_cn_primary(1) | 0.4773 | 0.5629 | 0.3800 | 0.4772 | 0.5627 | 0.3799 | 10.96 | 11.38 | 9.412 | 0.4565 | 0.3851 | 0.5252 | 0.6526 | 0.7185 | 0.5879 |
fbk | FBK_a2e_cn_primary | 0.4567 | 0.5418 | 0.3615 | 0.4565 | 0.5417 | 0.3613 | 10.75 | 11.22 | 9.252 | 0.4721 | 0.3952 | 0.5462 | 0.6533 | 0.7170 | 0.5910 |
limsi | LIMSI_Moses_a2e_cn_primary | 0.4384 | 0.5242 | 0.3471 | 0.4383 | 0.5240 | 0.3469 | 10.40 | 10.98 | 8.738 | 0.4724 | 0.4051 | 0.5373 | 0.6212 | 0.6851 | 0.5582 |
tubitak-uekae | TUBITAK_a2e_cn_primary | 0.4112 | 0.4826 | 0.3310 | 0.4121 | 0.4827 | 0.3312 | 10.01 | 10.43 | 8.659 | 0.5129 | 0.4445 | 0.5789 | 0.6259 | 0.6890 | 0.5627 |
upc-lsi | UPC.LSI_a2e_cn_primary | 0.3588 | 0.4344 | 0.2778 | 0.3588 | 0.4345 | 0.2777 | 9.404 | 10.05 | 7.655 | 0.5188 | 0.4630 | 0.5725 | 0.5866 | 0.6515 | 0.5216 |
amsterdam | UvA_a2e_cn_primary(1) | 0.3221 | 0.3820 | 0.2565 | 0.3218 | 0.3816 | 0.2563 | 8.621 | 9.063 | 7.458 | 0.5995 | 0.5437 | 0.6533 | 0.5863 | 0.6457 | 0.5278 |
kcsl | KCSL_a2e_cn_primary | 0.1422 | 0.1670 | 0.1161 | 0.1420 | 0.1669 | 0.1158 | 6.647 | 6.959 | 5.820 | 0.6590 | 0.6353 | 0.6818 | 0.4937 | 0.5398 | 0.4482 |
telaviv | TLVEBMT_a2e_cn_primary | 0.0703 | 0.0872 | 0.0527 | 0.0703 | 0.0872 | 0.0526 | 3.879 | 4.370 | 3.165 | 0.7483 | 0.7274 | 0.7685 | 0.3853 | 0.4262 | 0.3450 |
(1)A late and/or debugged system was also submitted, not reported here.
This release page is limited to the Current Test for the Urdu-to-English track.
Scores reported are limited to primary, on-time, non-debugged submissions.
Scores are ordered by BLEU-4 score on the Overall test set.
The following table lists the submissions received from the sites participating in the Urdu-to-English Current Test.
Site ID | Organization | Location | Single System Track | System Combination Track
---|---|---|---|---
afrl | Air Force Research Laboratory | USA | Yes | - |
amsterdam | University of Amsterdam | Netherlands | Late and/or debugged submission | - |
cmu-ebmt | Carnegie Mellon EBMT | USA | Yes(1) | - |
cmu-statxfer | Carnegie Mellon StatXfer | USA | Yes | Yes |
hongkong | City University of Hong Kong | China | Yes | - |
isi-lw | University of Southern California / Language Weaver Inc. | USA | Yes | Yes |
jhu | Johns Hopkins University | USA | Late and/or debugged submission | - |
systran | SYSTRAN Software Inc. | USA | Yes | - |
umd | University of Maryland | USA | Yes | - |
upc-lsi | UPC-LSI (Universitat Politècnica de Catalunya, Llenguatges i Sistemes Informàtics) | Spain | Yes | - |
(1)A late and/or debugged system was also submitted, not reported here.
Single System Track, BLEU-4 (mteval-v13a) scores:

Site ID | System | Overall | Newswire | Web
---|---|---|---|---
Constrained Data Track | | | |
isi-lw | isi-lw_u2e_cn_primary | 0.3120 | 0.3753 | 0.2440
umd | UMD_u2e_cn_primary | 0.2395 | 0.2697 | 0.2056
cmu-statxfer | CMU-Stat-Xfer_u2e_cn_primary | 0.2322 | 0.2614 | 0.2013
afrl | AFRL_u2e_cn_primary | 0.2235 | 0.2526 | 0.1904
upc-lsi | UPC.LSI_u2e_cn_primary | 0.1971 | 0.2325 | 0.1610
cmu-ebmt | cmu_cunei_u2e_cn_primary(1) | 0.1863 | 0.2122 | 0.1568
hongkong | CityU_u2e_cn_primary | 0.0254 | 0.0291 | 0.0215
UnConstrained Data Track | | | |
systran | SYSTRAN_u2e_un_primary | 0.2555 | 0.3034 | 0.2051
(1)A late and/or debugged system was also submitted, not reported here.
System Combination Track, BLEU-4 (mteval-v13a) scores:

Site ID | System | Overall | Newswire | Web
---|---|---|---|---
Constrained Data Track | | | |
isi-lw | isi-lw_u2e_cn_combo1 | 0.3187 | 0.3816 | 0.2509
cmu-statxfer | CMU-Stat-Xfer_u2e_cn_combo1 | 0.2504 | 0.2910 | 0.2082
All-metric scores: BLEU-4 (mteval-v13a), IBM BLEU (bleu-1.04), NIST (mteval-v13a), TER (tercom-0.7.25), and METEOR (meteor-0.7):

Site ID | System | BLEU-4 Overall | BLEU-4 Newswire | BLEU-4 Web | IBM BLEU Overall | IBM BLEU Newswire | IBM BLEU Web | NIST Overall | NIST Newswire | NIST Web | TER Overall | TER Newswire | TER Web | METEOR Overall | METEOR Newswire | METEOR Web
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Constrained Data Track | | | | | | | | | | | | | | | |
isi-lw | isi-lw_u2e_cn_combo1 | 0.3187 | 0.3816 | 0.2509 | 0.3187 | 0.3815 | 0.2509 | 8.948 | 9.465 | 7.540 | 0.5866 | 0.5495 | 0.6199 | 0.5616 | 0.6237 | 0.5027 |
isi-lw | isi-lw_u2e_cn_primary | 0.3120 | 0.3753 | 0.2440 | 0.3119 | 0.3752 | 0.2440 | 8.836 | 9.379 | 7.413 | 0.5845 | 0.5495 | 0.6159 | 0.5539 | 0.6160 | 0.4951 |
cmu-statxfer | CMU-Stat-Xfer_u2e_cn_combo1 | 0.2504 | 0.2910 | 0.2082 | 0.2504 | 0.2907 | 0.2084 | 8.244 | 8.550 | 7.218 | 0.6273 | 0.6128 | 0.6404 | 0.5380 | 0.5838 | 0.4947 |
umd | UMD_u2e_cn_primary | 0.2395 | 0.2697 | 0.2056 | 0.2394 | 0.2696 | 0.2056 | 7.871 | 8.089 | 7.051 | 0.6519 | 0.6436 | 0.6593 | 0.5355 | 0.5815 | 0.4919 |
cmu-statxfer | CMU-Stat-Xfer_u2e_cn_primary | 0.2322 | 0.2614 | 0.2013 | 0.2323 | 0.2612 | 0.2017 | 7.743 | 7.919 | 7.029 | 0.6960 | 0.6926 | 0.6991 | 0.5445 | 0.5907 | 0.5006 |
afrl | AFRL_u2e_cn_primary | 0.2235 | 0.2526 | 0.1904 | 0.2238 | 0.2526 | 0.1907 | 7.957 | 8.123 | 7.157 | 0.6537 | 0.6483 | 0.6584 | 0.5396 | 0.5815 | 0.4996 |
upc-lsi | UPC.LSI_u2e_cn_primary | 0.1971 | 0.2325 | 0.1610 | 0.1972 | 0.2324 | 0.1612 | 7.510 | 7.945 | 6.361 | 0.6607 | 0.6526 | 0.6680 | 0.5015 | 0.5530 | 0.4524 |
cmu-ebmt | cmu_cunei_u2e_cn_primary(1) | 0.1863 | 0.2122 | 0.1568 | 0.1863 | 0.2121 | 0.1568 | 6.811 | 6.899 | 6.065 | 0.6977 | 0.7001 | 0.6956 | 0.5274 | 0.5801 | 0.4773 |
hongkong | CityU_u2e_cn_primary | 0.0254 | 0.0291 | 0.0215 | 0.0256 | 0.0293 | 0.0218 | 2.647 | 2.788 | 2.430 | 1.1169 | 1.1565 | 1.0813 | 0.2929 | 0.3196 | 0.2676 |
UnConstrained Data Track | | | | | | | | | | | | | | | |
systran | SYSTRAN_u2e_un_primary | 0.2555 | 0.3034 | 0.2051 | 0.2553 | 0.3032 | 0.2050 | 8.203 | 8.622 | 7.090 | 0.6310 | 0.6060 | 0.6534 | 0.5401 | 0.5913 | 0.4921 |
(1)A late and/or debugged system was also submitted, not reported here.
This release page is limited to the Progress Test for the Arabic-to-English and Chinese-to-English tracks of the MT09 text-to-text translation evaluation.
The Progress Test was designed to demonstrate true system improvement for a particular site over time, irrespective of the inherent data differences that come with each new test set. To keep the focus on progress over time, only results from sites that participated in both MT08 and MT09 are reported.
There may be issues related to processing older test sets that make across-site comparisons less meaningful.
Scores reported are limited to primary, on-time, non-debugged submissions.
Scores are ordered by BLEU-4 score on the Overall test set.
Note that all BLEU-4 scores reported on this page were computed using version 13a of the mteval script. Last year, scores were computed using mteval-v11b, which uses a different brevity penalty. This year's script (mteval-v13a) uses the 'closest reference translation length' (same as in IBM BLEU) instead of the 'shortest reference translation length', to compute the brevity penalty.
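The practical effect of this change is confined to the brevity penalty's effective reference length. Below is a minimal sketch of the two policies; the brevity penalty formula is the standard BLEU definition, and in full BLEU the lengths are accumulated over the whole test set rather than a single segment.

```python
import math

def brevity_penalty(cand_len, ref_len):
    """Standard BLEU brevity penalty: 1.0 if the candidate is at least
    as long as the effective reference length r, else exp(1 - r/c)."""
    if cand_len >= ref_len:
        return 1.0
    return math.exp(1.0 - ref_len / cand_len)

def effective_ref_len(cand_len, ref_lens, mode):
    """Effective reference length under the two policies discussed above."""
    if mode == "shortest":  # mteval-v11b behavior (last year's scores)
        return min(ref_lens)
    if mode == "closest":   # mteval-v13a / IBM BLEU behavior (MT09 scores);
        # ties are broken toward the shorter reference in this sketch
        return min(ref_lens, key=lambda r: (abs(r - cand_len), r))
    raise ValueError(mode)

# A 25-word candidate segment with reference lengths 22, 26, and 30:
for mode in ("shortest", "closest"):
    r = effective_ref_len(25, [22, 26, 30], mode)
    print(mode, r, round(brevity_penalty(25, r), 4))
# shortest -> r = 22, BP = 1.0
# closest  -> r = 26, BP = exp(1 - 26/25), about 0.9608
```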
The following table lists the submissions received from the sites participating in the Progress Test. Scores are reported only for submissions marked "MT08 and MT09":
Site ID | Organization | Location | Progress Test Set: Arabic-to-English | Progress Test Set: Chinese-to-English
---|---|---|---|---
amsterdam | University of Amsterdam | Netherlands | MT09 only | MT09 only |
apptek | AppTek | USA | MT08 and MT09 | Different training condition |
bbn | BBN Technologies | USA | MT08 and MT09 | MT08 and MT09 |
buaa | Beihang University, Institute of Intelligent Information Processing, School of Computer Science and Engineering | China | - | MT09 only |
cas-ia | Chinese Academy of Sciences, Institute of Automation | China | - | MT09 only |
cas-ict | Chinese Academy of Sciences, Institute of Computing Technology | China | - | MT09 only |
ccid | China Center for Information Industry Development | China | - | MT09 only |
cmu-smt | Carnegie Mellon LTI interACT | USA | Different training condition | Different training condition |
cmu-statxfer | Carnegie Mellon StatXfer | USA | MT09 only | - |
dcu | Dublin City University | Ireland | - | MT09 only |
dfki | DFKI GmbH | Germany | - | MT09 only |
edinburgh | University of Edinburgh | UK | MT09 only | - |
fbk | Fondazione Bruno Kessler | Italy | MT09 only | - |
frdc | Fujitsu Research & Development Center Co., Ltd. | China | - | MT09 only |
hit-ltrc | Harbin Institute of Technology, Language Technology Research Center | China | - | MT09 only |
isi-lw | University of Southern California / Language Weaver Inc. | USA | MT08 and MT09 | MT08 and MT09 |
lium | Université du Maine (Le Mans) | France | - | MT09 only |
lium-systran | Université du Maine (Le Mans) / SYSTRAN | . | Late and/or debugged submission | - |
nju-nlp | Nanjing University NLP | China | - | MT09 only |
nrc | National Research Council Canada | Canada | - | MT08 and MT09 |
nthu | National Tsing Hua University, Department of Computer Science | Taiwan | - | MT09 only |
rwth | RWTH-Aachen University, Chair of Computer Sciences | Germany | MT09 only | MT09 only |
sakhr | Sakhr Software | Egypt | MT09 only | - |
sri | SRI International | USA | MT08 and MT09 | MT08 and MT09 |
systran-lium | SYSTRAN / Université du Maine (Le Mans) | . | - | MT09 only |
systran-nrc | SYSTRAN / National Research Council Canada | . | - | MT09 only |
tubitak-uekae | TUBITAK-UEKAE | Turkey | MT09 only | - |
umd | University of Maryland | USA | - | MT08 and MT09 |
upc-lsi | UPC-LSI (Universitat Politècnica de Catalunya, Llenguatges i Sistemes Informàtics) | Spain | MT08 and MT09 | - |
Arabic-to-English Progress Test, BLEU-4 (mteval-v13a) scores:

Site ID | System | Overall MT08 | Overall MT09 | Newswire MT08 | Newswire MT09 | Web MT08 | Web MT09
---|---|---|---|---|---|---|---
Constrained Data Track | | | | | | |
bbn | BBN-progress_a2e_cn_primary | 0.4186(1) | 0.4379 | 0.4655(1) | 0.4926 | 0.3566(1) | 0.3678 |
isi-lw | isi-lw_a2e_cn_primary | 0.4030(1) | 0.4296 | 0.4498(1) | 0.4748 | 0.3408(1) | 0.3621 |
sri | SRI_a2e_cn_primary | 0.4011(1) | 0.4087 | 0.4558(1) | 0.4499 | 0.3277(1) | 0.3551 |
upc-lsi | UPC.LSI_a2e_cn_primary | 0.2956 | 0.3303 | 0.3448 | 0.3769 | 0.2300 | 0.2676 |
UnConstrained Data Track | | | | | | |
apptek | AppTek_a2e_un_primary | 0.4195 | 0.3955 | 0.4245 | 0.3991 | 0.4131 | 0.3909 |
(1)The MT08 system was a combined system
Chinese-to-English Progress Test, BLEU-4 (mteval-v13a) scores:

Site ID | System | Overall MT08 | Overall MT09 | Newswire MT08 | Newswire MT09 | Web MT08 | Web MT09
---|---|---|---|---|---|---|---
Constrained Data Track | | | | | | |
isi-lw | isi-lw_c2e_cn_primary | 0.2990(1) | 0.3225 | 0.3516(1) | 0.3628 | 0.2237(1) | 0.2642 |
bbn | BBN-progress_c2e_cn_primary | 0.3055(1) | 0.3153 | 0.3447(1) | 0.3481 | 0.2509(1) | 0.2697 |
nrc | NRC_c2e_cn_primary | 0.2480 | 0.2811 | 0.2679 | 0.3130 | 0.2204 | 0.2357 |
sri | SRI_c2e_cn_primary | 0.2617(1) | 0.2790 | 0.3028(1) | 0.3132 | 0.2032(1) | 0.2303 |
umd | UMD_c2e_cn_primary | 0.2456 | 0.2500 | 0.2823 | 0.2781 | 0.1936 | 0.2108 |
(1)The MT08 system was a combined system
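Reading the progress tables amounts to comparing each site's MT08 and MT09 scores on the identical test set. A minimal sketch over the Overall BLEU-4 column of the Arabic-to-English table above:

```python
# Overall BLEU-4 (mteval-v13a) scores copied from the Arabic-to-English
# Progress Test table above: (MT08, MT09).
overall = {
    "bbn": (0.4186, 0.4379),
    "isi-lw": (0.4030, 0.4296),
    "sri": (0.4011, 0.4087),
    "upc-lsi": (0.2956, 0.3303),
    "apptek": (0.4195, 0.3955),  # unconstrained track
}

for site, (mt08, mt09) in overall.items():
    delta = mt09 - mt08
    print(f"{site:8s} {mt08:.4f} -> {mt09:.4f} ({delta:+.4f})")
```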
Informal System Combination was a diagnostic MT09 task offered after the official evaluation period. Output from several MT09 systems on the Arabic-to-English and Urdu-to-English Current tests was anonymized and provided for system combination purposes. Participants in this track produced new output based on those provided translations.
Scores reported here are limited to primary Informal System Combination submissions.
The Informal System Combination track used system output from the Arabic-to-English and Urdu-to-English Current tests. Approximately 30% of the test data was designated as a development set for system combination; the remainder was provided as the test set.
Language Pair | Data Genre | Development Set | Evaluation Set |
---|---|---|---|
Arabic-to-English | Newswire | 17 documents | 42 documents
Arabic-to-English | Web | 16 documents | 40 documents
Urdu-to-English | Newswire | 20 documents | 48 documents
Urdu-to-English | Web | 48 documents | 114 documents
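As a quick sanity check, the development fraction implied by the document counts above can be recomputed. A minimal sketch:

```python
# Document counts from the table above: (development, evaluation).
splits = {
    ("Arabic-to-English", "Newswire"): (17, 42),
    ("Arabic-to-English", "Web"): (16, 40),
    ("Urdu-to-English", "Newswire"): (20, 48),
    ("Urdu-to-English", "Web"): (48, 114),
}

for (pair, genre), (dev, evl) in splits.items():
    frac = dev / (dev + evl)
    print(f"{pair:18s} {genre:8s} dev fraction = {frac:.1%}")
# Every genre comes out just under the stated 30% development share.
```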
Arabic-to-English Informal System Combination results, all metrics: BLEU-4 (mteval-v13a), IBM BLEU (bleu-1.04), NIST (mteval-v13a), TER (tercom-0.7.25), and METEOR (meteor-0.7):

Site ID | System | BLEU-4 Overall | BLEU-4 Newswire | BLEU-4 Web | IBM BLEU Overall | IBM BLEU Newswire | IBM BLEU Web | NIST Overall | NIST Newswire | NIST Web | TER Overall | TER Newswire | TER Web | METEOR Overall | METEOR Newswire | METEOR Web
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
bbn | BBN_a2e_isc_primary | 0.5747 | 0.6440 | 0.4940 | 0.5747 | 0.6440 | 0.4938 | 11.82 | 11.84 | 10.41 | 0.3761 | 0.3220 | 0.4298 | 0.7043 | 0.7601 | 0.6469 |
sri | SRI_a2e_isc_primary | 0.5543 | 0.6292 | 0.4733 | 0.5542 | 0.6291 | 0.4732 | 11.68 | 11.79 | 10.26 | 0.3788 | 0.3244 | 0.4328 | 0.6989 | 0.7474 | 0.6493 |
cmu-statxfer | CMU-Stat-Xfer_a2e_isc_primary | 0.5530 | 0.6332 | 0.4663 | 0.5529 | 0.6330 | 0.4662 | 11.62 | 11.80 | 10.15 | 0.3854 | 0.3279 | 0.4427 | 0.7033 | 0.7518 | 0.6538 |
rwth | RWTH_a2e_isc_primary | 0.5515 | 0.6412 | 0.4523 | 0.5517 | 0.6411 | 0.4523 | 11.56 | 11.86 | 9.879 | 0.3923 | 0.3229 | 0.4613 | 0.6928 | 0.7568 | 0.6272 |
jhu | jhu_a2e_isc_primary | 0.5483 | 0.6294 | 0.4577 | 0.5481 | 0.6291 | 0.4574 | 11.55 | 11.73 | 10.01 | 0.3862 | 0.3272 | 0.4448 | 0.6919 | 0.7494 | 0.6330 |
hit-ltrc | HIT-LTRC_a2e_isc_primary | 0.5037 | 0.5997 | 0.3982 | 0.5038 | 0.6000 | 0.3981 | 10.65 | 11.48 | 8.406 | 0.4135 | 0.3472 | 0.4793 | 0.6596 | 0.7249 | 0.5922 |
tubitak-uekae | TUBITAK_a2e_isc_primary | 0.4603 | 0.5371 | 0.3779 | 0.4603 | 0.5371 | 0.3779 | 10.31 | 10.75 | 8.726 | 0.4525 | 0.3942 | 0.5105 | 0.6263 | 0.6882 | 0.5625 |
Highest individual system score in ISC test set (system with highest BLEU-4 score on Overall data set) | | | | | | | | | | | | | | | |
- | system08_unconstrained.xml | 0.5008 | 0.5719 | 0.4245 | 0.5007 | 0.5720 | 0.4243 | 11.04 | 11.28 | 9.598 | 0.4229 | 0.3641 | 0.4813 | 0.6694 | 0.7271 | 0.6104
Urdu-to-English Informal System Combination results, same metrics and column layout as the previous table:

Site ID | System | BLEU-4 Overall | BLEU-4 Newswire | BLEU-4 Web | IBM BLEU Overall | IBM BLEU Newswire | IBM BLEU Web | NIST Overall | NIST Newswire | NIST Web | TER Overall | TER Newswire | TER Web | METEOR Overall | METEOR Newswire | METEOR Web
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
rwth | RWTH_u2e_isc_primary(1) | 0.3232 | 0.3768 | 0.2737 | 0.3235 | 0.3767 | 0.2740 | 8.822 | 9.274 | 7.425 | 0.5630 | 0.5383 | 0.5833 | 0.5539 | 0.6105 | 0.5046 |
jhu | jhu_u2e_isc_primary | 0.3193 | 0.3796 | 0.2627 | 0.3191 | 0.3792 | 0.2627 | 8.736 | 9.197 | 7.418 | 0.5590 | 0.5317 | 0.5815 | 0.5512 | 0.6073 | 0.5022 |
cmu-statxfer | CMU-Stat-Xfer_u2e_isc_primary | 0.3188 | 0.3821 | 0.2602 | 0.3188 | 0.3821 | 0.2602 | 8.694 | 9.154 | 7.353 | 0.5741 | 0.5422 | 0.6004 | 0.5560 | 0.6170 | 0.5030 |
hit-ltrc | HIT-LTRC_u2e_isc_primary | 0.3103 | 0.3774 | 0.2453 | 0.3104 | 0.3773 | 0.2455 | 8.639 | 9.195 | 7.271 | 0.5820 | 0.5416 | 0.6152 | 0.5519 | 0.6184 | 0.4941 |
Highest individual system score in ISC test set (system with highest BLEU-4 score on Overall data set) | | | | | | | | | | | | | | | |
- | system09_constrained.xml | 0.3104 | 0.3774 | 0.2456 | 0.3104 | 0.3773 | 0.2456 | 8.640 | 9.196 | 7.276 | 0.5816 | 0.5414 | 0.6146 | 0.5522 | 0.6186 | 0.4945
(1)rescored