
NIST 2009 Open Machine Translation Evaluation (MT09) Official Release of Results

Date of release: Tue Oct 27 15:48:58 2009
Version: mt09_public_v1

The NIST 2009 Open Machine Translation Evaluation (MT09) is part of an ongoing series of evaluations of human language translation technology. NIST conducts these evaluations in order to support machine translation (MT) research and help advance the state-of-the-art in machine translation technology. These evaluations provide an important contribution to the direction of research efforts and the calibration of technical capabilities. The evaluation was administered as outlined in the official MT09 evaluation plan.

Disclaimer

These results are not to be construed, or represented as endorsements of any participant's system or commercial product, or as official findings on the part of NIST or the U.S. Government. Note that the results submitted by developers of commercial MT products were generally from research systems, not commercially available products. Since MT09 was an evaluation of research algorithms, the MT09 test design required local implementation by each participant. As such, participants were only required to submit their translation system output to NIST for uniform scoring and analysis. The systems themselves were not independently evaluated by NIST.
 

Certain commercial equipment, instruments, software, or materials are identified in this paper in order to specify the experimental procedure adequately. Such identification is not intended to imply recommendation or endorsement by NIST, nor is it intended to imply that the equipment, instruments, software or materials are necessarily the best available for the purpose.
 

There is ongoing discussion within the MT research community regarding the most informative metrics for machine translation. The design and implementation of these metrics are themselves very much part of the research. At the present time, there is no single metric that has been deemed to be completely indicative of all aspects of system performance.
 

The data, protocols, and metrics employed in this evaluation were chosen to support MT research and should not be construed as indicating how well these systems would perform in applications. While changes in the data domain, or changes in the amount of data used to build a system, can greatly influence system performance, changing the task protocols could indicate different performance strengths and weaknesses for these same systems.
 

Because of the above reasons, this should not be interpreted as a product testing exercise and the results should not be used to make conclusions regarding which commercial products are best for a particular application.

History

  • 2009/10/27 : First public release

Evaluation Tasks

MT09 was a test of text-to-text MT technology. The evaluation consisted of three tasks, differing only by the source language processed:

  • Arabic-to-English
  • Chinese-to-English
  • Urdu-to-English

Evaluation Conditions

MT research and development requires language data resources. System performance is strongly affected by the type and amount of resources used. Therefore, two different resource categories were defined as conditions of evaluation. The categories differ solely by the amount of data that was available for use in the training and development of the core MT engine. These evaluation conditions were called "Constrained Training" and "Unconstrained Training". See the evaluation specification document for a complete description of allowable resources for each.

Evaluation Tracks

In recent years, performance improvements have been demonstrated through the use of system combination techniques. For MT09, two evaluation tracks were therefore supported, called the "Single System Track" and the "System Combination Track". Results are reported separately for each track. As the track names imply, systems entered in the Single System Track produce their translations primarily with a single algorithmic approach, while systems entered in the System Combination Track produce their translations by combining the output of two or more core algorithmic approaches.
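The combination methods themselves varied by participant and are not described in this release. As a purely illustrative aid (not any participant's actual method), the sketch below shows one simple segment-level combination strategy, hypothesis selection by consensus: for each source segment, keep the candidate translation that agrees most with the other systems' candidates. The function names and the token-overlap similarity are assumptions made only for this example; real MT09 combination systems typically used more sophisticated techniques such as confusion-network decoding.

    # Illustrative only: naive segment-level system combination by consensus
    # hypothesis selection. Nothing here reflects any participant's method.
    from collections import Counter
    from typing import List

    def overlap_f1(a: str, b: str) -> float:
        """Token-overlap F1 between two hypotheses (a cheap similarity proxy)."""
        ca, cb = Counter(a.split()), Counter(b.split())
        common = sum((ca & cb).values())
        if common == 0:
            return 0.0
        precision = common / sum(ca.values())
        recall = common / sum(cb.values())
        return 2 * precision * recall / (precision + recall)

    def combine_segment(candidates: List[str]) -> str:
        """Return the candidate most similar, on average, to the other candidates."""
        if len(candidates) == 1:
            return candidates[0]

        def consensus(i: int) -> float:
            others = [c for j, c in enumerate(candidates) if j != i]
            return sum(overlap_f1(candidates[i], o) for o in others) / len(others)

        best = max(range(len(candidates)), key=consensus)
        return candidates[best]

    if __name__ == "__main__":
        # One source segment, three hypothetical system outputs.
        print(combine_segment([
            "the minister met the delegation on tuesday",
            "the minister met with the delegation tuesday",
            "minister meets delegation in the tuesday",
        ]))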

Evaluation Data

The following table gives the approximate source word count for each language pair and data genre, separately for the Current Test Set and the Progress Test Set. For the Chinese-to-English language pair, a Chinese word is assumed to average 1.5 characters.

Language Pair | Data Genre | Current Test Set | Progress Test Set
Arabic-to-English | Newswire | 16K words (68 documents) | 20K words (81 documents)
Arabic-to-English | Web | 15K words (67 documents) | 15K words (51 documents)
Chinese-to-English | Newswire | - | 20K words (82 documents)
Chinese-to-English | Web | - | 15K words (40 documents)
Urdu-to-English | Newswire | 24K words (72 documents) | -
Urdu-to-English | Web | 21K words (166 documents) | -

Performance Measurement

  • BLEU-4 (mteval-v13a, the official MT09 evaluation metric)
    • Invocation line: perl mteval-v13a.pl -r REFERENCE_FILE -s SOURCE_FILE -t CANDIDATE_FILE -c -b
    • Option -c : case-sensitive scoring
    • Option -b : BLEU score only
    • Version 13a of the script is a bug-fixed version that prevents a division-by-zero error from occurring when a candidate translation segment is empty (an illustrative sketch of the BLEU-4 computation appears after this list)
  • IBM BLEU (bleu-1.04a)
    • Invocation line: perl bleu-1.04.pl -r REFERENCE_FILE -t CANDIDATE_FILE
    • By default, scoring is case-sensitive
    • The NormalizeText method was updated to properly un-escape XML entities.
  • NIST (mteval-v13a)
    • Invocation line: perl mteval-v13a.pl -r REFERENCE_FILE -s SOURCE_FILE -t CANDIDATE_FILE -c -n
    • Option -c : case-sensitive scoring
    • Option -n : NIST score only
  • TER (tercom-0.7.25)
    • Invocation line: java -jar tercom.7.25.jar -r REFERENCE_FILE -h CANDIDATE_FILE -N -s
    • Option -N : enables normalization
    • Option -s : case-sensitive scoring
  • METEOR (meteor-0.7)
    • Invocation line: perl meteor.pl -s SYSTEM_ID -r REFERENCE_FILE -t CANDIDATE_FILE --modules "exact porter_stem wn_stem wn_synonymy"
    • Option --modules "exact porter_stem wn_stem wn_synonymy" : uses all four METEOR matching modules, in that order
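For readers unfamiliar with the metric, the sketch below illustrates the two ingredients of the official BLEU-4 score named above: modified (clipped) n-gram precision up to n = 4, and a brevity penalty computed from the closest reference translation length, as in mteval-v13a. It is a minimal illustration only; the official MT09 scores were produced with the invocation lines listed above, and the tie-breaking and lack of smoothing here are assumptions, not a re-implementation of the NIST script.

    # Minimal corpus-level BLEU-4 sketch (illustrative; not the official scorer).
    import math
    from collections import Counter

    def ngrams(tokens, n):
        """Counter of n-grams in a token list."""
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    def corpus_bleu4(candidates, references):
        """candidates: list of token lists; references: list of lists of token lists."""
        matches = [0] * 4   # clipped n-gram matches, n = 1..4
        totals = [0] * 4    # candidate n-gram counts, n = 1..4
        cand_len = ref_len = 0
        for cand, refs in zip(candidates, references):
            cand_len += len(cand)
            # Closest reference length (ties broken toward the shorter reference here).
            ref_len += min((abs(len(r) - len(cand)), len(r)) for r in refs)[1]
            for n in range(1, 5):
                cand_counts = ngrams(cand, n)
                max_ref_counts = Counter()
                for r in refs:
                    max_ref_counts |= ngrams(r, n)   # per-n-gram max over references
                matches[n - 1] += sum((cand_counts & max_ref_counts).values())
                totals[n - 1] += sum(cand_counts.values())
        if min(matches) == 0:
            return 0.0  # no smoothing in this sketch
        log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / 4
        brevity = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
        return brevity * math.exp(log_precision)

    if __name__ == "__main__":
        cand = "the cat sat on the mat".split()
        refs = ["the cat sat on the mat".split(), "a cat was sitting on the mat".split()]
        print(round(corpus_bleu4([cand], [refs]), 4))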

Participants

The following table lists the organizations participating in MT09 and the test sets they registered to process.

Site ID | Organization | Location | Current Test Set: Arabic-to-English | Current Test Set: Urdu-to-English | Progress Test Set: Arabic-to-English | Progress Test Set: Chinese-to-English
afrl | Air Force Research Laboratory | USA | - | Yes | - | -
amsterdam | University of Amsterdam | Netherlands | Yes | Yes | Yes | Yes
apptek | AppTek | USA | Yes | - | Yes | Yes
bbn | BBN Technologies | USA | Yes | - | Yes | Yes
buaa | Beihang University, Institute of Intelligent Information Processing, School of Computer Science and Engineering | China | - | - | - | Yes
cas-ia | Chinese Academy of Sciences, Institute of Automation | China | - | - | - | Yes
cas-ict | Chinese Academy of Sciences, Institute of Computing Technology | China | - | - | - | Yes
ccid | China Center for Information Industry Development | China | - | - | - | Yes
cmu-ebmt | Carnegie Mellon EBMT | USA | - | Yes | - | -
cmu-smt | Carnegie Mellon LTI interACT | USA | Yes | - | Yes | Yes
cmu-statxfer | Carnegie Mellon StatXfer | USA | Yes | Yes | Yes | -
columbia | Columbia University | USA | Yes | - | - | -
cued | Cambridge University Engineering Department | UK | Yes | - | - | -
dcu | Dublin City University | Ireland | - | - | - | Yes
dfki | DFKI GmbH | Germany | - | - | - | Yes
edinburgh | University of Edinburgh | UK | Yes | - | Yes | withdrew
fbk | Fondazione Bruno Kessler | Italy | Yes | - | Yes | -
frdc | Fujitsu Research & Development Center Co., Ltd. | China | - | - | - | Yes
hit-ltrc | Harbin Institute of Technology, Language Technology Research Center | China | - | - | - | Yes
hongkong | City University of Hong Kong | China | withdrew | Yes | - | -
ibm | IBM | USA | Yes | - | withdrew | -
jhu | Johns Hopkins University | USA | Yes | Yes | - | withdrew
kcsl | KCSL Inc. | Canada | Yes | - | - | -
limsi | Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur - CNRS | France | Yes | - | - | -
lium | Université du Maine (Le Mans) | France | - | - | - | Yes
nju-nlp | Nanjing University NLP | China | - | - | - | Yes
nrc | National Research Council Canada | Canada | - | - | - | Yes
nthu | National Tsing Hua University, Department of Computer Science | Taiwan | - | - | - | Yes
rwth | RWTH-Aachen University, Chair of Computer Sciences | Germany | Yes | - | Yes | Yes
sakhr | Sakhr Software | Egypt | Yes | - | Yes | -
sri | SRI International | USA | Yes | - | Yes | Yes
stanford | Stanford University | USA | Yes | - | withdrew | -
systran | SYSTRAN Software Inc. | USA | - | Yes | - | -
telaviv | Tel Aviv University | Israel | Yes | - | - | -
tubitak-uekae | TUBITAK-UEKAE | Turkey | Yes | - | Yes | -
umd | University of Maryland | USA | Yes | Yes | withdrew | Yes
upc-lsi | UPC-LSI (Universitat Politècnica de Catalunya, Llenguatges i Sistemes Informàtics) | Spain | Yes | Yes | Yes | -

Total (Individual Only) |  |  | 21 | 9 | 12 | 19

Collaborations

lium-systran | Université du Maine (Le Mans) / SYSTRAN | - | Yes | - | Yes | -
systran-lium | SYSTRAN / Université du Maine (Le Mans) | - | - | - | - | Yes
systran-nrc | SYSTRAN / National Research Council Canada | - | - | - | - | Yes

Total (Individual + Collaboration) |  |  | 23 | 10 | 14 | 22

Notes

fsc | Fitchburg State College | USA | Submission not scored.

Results Section

[ Current Test Set, Arabic-to-English Results ]   [ Current Test Set, Urdu-to-English Results ]   [ Progress Test Set Results ]   [ Informal System Combination Results ]

 

Current Test Set, Arabic-to-English Results

Introduction

This release page is limited to the Current Test for the Arabic-to-English track.

Scores reported are limited to primary, on-time, non-debugged submissions.

Scores are ordered by BLEU-4 score on the Overall test set.

Results for submissions from GALE participants, who had prior access to the test data, are reported separately from results submitted by participants who are not part of the GALE program.

Participants

The following table lists the submissions received from the sites participating in the Arabic-to-English Current Test.

Current Test Set, Arabic-to-English
Site ID | Organization | Location | Single System Track | System Combination Track
amsterdam | University of Amsterdam | Netherlands | Yes(1) | -
apptek | AppTek | USA | Yes | -
bbn | BBN Technologies | USA | Yes | -
cmu-smt | Carnegie Mellon LTI interACT | USA | Yes | -
cmu-statxfer | Carnegie Mellon StatXfer | USA | Yes | -
columbia | Columbia University | USA | Yes | -
cued | Cambridge University Engineering Department | UK | Yes | -
edinburgh | University of Edinburgh | UK | Yes | -
fbk | Fondazione Bruno Kessler | Italy | Yes | -
ibm | IBM | USA | Yes | Yes
isi-lw | University of Southern California / Language Weaver Inc. | USA | Yes | Yes
jhu | Johns Hopkins University | USA | Late and/or debugged submission | -
kcsl | KCSL Inc. | Canada | Yes | -
limsi | Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur - CNRS | France | Yes | -
lium-systran | Université du Maine (Le Mans) / SYSTRAN | - | Yes(1) | -
rwth | RWTH-Aachen University, Chair of Computer Sciences | Germany | Yes | -
sakhr | Sakhr Software | Egypt | Yes | -
sri | SRI International | USA | Yes | Yes
stanford | Stanford University | USA | Yes | -
telaviv | Tel Aviv University | Israel | Yes | -
tubitak-uekae | TUBITAK-UEKAE | Turkey | Yes | -
umd | University of Maryland | USA | Yes | -
upc-lsi | UPC-LSI (Universitat Politècnica de Catalunya, Llenguatges i Sistemes Informàtics) | Spain | Yes | -

(1)A late and/or debugged system was also submitted, not reported here.

Current Test Results - Single System Track

Arabic-to-English, submissions from participants involved in GALE (Table 1)

BLEU-4 (mteval-v13a)

Site ID | System | Overall | Newswire | Web

Constrained Data Track
cued | CUED_a2e_cn_primary | 0.4834 | 0.5641 | 0.3960
stanford | stanford_a2e_cn_primary | 0.4781 | 0.5673 | 0.3843
isi-lw | isi-lw_a2e_cn_primary | 0.4763 | 0.5590 | 0.3810
ibm | IBM_a2e_constrained_primary | 0.4708 | 0.5547 | 0.3833
bbn | BBN_a2e_cn_primary | 0.4680 | 0.5566 | 0.3783
rwth | RWTH_a2e_cn_primary | 0.4534 | 0.5402 | 0.3538
sri | SRI_a2e_cn_primary | 0.4527 | 0.5366 | 0.3634
edinburgh | Edinburgh_a2e_cn_primary | 0.4479 | 0.5240 | 0.3605
umd | UMD_a2e_cn_primary | 0.4409 | 0.5340 | 0.3415
cmu-smt | CMU-SMT_a2e_cn_primary | 0.4304 | 0.5055 | 0.3473
columbia | columbia_a2e_cn_primary | 0.4157 | 0.4932 | 0.3331
cmu-statxfer | CMU-Stat-Xfer_a2e_cn_primary | 0.3774 | 0.4448 | 0.2986
sakhr | SAKHR_a2e_cn_primary | 0.3681 | 0.4185 | 0.3147

UnConstrained Data Track
ibm | IBM_arabic_un_primary | 0.4981 | 0.5713 | 0.4214
apptek | AppTek_a2e_un_primary | 0.4790 | 0.5165 | 0.4352

 

Arabic-to-English, submissions from participants not involved in GALE (Table 2)

BLEU-4 (mteval-v13a)

Site ID | System | Overall | Newswire | Web

Constrained Data Track
lium-systran | LIUM-SYSTRAN_a2e_cn_primary(1) | 0.4773 | 0.5629 | 0.3800
fbk | FBK_a2e_cn_primary | 0.4567 | 0.5418 | 0.3615
limsi | LIMSI_Moses_a2e_cn_primary | 0.4384 | 0.5242 | 0.3471
tubitak-uekae | TUBITAK_a2e_cn_primary | 0.4112 | 0.4826 | 0.3310
upc-lsi | UPC.LSI_a2e_cn_primary | 0.3588 | 0.4344 | 0.2778
amsterdam | UvA_a2e_cn_primary(1) | 0.3221 | 0.3820 | 0.2565
kcsl | KCSL_a2e_cn_primary | 0.1422 | 0.1670 | 0.1161
telaviv | TLVEBMT_a2e_cn_primary | 0.0703 | 0.0872 | 0.0527

(1)A late and/or debugged system was also submitted, not reported here.
 

Current Test Results - System Combination Track

Arabic-to-English, submissions from participants involved in GALE (Table 3)

BLEU-4 (mteval-v13a)

Site ID | System | Overall | Newswire | Web

Constrained Data Track
isi-lw | isi-lw_a2e_cn_combo1 | 0.4802 | 0.5600 | 0.3914
ibm | IBM_a2e_cn_combo0 | 0.4775 | 0.5636 | 0.3871
sri | SRI_a2e_cn_combo1 | 0.4631 | 0.5472 | 0.3731

UnConstrained Data Track
ibm | IBM_a2e_un_combo0 | 0.5096 | 0.5913 | 0.4241

 

Current Test Results - Single System Track and System Combination Track - All metrics

Arabic-to-English, submissions from participants involved in GALE (Table 4)

Each metric is reported as Overall / Newswire / Web.

Site ID | System | BLEU-4 (mteval-v13a) | IBM BLEU (bleu-1.04) | NIST (mteval-v13a) | TER (tercom-0.7.25) | METEOR (meteor-0.7)

Constrained Data Track
cued | CUED_a2e_cn_primary | 0.4834 / 0.5641 / 0.3960 | 0.4833 / 0.5640 / 0.3959 | 11.01 / 11.31 / 9.603 | 0.4489 / 0.3861 / 0.5093 | 0.6570 / 0.7152 / 0.5999
isi-lw | isi-lw_a2e_cn_combo1 | 0.4802 / 0.5600 / 0.3914 | 0.4801 / 0.5598 / 0.3913 | 10.85 / 11.24 / 9.396 | 0.4643 / 0.3887 / 0.5371 | 0.6642 / 0.7319 / 0.5969
stanford | stanford_a2e_cn_primary | 0.4781 / 0.5673 / 0.3843 | 0.4777 / 0.5668 / 0.3840 | 10.97 / 11.44 / 9.392 | 0.4399 / 0.3709 / 0.5065 | 0.6514 / 0.7153 / 0.5882
ibm | IBM_a2e_cn_combo0 | 0.4775 / 0.5636 / 0.3871 | 0.4773 / 0.5634 / 0.3870 | 11.03 / 11.54 / 9.466 | 0.4394 / 0.3713 / 0.5051 | 0.6526 / 0.7142 / 0.5924
isi-lw | isi-lw_a2e_cn_primary | 0.4763 / 0.5590 / 0.3810 | 0.4760 / 0.5588 / 0.3808 | 10.85 / 11.30 / 9.292 | 0.4590 / 0.3826 / 0.5328 | 0.6544 / 0.7233 / 0.5864
ibm | IBM_a2e_constrained_primary | 0.4708 / 0.5547 / 0.3833 | 0.4707 / 0.5545 / 0.3831 | 10.97 / 11.46 / 9.406 | 0.4424 / 0.3773 / 0.5051 | 0.6478 / 0.7091 / 0.5876
bbn | BBN_a2e_cn_primary | 0.4680 / 0.5566 / 0.3783 | 0.4678 / 0.5564 / 0.3781 | 10.85 / 11.46 / 9.304 | 0.4603 / 0.3800 / 0.5378 | 0.6561 / 0.7125 / 0.6015
sri | SRI_a2e_cn_combo1 | 0.4631 / 0.5472 / 0.3731 | 0.4629 / 0.5470 / 0.3730 | 10.86 / 11.25 / 9.306 | 0.4592 / 0.3974 / 0.5189 | 0.6461 / 0.7063 / 0.5866
rwth | RWTH_a2e_cn_primary | 0.4534 / 0.5402 / 0.3538 | 0.4533 / 0.5400 / 0.3537 | 10.65 / 11.12 / 9.028 | 0.4666 / 0.3980 / 0.5328 | 0.6532 / 0.7154 / 0.5918
sri | SRI_a2e_cn_primary | 0.4527 / 0.5366 / 0.3634 | 0.4526 / 0.5365 / 0.3633 | 10.71 / 11.11 / 9.251 | 0.4689 / 0.4003 / 0.5351 | 0.6459 / 0.7071 / 0.5859
edinburgh | Edinburgh_a2e_cn_primary | 0.4479 / 0.5240 / 0.3605 | 0.4478 / 0.5238 / 0.3604 | 10.58 / 10.92 / 9.174 | 0.4857 / 0.4209 / 0.5482 | 0.6454 / 0.7069 / 0.5847
umd | UMD_a2e_cn_primary | 0.4409 / 0.5340 / 0.3415 | 0.4408 / 0.5338 / 0.3413 | 10.53 / 11.25 / 8.590 | 0.4591 / 0.3909 / 0.5248 | 0.6287 / 0.6982 / 0.5595
cmu-smt | CMU-SMT_a2e_cn_primary | 0.4304 / 0.5055 / 0.3473 | 0.4302 / 0.5053 / 0.3469 | 10.33 / 10.72 / 8.896 | 0.4823 / 0.4182 / 0.5442 | 0.6365 / 0.6988 / 0.5748
columbia | columbia_a2e_cn_primary | 0.4157 / 0.4932 / 0.3331 | 0.4156 / 0.4931 / 0.3330 | 10.27 / 10.73 / 8.785 | 0.4747 / 0.4160 / 0.5313 | 0.6269 / 0.6847 / 0.5698
cmu-statxfer | CMU-Stat-Xfer_a2e_cn_primary | 0.3774 / 0.4448 / 0.2986 | 0.3772 / 0.4447 / 0.2984 | 9.731 / 10.07 / 8.399 | 0.5158 / 0.4628 / 0.5669 | 0.6082 / 0.6684 / 0.5484
sakhr | SAKHR_a2e_cn_primary | 0.3681 / 0.4185 / 0.3147 | 0.3680 / 0.4184 / 0.3147 | 9.867 / 9.982 / 8.887 | 0.5075 / 0.4626 / 0.5508 | 0.6320 / 0.6737 / 0.5913

UnConstrained Data Track
ibm | IBM_a2e_un_combo0 | 0.5096 / 0.5913 / 0.4241 | 0.5095 / 0.5912 / 0.4240 | 11.54 / 11.92 / 10.02 | 0.4168 / 0.3507 / 0.4804 | 0.6768 / 0.7366 / 0.6181
ibm | IBM_arabic_un_primary | 0.4981 / 0.5713 / 0.4214 | 0.4979 / 0.5711 / 0.4214 | 11.41 / 11.70 / 9.973 | 0.4253 / 0.3653 / 0.4833 | 0.6700 / 0.7293 / 0.6117
apptek | AppTek_a2e_un_primary | 0.4790 / 0.5165 / 0.4352 | 0.4787 / 0.5162 / 0.4348 | 11.21 / 10.98 / 10.32 | 0.4342 / 0.3969 / 0.4702 | 0.6818 / 0.7207 / 0.6433

 

Arabic-to-English, submissions from participants not involved in GALE (Table 5)

Each metric is reported as Overall / Newswire / Web.

Site ID | System | BLEU-4 (mteval-v13a) | IBM BLEU (bleu-1.04) | NIST (mteval-v13a) | TER (tercom-0.7.25) | METEOR (meteor-0.7)

Constrained Data Track
lium-systran | LIUM-SYSTRAN_a2e_cn_primary(1) | 0.4773 / 0.5629 / 0.3800 | 0.4772 / 0.5627 / 0.3799 | 10.96 / 11.38 / 9.412 | 0.4565 / 0.3851 / 0.5252 | 0.6526 / 0.7185 / 0.5879
fbk | FBK_a2e_cn_primary | 0.4567 / 0.5418 / 0.3615 | 0.4565 / 0.5417 / 0.3613 | 10.75 / 11.22 / 9.252 | 0.4721 / 0.3952 / 0.5462 | 0.6533 / 0.7170 / 0.5910
limsi | LIMSI_Moses_a2e_cn_primary | 0.4384 / 0.5242 / 0.3471 | 0.4383 / 0.5240 / 0.3469 | 10.40 / 10.98 / 8.738 | 0.4724 / 0.4051 / 0.5373 | 0.6212 / 0.6851 / 0.5582
tubitak-uekae | TUBITAK_a2e_cn_primary | 0.4112 / 0.4826 / 0.3310 | 0.4121 / 0.4827 / 0.3312 | 10.01 / 10.43 / 8.659 | 0.5129 / 0.4445 / 0.5789 | 0.6259 / 0.6890 / 0.5627
upc-lsi | UPC.LSI_a2e_cn_primary | 0.3588 / 0.4344 / 0.2778 | 0.3588 / 0.4345 / 0.2777 | 9.404 / 10.05 / 7.655 | 0.5188 / 0.4630 / 0.5725 | 0.5866 / 0.6515 / 0.5216
amsterdam | UvA_a2e_cn_primary(1) | 0.3221 / 0.3820 / 0.2565 | 0.3218 / 0.3816 / 0.2563 | 8.621 / 9.063 / 7.458 | 0.5995 / 0.5437 / 0.6533 | 0.5863 / 0.6457 / 0.5278
kcsl | KCSL_a2e_cn_primary | 0.1422 / 0.1670 / 0.1161 | 0.1420 / 0.1669 / 0.1158 | 6.647 / 6.959 / 5.820 | 0.6590 / 0.6353 / 0.6818 | 0.4937 / 0.5398 / 0.4482
telaviv | TLVEBMT_a2e_cn_primary | 0.0703 / 0.0872 / 0.0527 | 0.0703 / 0.0872 / 0.0526 | 3.879 / 4.370 / 3.165 | 0.7483 / 0.7274 / 0.7685 | 0.3853 / 0.4262 / 0.3450

(1)A late and/or debugged system was also submitted, not reported here.

 

Current Test Set, Urdu-to-English Results

Introduction

This release page is limited to the Current Test for the Urdu-to-English track.

Scores reported are limited to primary, on-time, non-debugged submissions.

Scores are ordered by BLEU-4 score on the Overall test set.

Participants

The following table lists the submissions received from the sites participating in the Urdu-to-English Current Test.

Current Test Set, Urdu-to-English
Site ID | Organization | Location | Single System Track | System Combination Track
afrl | Air Force Research Laboratory | USA | Yes | -
amsterdam | University of Amsterdam | Netherlands | Late and/or debugged submission | -
cmu-ebmt | Carnegie Mellon EBMT | USA | Yes(1) | -
cmu-statxfer | Carnegie Mellon StatXfer | USA | Yes | Yes
hongkong | City University of Hong Kong | China | Yes | -
isi-lw | University of Southern California / Language Weaver Inc. | USA | Yes | Yes
jhu | Johns Hopkins University | USA | Late and/or debugged submission | -
systran | SYSTRAN Software Inc. | USA | Yes | -
umd | University of Maryland | USA | Yes | -
upc-lsi | UPC-LSI (Universitat Politècnica de Catalunya, Llenguatges i Sistemes Informàtics) | Spain | Yes | -

(1)A late and/or debugged system was also submitted, not reported here.

Current Test Results - Single System Track

Urdu-to-English (Table 1)

BLEU-4 (mteval-v13a)

Site ID | System | Overall | Newswire | Web

Constrained Data Track
isi-lw | isi-lw_u2e_cn_primary | 0.3120 | 0.3753 | 0.2440
umd | UMD_u2e_cn_primary | 0.2395 | 0.2697 | 0.2056
cmu-statxfer | CMU-Stat-Xfer_u2e_cn_primary | 0.2322 | 0.2614 | 0.2013
afrl | AFRL_u2e_cn_primary | 0.2235 | 0.2526 | 0.1904
upc-lsi | UPC.LSI_u2e_cn_primary | 0.1971 | 0.2325 | 0.1610
cmu-ebmt | cmu_cunei_u2e_cn_primary(1) | 0.1863 | 0.2122 | 0.1568
hongkong | CityU_u2e_cn_primary | 0.0254 | 0.0291 | 0.0215

UnConstrained Data Track
systran | SYSTRAN_u2e_un_primary | 0.2555 | 0.3034 | 0.2051

(1)A late and/or debugged system was also submitted, not reported here.
 

Current Test Results - System Combination Track

Urdu-to-English (Table 2)

BLEU-4 (mteval-v13a)

Site ID | System | Overall | Newswire | Web

Constrained Data Track
isi-lw | isi-lw_u2e_cn_combo1 | 0.3187 | 0.3816 | 0.2509
cmu-statxfer | CMU-Stat-Xfer_u2e_cn_combo1 | 0.2504 | 0.2910 | 0.2082

 

Current Test Results - Single System Track and System Combination Track - All metrics

Urdu-to-English (Table 3)

Each metric is reported as Overall / Newswire / Web.

Site ID | System | BLEU-4 (mteval-v13a) | IBM BLEU (bleu-1.04) | NIST (mteval-v13a) | TER (tercom-0.7.25) | METEOR (meteor-0.7)

Constrained Data Track
isi-lw | isi-lw_u2e_cn_combo1 | 0.3187 / 0.3816 / 0.2509 | 0.3187 / 0.3815 / 0.2509 | 8.948 / 9.465 / 7.540 | 0.5866 / 0.5495 / 0.6199 | 0.5616 / 0.6237 / 0.5027
isi-lw | isi-lw_u2e_cn_primary | 0.3120 / 0.3753 / 0.2440 | 0.3119 / 0.3752 / 0.2440 | 8.836 / 9.379 / 7.413 | 0.5845 / 0.5495 / 0.6159 | 0.5539 / 0.6160 / 0.4951
cmu-statxfer | CMU-Stat-Xfer_u2e_cn_combo1 | 0.2504 / 0.2910 / 0.2082 | 0.2504 / 0.2907 / 0.2084 | 8.244 / 8.550 / 7.218 | 0.6273 / 0.6128 / 0.6404 | 0.5380 / 0.5838 / 0.4947
umd | UMD_u2e_cn_primary | 0.2395 / 0.2697 / 0.2056 | 0.2394 / 0.2696 / 0.2056 | 7.871 / 8.089 / 7.051 | 0.6519 / 0.6436 / 0.6593 | 0.5355 / 0.5815 / 0.4919
cmu-statxfer | CMU-Stat-Xfer_u2e_cn_primary | 0.2322 / 0.2614 / 0.2013 | 0.2323 / 0.2612 / 0.2017 | 7.743 / 7.919 / 7.029 | 0.6960 / 0.6926 / 0.6991 | 0.5445 / 0.5907 / 0.5006
afrl | AFRL_u2e_cn_primary | 0.2235 / 0.2526 / 0.1904 | 0.2238 / 0.2526 / 0.1907 | 7.957 / 8.123 / 7.157 | 0.6537 / 0.6483 / 0.6584 | 0.5396 / 0.5815 / 0.4996
upc-lsi | UPC.LSI_u2e_cn_primary | 0.1971 / 0.2325 / 0.1610 | 0.1972 / 0.2324 / 0.1612 | 7.510 / 7.945 / 6.361 | 0.6607 / 0.6526 / 0.6680 | 0.5015 / 0.5530 / 0.4524
cmu-ebmt | cmu_cunei_u2e_cn_primary(1) | 0.1863 / 0.2122 / 0.1568 | 0.1863 / 0.2121 / 0.1568 | 6.811 / 6.899 / 6.065 | 0.6977 / 0.7001 / 0.6956 | 0.5274 / 0.5801 / 0.4773
hongkong | CityU_u2e_cn_primary | 0.0254 / 0.0291 / 0.0215 | 0.0256 / 0.0293 / 0.0218 | 2.647 / 2.788 / 2.430 | 1.1169 / 1.1565 / 1.0813 | 0.2929 / 0.3196 / 0.2676

UnConstrained Data Track
systran | SYSTRAN_u2e_un_primary | 0.2555 / 0.3034 / 0.2051 | 0.2553 / 0.3032 / 0.2050 | 8.203 / 8.622 / 7.090 | 0.6310 / 0.6060 / 0.6534 | 0.5401 / 0.5913 / 0.4921

(1)A late and/or debugged system was also submitted, not reported here.

 

Progress Test Set Results

Introduction

This page of results is limited to those of the Progress Test data for the MT09 tests of Arabic-to-English and Chinese-to-English text-to-text translation.

The Progress test was designed as a means to demonstrate true system improvement for a particular site over time, irrespective of inherent data differences that come with each new test set. In order to keep the focus on progress over time, only results from sites that participated in both OpenMT08 and OpenMT09 are reported.

There may be issues related to processing older test sets that make across-site comparisons less meaningful.

Scores reported are limited to primary, on-time, non-debugged submissions.

Scores are ordered by BLEU-4 score on the Overall test set.

Note that all BLEU-4 scores reported on this page were computed using version 13a of the mteval script. Last year, scores were computed using mteval-v11b, which uses a different brevity penalty. This year's script (mteval-v13a) uses the 'closest reference translation length' (same as in IBM BLEU) instead of the 'shortest reference translation length', to compute the brevity penalty.
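To make the change concrete, the per-segment sketch below contrasts the two brevity-penalty conventions mentioned above. The function names are illustrative assumptions, and the real scripts accumulate candidate and reference lengths over the whole test set rather than applying the penalty per segment.

    # Illustrative contrast between the two brevity-penalty conventions.
    import math

    def bp_shortest(cand_len, ref_lens):
        """mteval-v11b style: penalty based on the shortest reference length."""
        r = min(ref_lens)
        return 1.0 if cand_len > r else math.exp(1 - r / cand_len)

    def bp_closest(cand_len, ref_lens):
        """mteval-v13a / IBM BLEU style: penalty based on the closest reference length."""
        r = min(ref_lens, key=lambda length: (abs(length - cand_len), length))
        return 1.0 if cand_len > r else math.exp(1 - r / cand_len)

    if __name__ == "__main__":
        # A 24-token candidate with references of 18, 25, and 30 tokens:
        # the shortest-length rule compares against 18 (no penalty), while
        # the closest-length rule compares against 25 and applies a penalty.
        print(bp_shortest(24, [18, 25, 30]))  # 1.0
        print(bp_closest(24, [18, 25, 30]))   # ~0.959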

Participants

The following table lists the submissions received from the sites participating in the Progress Test. Scores are reported only for submissions that meet all of the following criteria (shown as "MT08 and MT09" in the table below):

  • Submissions were valid and on-time
  • The site also participated in MT08
  • The MT08 system was submitted for the same training condition as the OpenMT 2009 system.
Site ID | Organization | Location | Progress Test Set: Arabic-to-English | Progress Test Set: Chinese-to-English
amsterdam | University of Amsterdam | Netherlands | MT09 only | MT09 only
apptek | AppTek | USA | MT08 and MT09 | Different training condition
bbn | BBN Technologies | USA | MT08 and MT09 | MT08 and MT09
buaa | Beihang University, Institute of Intelligent Information Processing, School of Computer Science and Engineering | China | - | MT09 only
cas-ia | Chinese Academy of Sciences, Institute of Automation | China | - | MT09 only
cas-ict | Chinese Academy of Sciences, Institute of Computing Technology | China | - | MT09 only
ccid | China Center for Information Industry Development | China | - | MT09 only
cmu-smt | Carnegie Mellon LTI interACT | USA | Different training condition | Different training condition
cmu-statxfer | Carnegie Mellon StatXfer | USA | MT09 only | -
dcu | Dublin City University | Ireland | - | MT09 only
dfki | DFKI GmbH | Germany | - | MT09 only
edinburgh | University of Edinburgh | UK | MT09 only | -
fbk | Fondazione Bruno Kessler | Italy | MT09 only | -
frdc | Fujitsu Research & Development Center Co., Ltd. | China | - | MT09 only
hit-ltrc | Harbin Institute of Technology, Language Technology Research Center | China | - | MT09 only
isi-lw | University of Southern California / Language Weaver Inc. | USA | MT08 and MT09 | MT08 and MT09
lium | Université du Maine (Le Mans) | France | - | MT09 only
lium-systran | Université du Maine (Le Mans) / SYSTRAN | - | Late and/or debugged submission | -
nju-nlp | Nanjing University NLP | China | - | MT09 only
nrc | National Research Council Canada | Canada | - | MT08 and MT09
nthu | National Tsing Hua University, Department of Computer Science | Taiwan | - | MT09 only
rwth | RWTH-Aachen University, Chair of Computer Sciences | Germany | MT09 only | MT09 only
sakhr | Sakhr Software | Egypt | MT09 only | -
sri | SRI International | USA | MT08 and MT09 | MT08 and MT09
systran-lium | SYSTRAN / Université du Maine (Le Mans) | - | - | MT09 only
systran-nrc | SYSTRAN / National Research Council Canada | - | - | MT09 only
tubitak-uekae | TUBITAK-UEKAE | Turkey | MT09 only | -
umd | University of Maryland | USA | - | MT08 and MT09
upc-lsi | UPC-LSI (Universitat Politècnica de Catalunya, Llenguatges i Sistemes Informàtics) | Spain | MT08 and MT09 | -

Progress Test Results - Single System Track

Arabic-to-English (Table 1)

BLEU-4 (mteval-v13a); each column is reported for MT08 and MT09.

Site ID | System | Overall MT08 | Overall MT09 | Newswire MT08 | Newswire MT09 | Web MT08 | Web MT09

Constrained Data Track
bbn | BBN-progress_a2e_cn_primary | 0.4186(1) | 0.4379 | 0.4655(1) | 0.4926 | 0.3566(1) | 0.3678
isi-lw | isi-lw_a2e_cn_primary | 0.4030(1) | 0.4296 | 0.4498(1) | 0.4748 | 0.3408(1) | 0.3621
sri | SRI_a2e_cn_primary | 0.4011(1) | 0.4087 | 0.4558(1) | 0.4499 | 0.3277(1) | 0.3551
upc-lsi | UPC.LSI_a2e_cn_primary | 0.2956 | 0.3303 | 0.3448 | 0.3769 | 0.2300 | 0.2676

UnConstrained Data Track
apptek | AppTek_a2e_un_primary | 0.4195 | 0.3955 | 0.4245 | 0.3991 | 0.4131 | 0.3909

(1)The MT08 system was a combined system
 

Chinese-to-English (Table 2)

BLEU-4 (mteval-v13a); each column is reported for MT08 and MT09.

Site ID | System | Overall MT08 | Overall MT09 | Newswire MT08 | Newswire MT09 | Web MT08 | Web MT09

Constrained Data Track
isi-lw | isi-lw_c2e_cn_primary | 0.2990(1) | 0.3225 | 0.3516(1) | 0.3628 | 0.2237(1) | 0.2642
bbn | BBN-progress_c2e_cn_primary | 0.3055(1) | 0.3153 | 0.3447(1) | 0.3481 | 0.2509(1) | 0.2697
nrc | NRC_c2e_cn_primary | 0.2480 | 0.2811 | 0.2679 | 0.3130 | 0.2204 | 0.2357
sri | SRI_c2e_cn_primary | 0.2617(1) | 0.2790 | 0.3028(1) | 0.3132 | 0.2032(1) | 0.2303
umd | UMD_c2e_cn_primary | 0.2456 | 0.2500 | 0.2823 | 0.2781 | 0.1936 | 0.2108

(1)The MT08 system was a combined system

 

Informal System Combination Results

Introduction


Informal System Combination was an informal, diagnostic MT09 task, offered after the official evaluation period. Output from several MT09 systems on the Arabic-to-English and Urdu-to-English Current tests was anonymized and provided for system combination purposes. Participants in this category produced new output based on those provided translations.

Scores reported here are limited to primary Informal System Combination submissions.


Evaluation Data

The Informal System Combination track used system output from the Arabic-to-English and Urdu-to-English Current tests. Approximately 30% of the test data was designated as a development set for system combination; the remainder of the system output was provided as the test set.

Language Pair | Data Genre | Development Set | Evaluation Set
Arabic-to-English | Newswire | 17 documents | 42 documents
Arabic-to-English | Web | 16 documents | 40 documents
Urdu-to-English | Newswire | 20 documents | 48 documents
Urdu-to-English | Web | 48 documents | 114 documents

Informal System Combination Results

Arabic-to-English (Table 1)

Each metric is reported as Overall / Newswire / Web.

Site ID | System | BLEU-4 (mteval-v13a) | IBM BLEU (bleu-1.04) | NIST (mteval-v13a) | TER (tercom-0.7.25) | METEOR (meteor-0.7)

bbn | BBN_a2e_isc_primary | 0.5747 / 0.6440 / 0.4940 | 0.5747 / 0.6440 / 0.4938 | 11.82 / 11.84 / 10.41 | 0.3761 / 0.3220 / 0.4298 | 0.7043 / 0.7601 / 0.6469
sri | SRI_a2e_isc_primary | 0.5543 / 0.6292 / 0.4733 | 0.5542 / 0.6291 / 0.4732 | 11.68 / 11.79 / 10.26 | 0.3788 / 0.3244 / 0.4328 | 0.6989 / 0.7474 / 0.6493
cmu-statxfer | CMU-Stat-Xfer_a2e_isc_primary | 0.5530 / 0.6332 / 0.4663 | 0.5529 / 0.6330 / 0.4662 | 11.62 / 11.80 / 10.15 | 0.3854 / 0.3279 / 0.4427 | 0.7033 / 0.7518 / 0.6538
rwth | RWTH_a2e_isc_primary | 0.5515 / 0.6412 / 0.4523 | 0.5517 / 0.6411 / 0.4523 | 11.56 / 11.86 / 9.879 | 0.3923 / 0.3229 / 0.4613 | 0.6928 / 0.7568 / 0.6272
jhu | jhu_a2e_isc_primary | 0.5483 / 0.6294 / 0.4577 | 0.5481 / 0.6291 / 0.4574 | 11.55 / 11.73 / 10.01 | 0.3862 / 0.3272 / 0.4448 | 0.6919 / 0.7494 / 0.6330
hit-ltrc | HIT-LTRC_a2e_isc_primary | 0.5037 / 0.5997 / 0.3982 | 0.5038 / 0.6000 / 0.3981 | 10.65 / 11.48 / 8.406 | 0.4135 / 0.3472 / 0.4793 | 0.6596 / 0.7249 / 0.5922
tubitak-uekae | TUBITAK_a2e_isc_primary | 0.4603 / 0.5371 / 0.3779 | 0.4603 / 0.5371 / 0.3779 | 10.31 / 10.75 / 8.726 | 0.4525 / 0.3942 / 0.5105 | 0.6263 / 0.6882 / 0.5625

Highest individual system score in ISC test set (system with highest BLEU-4 score on Overall data set)
- | system08_unconstrained.xml | 0.5008 / 0.5719 / 0.4245 | 0.5007 / 0.5720 / 0.4243 | 11.04 / 11.28 / 9.598 | 0.4229 / 0.3641 / 0.4813 | 0.6694 / 0.7271 / 0.6104

 

Urdu-to-English (Table 2)

Each metric is reported as Overall / Newswire / Web.

Site ID | System | BLEU-4 (mteval-v13a) | IBM BLEU (bleu-1.04) | NIST (mteval-v13a) | TER (tercom-0.7.25) | METEOR (meteor-0.7)

rwth | RWTH_u2e_isc_primary(1) | 0.3232 / 0.3768 / 0.2737 | 0.3235 / 0.3767 / 0.2740 | 8.822 / 9.274 / 7.425 | 0.5630 / 0.5383 / 0.5833 | 0.5539 / 0.6105 / 0.5046
jhu | jhu_u2e_isc_primary | 0.3193 / 0.3796 / 0.2627 | 0.3191 / 0.3792 / 0.2627 | 8.736 / 9.197 / 7.418 | 0.5590 / 0.5317 / 0.5815 | 0.5512 / 0.6073 / 0.5022
cmu-statxfer | CMU-Stat-Xfer_u2e_isc_primary | 0.3188 / 0.3821 / 0.2602 | 0.3188 / 0.3821 / 0.2602 | 8.694 / 9.154 / 7.353 | 0.5741 / 0.5422 / 0.6004 | 0.5560 / 0.6170 / 0.5030
hit-ltrc | HIT-LTRC_u2e_isc_primary | 0.3103 / 0.3774 / 0.2453 | 0.3104 / 0.3773 / 0.2455 | 8.639 / 9.195 / 7.271 | 0.5820 / 0.5416 / 0.6152 | 0.5519 / 0.6184 / 0.4941

Highest individual system score in ISC test set (system with highest BLEU-4 score on Overall data set)
- | system09_constrained.xml | 0.3104 / 0.3774 / 0.2456 | 0.3104 / 0.3773 / 0.2456 | 8.640 / 9.196 / 7.276 | 0.5816 / 0.5414 / 0.6146 | 0.5522 / 0.6186 / 0.4945

(1)rescored
