Date of release: Tue Oct 27 15:48:58 2009
Version: mt09_public_v1
The NIST 2009 Open Machine Translation Evaluation (MT09) is part of an ongoing series of evaluations of human language translation technology. NIST conducts these evaluations in order to support machine translation (MT) research and help advance the state-of-the-art in machine translation technology. These evaluations provide an important contribution to the direction of research efforts and the calibration of technical capabilities. The evaluation was administered as outlined in the official MT09 evaluation plan.
These results are not to be construed, or represented, as endorsements of any participant's system or commercial product, or as official findings on the part of NIST or the U.S. Government. Note that the results submitted by developers of commercial MT products were generally from research systems, not commercially available products. Since MT09 was an evaluation of research algorithms, the MT09 test design required local implementation by each participant. As such, participants were only required to submit their translation system output to NIST for uniform scoring and analysis. The systems themselves were not independently evaluated by NIST.
Certain commercial equipment, instruments, software, or materials are identified in this paper in order to specify the experimental procedure adequately. Such identification is not intended to imply recommendation or endorsement by NIST, nor is it intended to imply that the equipment, instruments, software or materials are necessarily the best available for the purpose.
There is ongoing discussion within the MT research community regarding the most informative metrics for machine translation. The design and implementation of these metrics are themselves very much part of the research. At the present time, there is no single metric that has been deemed to be completely indicative of all aspects of system performance.
The data, protocols, and metrics employed in this evaluation were chosen to support MT research and should not be construed as indicating how well these systems would perform in applications. While changes in the data domain, or changes in the amount of data used to build a system, can greatly influence system performance, changing the task protocols could indicate different performance strengths and weaknesses for these same systems.
For these reasons, this evaluation should not be interpreted as a product testing exercise, and the results should not be used to draw conclusions about which commercial products are best for a particular application.
MT09 was a test of text-to-text MT technology. The evaluation consisted of three tasks, differing only by the source language processed: Arabic-to-English, Chinese-to-English, and Urdu-to-English.
MT research and development requires language data resources. System performance is strongly affected by the type and amount of resources used. Therefore, two different resource categories were defined as conditions of evaluation. The categories differ solely by the amount of data that was available for use in the training and development of the core MT engine. These evaluation conditions were called "Constrained Training" and "Unconstrained Training". See the evaluation specification document for a complete description of allowable resources for each.
In recent years, performance improvements have been demonstrated through the use of system combination techniques. For MT09, two evaluation tracks were supported: the "Single System Track" and the "System Combination Track". Results are reported separately for each track. As the track names imply, translations entered in the Single System Track are produced primarily by one algorithmic approach, while translations in the System Combination Track result from a combination technique that draws on two or more core algorithmic approaches.
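The combination methods themselves are site-specific and not described here. Purely as an illustration of the concept, the sketch below picks, for each source segment, the hypothesis that shares the most bigrams with the other systems' outputs; this is a toy consensus heuristic, not the method of any MT09 participant.

```python
from collections import Counter

def ngrams(tokens, n):
    """All n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def consensus_pick(hypotheses, n=2):
    """Pick the hypothesis with the highest n-gram overlap with the others.

    `hypotheses` holds one translation string per system for a single
    source segment. Toy consensus heuristic for illustration only.
    """
    best, best_score = None, -1
    for i, hyp in enumerate(hypotheses):
        hyp_counts = Counter(ngrams(hyp.split(), n))
        score = 0
        for j, other in enumerate(hypotheses):
            if i == j:
                continue
            other_counts = Counter(ngrams(other.split(), n))
            # Count n-grams shared with this other system's output.
            score += sum((hyp_counts & other_counts).values())
        if score > best_score:
            best, best_score = hyp, score
    return best

# The middle-ground hypothesis wins because it agrees most with the others.
print(consensus_pick([
    "the president met the delegation",
    "president met with the delegation",
    "the president met with the delegation",
]))
```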
The following table gives the approximate source word count for each language pair and data genre, separately for the Current Test Set and the Progress Test Set. For the Chinese-to-English pair, word counts assume that a Chinese word averages 1.5 characters.
Language Pair | Data Genre | Current Test Set | Progress Test Set
---|---|---|---
Arabic-to-English | Newswire | 16K words (68 documents) | 20K words (81 documents)
Arabic-to-English | Web | 15K words (67 documents) | 15K words (51 documents)
Chinese-to-English | Newswire | - | 20K words (82 documents)
Chinese-to-English | Web | - | 15K words (40 documents)
Urdu-to-English | Newswire | 24K words (72 documents) | -
Urdu-to-English | Web | 21K words (166 documents) | -
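For illustration, the character-to-word convention above can be applied directly. A minimal sketch follows; the 1.5 characters-per-word ratio is the only figure taken from this page, and the sample string is synthetic.

```python
# MT09 convention: one Chinese word averages 1.5 characters.
CHARS_PER_WORD = 1.5

def approx_chinese_words(text):
    """Estimate the word count of unsegmented Chinese text from its
    character count, using the 1.5 characters-per-word convention."""
    chars = len(text.replace(" ", "").replace("\n", ""))
    return round(chars / CHARS_PER_WORD)

# A 30,000-character collection comes out to about 20,000 words,
# the size of the Chinese newswire Progress Test set above.
print(approx_chinese_words("中" * 30000))  # -> 20000
```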
Scores for the five reported metrics were computed with the following commands:

BLEU-4 (mteval-v13a):
perl mteval-v13a.pl -r REFERENCE_FILE -s SOURCE_FILE -t CANDIDATE_FILE -c -b
  -c : case-sensitive scoring
  -b : BLEU score only

IBM BLEU (bleu-1.04):
perl bleu-1.04.pl -r REFERENCE_FILE -t CANDIDATE_FILE
(The NormalizeText method was updated to properly un-escape XML entities.)

NIST (mteval-v13a):
perl mteval-v13a.pl -r REFERENCE_FILE -s SOURCE_FILE -t CANDIDATE_FILE -c -n
  -c : case-sensitive scoring
  -n : NIST score only

TER (tercom-0.7.25):
java -jar tercom.7.25.jar -r REFERENCE_FILE -h CANDIDATE_FILE -N -s
  -N : enables normalization
  -s : case-sensitive scoring

METEOR (meteor-0.7):
perl meteor.pl -s SYSTEM_ID -r REFERENCE_FILE -t CANDIDATE_FILE --modules "exact porter_stem wn_stem wn_synonymy"
  --modules "exact porter_stem wn_stem wn_synonymy" : uses all four METEOR matching modules, in that order
The following table lists the organizations participating in MT09 and the test sets they registered to process.
Site ID | Organization | Location | Current Test Set: Arabic-to-English | Current Test Set: Urdu-to-English | Progress Test Set: Arabic-to-English | Progress Test Set: Chinese-to-English
---|---|---|---|---|---|---
afrl | Air Force Research Laboratory | USA | - | Yes | - | - |
amsterdam | University of Amsterdam | Netherlands | Yes | Yes | Yes | Yes |
apptek | AppTek | USA | Yes | - | Yes | Yes |
bbn | BBN Technologies | USA | Yes | - | Yes | Yes |
buaa | Beihang University, Institute of Intelligent Information Processing, School of Computer Science and Engineering | China | - | - | - | Yes |
cas-ia | Chinese Academy of Sciences, Institute of Automation | China | - | - | - | Yes |
cas-ict | Chinese Academy of Sciences, Institute of Computing Technology | China | - | - | - | Yes |
ccid | China Center for Information Industry Development | China | - | - | - | Yes |
cmu-ebmt | Carnegie Mellon EBMT | USA | - | Yes | - | - |
cmu-smt | Carnegie Mellon LTI interACT | USA | Yes | - | Yes | Yes |
cmu-statxfer | Carnegie Mellon StatXfer | USA | Yes | Yes | Yes | - |
columbia | Columbia University | USA | Yes | - | - | - |
cued | Cambridge University Engineering Department | UK | Yes | - | - | - |
dcu | Dublin City University | Ireland | - | - | - | Yes |
dfki | DFKI GmbH | Germany | - | - | - | Yes |
edinburgh | University of Edinburgh | UK | Yes | - | Yes | withdrew |
fbk | Fondazione Bruno Kessler | Italy | Yes | - | Yes | - |
frdc | Fujitsu Research & Development Center Co., Ltd. | China | - | - | - | Yes |
hit-ltrc | Harbin Institute of Technology, Language Technology Research Center | China | - | - | - | Yes |
hongkong | City University of Hong Kong | China | withdrew | Yes | - | - |
ibm | IBM | USA | Yes | - | withdrew | - |
jhu | Johns Hopkins University | USA | Yes | Yes | - | withdrew |
kcsl | KCSL Inc. | Canada | Yes | - | - | - |
limsi | Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur - CNRS | France | Yes | - | - | - |
lium | Université du Maine (Le Mans) | France | - | - | - | Yes |
nju-nlp | Nanjing University NLP | China | - | - | - | Yes |
nrc | National Research Council Canada | Canada | - | - | - | Yes |
nthu | National Tsing Hua University, Department of Computer Science | Taiwan | - | - | - | Yes |
rwth | RWTH-Aachen University, Chair of Computer Sciences | Germany | Yes | - | Yes | Yes |
sakhr | Sakhr Software | Egypt | Yes | - | Yes | - |
sri | SRI International | USA | Yes | - | Yes | Yes |
stanford | Stanford University | USA | Yes | - | withdrew | - |
systran | SYSTRAN Software Inc. | USA | - | Yes | - | - |
telaviv | Tel Aviv University | Israel | Yes | - | - | - |
tubitak-uekae | TUBITAK-UEKAE | Turkey | Yes | - | Yes | - |
umd | University of Maryland | USA | Yes | Yes | withdrew | Yes |
upc-lsi | UPC-LSI (Universitat Politècnica de Catalunya, Llenguatges i Sistemes Informàtics) | Spain | Yes | Yes | Yes | - |
Total (Individual Only) | | | 21 | 9 | 12 | 19
Collaborations | | | | | |
isi-lw | University of Southern California / Language Weaver Inc. | USA | Yes | Yes | Yes | Yes
lium-systran | Université du Maine (Le Mans) / SYSTRAN | . | Yes | - | Yes | -
systran-lium | SYSTRAN / Université du Maine (Le Mans) | . | - | - | - | Yes |
systran-nrc | SYSTRAN / National Research Council Canada | . | - | - | - | Yes |
Total (Individual + Collaboration) | | | 23 | 10 | 14 | 22
Notes | | | | | |
fsc | Fitchburg State College | USA | Submission not scored. | | |
[ Current Test Set, Arabic-to-English Results ] [ Current Test Set, Urdu-to-English Results ] [ Progress Test Set Results ] [ Informal System Combination Results ]
This release page is limited to the Current Test for the Arabic-to-English track.
Scores reported are limited to primary, on-time, non-debugged submissions.
Scores are ordered by BLEU-4 score on the Overall test set.
Results for submissions from GALE participants, who had prior access to the test data, are reported separately from results submitted by sites that are not part of the GALE program.
The following table lists the submissions received from the sites participating in the Arabic-to-English Current Test.
Site ID | Organization | Location | Single System Track | System Combination Track
---|---|---|---|---
amsterdam | University of Amsterdam | Netherlands | Yes(1) | - |
apptek | AppTek | USA | Yes | - |
bbn | BBN Technologies | USA | Yes | - |
cmu-smt | Carnegie Mellon LTI interACT | USA | Yes | - |
cmu-statxfer | Carnegie Mellon StatXfer | USA | Yes | - |
columbia | Columbia University | USA | Yes | - |
cued | Cambridge University Engineering Department | UK | Yes | - |
edinburgh | University of Edinburgh | UK | Yes | - |
fbk | Fondazione Bruno Kessler | Italy | Yes | - |
ibm | IBM | USA | Yes | Yes |
isi-lw | University of Southern California / Language Weaver Inc. | USA | Yes | Yes |
jhu | Johns Hopkins University | USA | Late and/or debugged submission | - |
kcsl | KCSL Inc. | Canada | Yes | - |
limsi | Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur - CNRS | France | Yes | - |
lium-systran | Université du Maine (Le Mans) / SYSTRAN | . | Yes(1) | - |
rwth | RWTH-Aachen University, Chair of Computer Sciences | Germany | Yes | - |
sakhr | Sakhr Software | Egypt | Yes | - |
sri | SRI International | USA | Yes | Yes |
stanford | Stanford University | USA | Yes | - |
telaviv | Tel Aviv University | Israel | Yes | - |
tubitak-uekae | TUBITAK-UEKAE | Turkey | Yes | - |
umd | University of Maryland | USA | Yes | - |
upc-lsi | UPC-LSI (Universitat Politècnica de Catalunya, Llenguatges i Sistemes Informàtics) | Spain | Yes | - |
(1)A late and/or debugged system was also submitted, not reported here.
Single System Track, BLEU-4 (mteval-v13a) scores:

Site ID | System | Overall | Newswire | Web
---|---|---|---|---
Constrained Data Track | | | |
cued | CUED_a2e_cn_primary | 0.4834 | 0.5641 | 0.3960
stanford | stanford_a2e_cn_primary | 0.4781 | 0.5673 | 0.3843
isi-lw | isi-lw_a2e_cn_primary | 0.4763 | 0.5590 | 0.3810
ibm | IBM_a2e_constrained_primary | 0.4708 | 0.5547 | 0.3833
bbn | BBN_a2e_cn_primary | 0.4680 | 0.5566 | 0.3783
rwth | RWTH_a2e_cn_primary | 0.4534 | 0.5402 | 0.3538
sri | SRI_a2e_cn_primary | 0.4527 | 0.5366 | 0.3634
edinburgh | Edinburgh_a2e_cn_primary | 0.4479 | 0.5240 | 0.3605
umd | UMD_a2e_cn_primary | 0.4409 | 0.5340 | 0.3415
cmu-smt | CMU-SMT_a2e_cn_primary | 0.4304 | 0.5055 | 0.3473
columbia | columbia_a2e_cn_primary | 0.4157 | 0.4932 | 0.3331
cmu-statxfer | CMU-Stat-Xfer_a2e_cn_primary | 0.3774 | 0.4448 | 0.2986
sakhr | SAKHR_a2e_cn_primary | 0.3681 | 0.4185 | 0.3147
UnConstrained Data Track | | | |
ibm | IBM_arabic_un_primary | 0.4981 | 0.5713 | 0.4214
apptek | AppTek_a2e_un_primary | 0.4790 | 0.5165 | 0.4352
Single System Track, BLEU-4 (mteval-v13a) scores:

Site ID | System | Overall | Newswire | Web
---|---|---|---|---
Constrained Data Track | | | |
lium-systran | LIUM-SYSTRAN_a2e_cn_primary(1) | 0.4773 | 0.5629 | 0.3800
fbk | FBK_a2e_cn_primary | 0.4567 | 0.5418 | 0.3615
limsi | LIMSI_Moses_a2e_cn_primary | 0.4384 | 0.5242 | 0.3471
tubitak-uekae | TUBITAK_a2e_cn_primary | 0.4112 | 0.4826 | 0.3310
upc-lsi | UPC.LSI_a2e_cn_primary | 0.3588 | 0.4344 | 0.2778
amsterdam | UvA_a2e_cn_primary(1) | 0.3221 | 0.3820 | 0.2565
kcsl | KCSL_a2e_cn_primary | 0.1422 | 0.1670 | 0.1161
telaviv | TLVEBMT_a2e_cn_primary | 0.0703 | 0.0872 | 0.0527
(1)A late and/or debugged system was also submitted, not reported here.
System Combination Track, BLEU-4 (mteval-v13a) scores:

Site ID | System | Overall | Newswire | Web
---|---|---|---|---
Constrained Data Track | | | |
isi-lw | isi-lw_a2e_cn_combo1 | 0.4802 | 0.5600 | 0.3914
ibm | IBM_a2e_cn_combo0 | 0.4775 | 0.5636 | 0.3871
sri | SRI_a2e_cn_combo1 | 0.4631 | 0.5472 | 0.3731
UnConstrained Data Track | | | |
ibm | IBM_a2e_un_combo0 | 0.5096 | 0.5913 | 0.4241
All-metric scores: BLEU-4 (mteval-v13a), IBM BLEU (bleu-1.04), NIST (mteval-v13a), TER (tercom-0.7.25), and METEOR (meteor-0.7). Lower is better for TER; higher is better for the other metrics.

Site ID | System | BLEU-4 Overall | BLEU-4 Newswire | BLEU-4 Web | IBM BLEU Overall | IBM BLEU Newswire | IBM BLEU Web | NIST Overall | NIST Newswire | NIST Web | TER Overall | TER Newswire | TER Web | METEOR Overall | METEOR Newswire | METEOR Web
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Constrained Data Track | | | | | | | | | | | | | | | |
cued | CUED_a2e_cn_primary | 0.4834 | 0.5641 | 0.3960 | 0.4833 | 0.5640 | 0.3959 | 11.01 | 11.31 | 9.603 | 0.4489 | 0.3861 | 0.5093 | 0.6570 | 0.7152 | 0.5999 |
isi-lw | isi-lw_a2e_cn_combo1 | 0.4802 | 0.5600 | 0.3914 | 0.4801 | 0.5598 | 0.3913 | 10.85 | 11.24 | 9.396 | 0.4643 | 0.3887 | 0.5371 | 0.6642 | 0.7319 | 0.5969 |
stanford | stanford_a2e_cn_primary | 0.4781 | 0.5673 | 0.3843 | 0.4777 | 0.5668 | 0.3840 | 10.97 | 11.44 | 9.392 | 0.4399 | 0.3709 | 0.5065 | 0.6514 | 0.7153 | 0.5882 |
ibm | IBM_a2e_cn_combo0 | 0.4775 | 0.5636 | 0.3871 | 0.4773 | 0.5634 | 0.3870 | 11.03 | 11.54 | 9.466 | 0.4394 | 0.3713 | 0.5051 | 0.6526 | 0.7142 | 0.5924 |
isi-lw | isi-lw_a2e_cn_primary | 0.4763 | 0.5590 | 0.3810 | 0.4760 | 0.5588 | 0.3808 | 10.85 | 11.30 | 9.292 | 0.4590 | 0.3826 | 0.5328 | 0.6544 | 0.7233 | 0.5864 |
ibm | IBM_a2e_constrained_primary | 0.4708 | 0.5547 | 0.3833 | 0.4707 | 0.5545 | 0.3831 | 10.97 | 11.46 | 9.406 | 0.4424 | 0.3773 | 0.5051 | 0.6478 | 0.7091 | 0.5876 |
bbn | BBN_a2e_cn_primary | 0.4680 | 0.5566 | 0.3783 | 0.4678 | 0.5564 | 0.3781 | 10.85 | 11.46 | 9.304 | 0.4603 | 0.3800 | 0.5378 | 0.6561 | 0.7125 | 0.6015 |
sri | SRI_a2e_cn_combo1 | 0.4631 | 0.5472 | 0.3731 | 0.4629 | 0.5470 | 0.3730 | 10.86 | 11.25 | 9.306 | 0.4592 | 0.3974 | 0.5189 | 0.6461 | 0.7063 | 0.5866 |
rwth | RWTH_a2e_cn_primary | 0.4534 | 0.5402 | 0.3538 | 0.4533 | 0.5400 | 0.3537 | 10.65 | 11.12 | 9.028 | 0.4666 | 0.3980 | 0.5328 | 0.6532 | 0.7154 | 0.5918 |
sri | SRI_a2e_cn_primary | 0.4527 | 0.5366 | 0.3634 | 0.4526 | 0.5365 | 0.3633 | 10.71 | 11.11 | 9.251 | 0.4689 | 0.4003 | 0.5351 | 0.6459 | 0.7071 | 0.5859 |
edinburgh | Edinburgh_a2e_cn_primary | 0.4479 | 0.5240 | 0.3605 | 0.4478 | 0.5238 | 0.3604 | 10.58 | 10.92 | 9.174 | 0.4857 | 0.4209 | 0.5482 | 0.6454 | 0.7069 | 0.5847 |
umd | UMD_a2e_cn_primary | 0.4409 | 0.5340 | 0.3415 | 0.4408 | 0.5338 | 0.3413 | 10.53 | 11.25 | 8.590 | 0.4591 | 0.3909 | 0.5248 | 0.6287 | 0.6982 | 0.5595 |
cmu-smt | CMU-SMT_a2e_cn_primary | 0.4304 | 0.5055 | 0.3473 | 0.4302 | 0.5053 | 0.3469 | 10.33 | 10.72 | 8.896 | 0.4823 | 0.4182 | 0.5442 | 0.6365 | 0.6988 | 0.5748 |
columbia | columbia_a2e_cn_primary | 0.4157 | 0.4932 | 0.3331 | 0.4156 | 0.4931 | 0.3330 | 10.27 | 10.73 | 8.785 | 0.4747 | 0.4160 | 0.5313 | 0.6269 | 0.6847 | 0.5698 |
cmu-statxfer | CMU-Stat-Xfer_a2e_cn_primary | 0.3774 | 0.4448 | 0.2986 | 0.3772 | 0.4447 | 0.2984 | 9.731 | 10.07 | 8.399 | 0.5158 | 0.4628 | 0.5669 | 0.6082 | 0.6684 | 0.5484 |
sakhr | SAKHR_a2e_cn_primary | 0.3681 | 0.4185 | 0.3147 | 0.3680 | 0.4184 | 0.3147 | 9.867 | 9.982 | 8.887 | 0.5075 | 0.4626 | 0.5508 | 0.6320 | 0.6737 | 0.5913 |
UnConstrained Data Track | | | | | | | | | | | | | | | |
ibm | IBM_a2e_un_combo0 | 0.5096 | 0.5913 | 0.4241 | 0.5095 | 0.5912 | 0.4240 | 11.54 | 11.92 | 10.02 | 0.4168 | 0.3507 | 0.4804 | 0.6768 | 0.7366 | 0.6181 |
ibm | IBM_arabic_un_primary | 0.4981 | 0.5713 | 0.4214 | 0.4979 | 0.5711 | 0.4214 | 11.41 | 11.70 | 9.973 | 0.4253 | 0.3653 | 0.4833 | 0.6700 | 0.7293 | 0.6117 |
apptek | AppTek_a2e_un_primary | 0.4790 | 0.5165 | 0.4352 | 0.4787 | 0.5162 | 0.4348 | 11.21 | 10.98 | 10.32 | 0.4342 | 0.3969 | 0.4702 | 0.6818 | 0.7207 | 0.6433 |
All-metric scores, same metrics and column layout as the previous table:

Site ID | System | BLEU-4 Overall | BLEU-4 Newswire | BLEU-4 Web | IBM BLEU Overall | IBM BLEU Newswire | IBM BLEU Web | NIST Overall | NIST Newswire | NIST Web | TER Overall | TER Newswire | TER Web | METEOR Overall | METEOR Newswire | METEOR Web
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Constrained Data Track | | | | | | | | | | | | | | | |
lium-systran | LIUM-SYSTRAN_a2e_cn_primary(1) | 0.4773 | 0.5629 | 0.3800 | 0.4772 | 0.5627 | 0.3799 | 10.96 | 11.38 | 9.412 | 0.4565 | 0.3851 | 0.5252 | 0.6526 | 0.7185 | 0.5879 |
fbk | FBK_a2e_cn_primary | 0.4567 | 0.5418 | 0.3615 | 0.4565 | 0.5417 | 0.3613 | 10.75 | 11.22 | 9.252 | 0.4721 | 0.3952 | 0.5462 | 0.6533 | 0.7170 | 0.5910 |
limsi | LIMSI_Moses_a2e_cn_primary | 0.4384 | 0.5242 | 0.3471 | 0.4383 | 0.5240 | 0.3469 | 10.40 | 10.98 | 8.738 | 0.4724 | 0.4051 | 0.5373 | 0.6212 | 0.6851 | 0.5582 |
tubitak-uekae | TUBITAK_a2e_cn_primary | 0.4112 | 0.4826 | 0.3310 | 0.4121 | 0.4827 | 0.3312 | 10.01 | 10.43 | 8.659 | 0.5129 | 0.4445 | 0.5789 | 0.6259 | 0.6890 | 0.5627 |
upc-lsi | UPC.LSI_a2e_cn_primary | 0.3588 | 0.4344 | 0.2778 | 0.3588 | 0.4345 | 0.2777 | 9.404 | 10.05 | 7.655 | 0.5188 | 0.4630 | 0.5725 | 0.5866 | 0.6515 | 0.5216 |
amsterdam | UvA_a2e_cn_primary(1) | 0.3221 | 0.3820 | 0.2565 | 0.3218 | 0.3816 | 0.2563 | 8.621 | 9.063 | 7.458 | 0.5995 | 0.5437 | 0.6533 | 0.5863 | 0.6457 | 0.5278 |
kcsl | KCSL_a2e_cn_primary | 0.1422 | 0.1670 | 0.1161 | 0.1420 | 0.1669 | 0.1158 | 6.647 | 6.959 | 5.820 | 0.6590 | 0.6353 | 0.6818 | 0.4937 | 0.5398 | 0.4482 |
telaviv | TLVEBMT_a2e_cn_primary | 0.0703 | 0.0872 | 0.0527 | 0.0703 | 0.0872 | 0.0526 | 3.879 | 4.370 | 3.165 | 0.7483 | 0.7274 | 0.7685 | 0.3853 | 0.4262 | 0.3450 |
(1)A late and/or debugged system was also submitted, not reported here.
This release page is limited to the Current Test for the Urdu-to-English track.
Scores reported are limited to primary, on-time, non-debugged submissions.
Scores are ordered by BLEU-4 score on the Overall test set.
The following table lists the submissions received from the sites participating in the Urdu-to-English Current Test.
Site ID | Organization | Location | Single System Track | System Combination Track
---|---|---|---|---
afrl | Air Force Research Laboratory | USA | Yes | - |
amsterdam | University of Amsterdam | Netherlands | Late and/or debugged submission | - |
cmu-ebmt | Carnegie Mellon EBMT | USA | Yes(1) | - |
cmu-statxfer | Carnegie Mellon StatXfer | USA | Yes | Yes |
hongkong | City University of Hong Kong | China | Yes | - |
isi-lw | University of Southern California / Language Weaver Inc. | USA | Yes | Yes |
jhu | Johns Hopkins University | USA | Late and/or debugged submission | - |
systran | SYSTRAN Software Inc. | USA | Yes | - |
umd | University of Maryland | USA | Yes | - |
upc-lsi | UPC-LSI (Universitat Politècnica de Catalunya, Llenguatges i Sistemes Informàtics) | Spain | Yes | - |
(1)A late and/or debugged system was also submitted, not reported here.
Single System Track, BLEU-4 (mteval-v13a) scores:

Site ID | System | Overall | Newswire | Web
---|---|---|---|---
Constrained Data Track | | | |
isi-lw | isi-lw_u2e_cn_primary | 0.3120 | 0.3753 | 0.2440
umd | UMD_u2e_cn_primary | 0.2395 | 0.2697 | 0.2056
cmu-statxfer | CMU-Stat-Xfer_u2e_cn_primary | 0.2322 | 0.2614 | 0.2013
afrl | AFRL_u2e_cn_primary | 0.2235 | 0.2526 | 0.1904
upc-lsi | UPC.LSI_u2e_cn_primary | 0.1971 | 0.2325 | 0.1610
cmu-ebmt | cmu_cunei_u2e_cn_primary(1) | 0.1863 | 0.2122 | 0.1568
hongkong | CityU_u2e_cn_primary | 0.0254 | 0.0291 | 0.0215
UnConstrained Data Track | | | |
systran | SYSTRAN_u2e_un_primary | 0.2555 | 0.3034 | 0.2051
(1)A late and/or debugged system was also submitted, not reported here.
System Combination Track, BLEU-4 (mteval-v13a) scores:

Site ID | System | Overall | Newswire | Web
---|---|---|---|---
Constrained Data Track | | | |
isi-lw | isi-lw_u2e_cn_combo1 | 0.3187 | 0.3816 | 0.2509
cmu-statxfer | CMU-Stat-Xfer_u2e_cn_combo1 | 0.2504 | 0.2910 | 0.2082
All-metric scores: BLEU-4 (mteval-v13a), IBM BLEU (bleu-1.04), NIST (mteval-v13a), TER (tercom-0.7.25), and METEOR (meteor-0.7):

Site ID | System | BLEU-4 Overall | BLEU-4 Newswire | BLEU-4 Web | IBM BLEU Overall | IBM BLEU Newswire | IBM BLEU Web | NIST Overall | NIST Newswire | NIST Web | TER Overall | TER Newswire | TER Web | METEOR Overall | METEOR Newswire | METEOR Web
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Constrained Data Track | | | | | | | | | | | | | | | |
isi-lw | isi-lw_u2e_cn_combo1 | 0.3187 | 0.3816 | 0.2509 | 0.3187 | 0.3815 | 0.2509 | 8.948 | 9.465 | 7.540 | 0.5866 | 0.5495 | 0.6199 | 0.5616 | 0.6237 | 0.5027 |
isi-lw | isi-lw_u2e_cn_primary | 0.3120 | 0.3753 | 0.2440 | 0.3119 | 0.3752 | 0.2440 | 8.836 | 9.379 | 7.413 | 0.5845 | 0.5495 | 0.6159 | 0.5539 | 0.6160 | 0.4951 |
cmu-statxfer | CMU-Stat-Xfer_u2e_cn_combo1 | 0.2504 | 0.2910 | 0.2082 | 0.2504 | 0.2907 | 0.2084 | 8.244 | 8.550 | 7.218 | 0.6273 | 0.6128 | 0.6404 | 0.5380 | 0.5838 | 0.4947 |
umd | UMD_u2e_cn_primary | 0.2395 | 0.2697 | 0.2056 | 0.2394 | 0.2696 | 0.2056 | 7.871 | 8.089 | 7.051 | 0.6519 | 0.6436 | 0.6593 | 0.5355 | 0.5815 | 0.4919 |
cmu-statxfer | CMU-Stat-Xfer_u2e_cn_primary | 0.2322 | 0.2614 | 0.2013 | 0.2323 | 0.2612 | 0.2017 | 7.743 | 7.919 | 7.029 | 0.6960 | 0.6926 | 0.6991 | 0.5445 | 0.5907 | 0.5006 |
afrl | AFRL_u2e_cn_primary | 0.2235 | 0.2526 | 0.1904 | 0.2238 | 0.2526 | 0.1907 | 7.957 | 8.123 | 7.157 | 0.6537 | 0.6483 | 0.6584 | 0.5396 | 0.5815 | 0.4996 |
upc-lsi | UPC.LSI_u2e_cn_primary | 0.1971 | 0.2325 | 0.1610 | 0.1972 | 0.2324 | 0.1612 | 7.510 | 7.945 | 6.361 | 0.6607 | 0.6526 | 0.6680 | 0.5015 | 0.5530 | 0.4524 |
cmu-ebmt | cmu_cunei_u2e_cn_primary(1) | 0.1863 | 0.2122 | 0.1568 | 0.1863 | 0.2121 | 0.1568 | 6.811 | 6.899 | 6.065 | 0.6977 | 0.7001 | 0.6956 | 0.5274 | 0.5801 | 0.4773 |
hongkong | CityU_u2e_cn_primary | 0.0254 | 0.0291 | 0.0215 | 0.0256 | 0.0293 | 0.0218 | 2.647 | 2.788 | 2.430 | 1.1169 | 1.1565 | 1.0813 | 0.2929 | 0.3196 | 0.2676 |
UnConstrained Data Track | | | | | | | | | | | | | | | |
systran | SYSTRAN_u2e_un_primary | 0.2555 | 0.3034 | 0.2051 | 0.2553 | 0.3032 | 0.2050 | 8.203 | 8.622 | 7.090 | 0.6310 | 0.6060 | 0.6534 | 0.5401 | 0.5913 | 0.4921 |
(1)A late and/or debugged system was also submitted, not reported here.
This release page is limited to the Progress Test for the Arabic-to-English and Chinese-to-English tracks of the MT09 text-to-text translation evaluation.
The Progress Test was designed to demonstrate true system improvement for a particular site over time, irrespective of the inherent data differences that come with each new test set. To keep the focus on progress over time, only results from sites that participated in both MT08 and MT09 are reported.
There may be issues related to processing older test sets that make across-site comparisons less meaningful.
Scores reported are limited to primary, on-time, non-debugged submissions.
Scores are ordered by BLEU-4 score on the Overall test set.
Note that all BLEU-4 scores reported on this page were computed using version 13a of the mteval script. Last year, scores were computed using mteval-v11b, which uses a different brevity penalty. This year's script (mteval-v13a) uses the 'closest reference translation length' (same as in IBM BLEU) instead of the 'shortest reference translation length', to compute the brevity penalty.
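The practical effect of this change is confined to the brevity penalty's effective reference length. Below is a minimal sketch of the two policies; the brevity penalty formula is the standard BLEU definition, and in full BLEU the lengths are accumulated over the whole test set rather than a single segment.

```python
import math

def brevity_penalty(cand_len, ref_len):
    """Standard BLEU brevity penalty: 1.0 if the candidate is at least
    as long as the effective reference length r, else exp(1 - r/c)."""
    if cand_len >= ref_len:
        return 1.0
    return math.exp(1.0 - ref_len / cand_len)

def effective_ref_len(cand_len, ref_lens, mode):
    """Effective reference length under the two policies discussed above."""
    if mode == "shortest":  # mteval-v11b behavior (last year's scores)
        return min(ref_lens)
    if mode == "closest":   # mteval-v13a / IBM BLEU behavior (MT09 scores);
        # ties are broken toward the shorter reference in this sketch
        return min(ref_lens, key=lambda r: (abs(r - cand_len), r))
    raise ValueError(mode)

# A 25-word candidate segment with reference lengths 22, 26, and 30:
for mode in ("shortest", "closest"):
    r = effective_ref_len(25, [22, 26, 30], mode)
    print(mode, r, round(brevity_penalty(25, r), 4))
# shortest -> r = 22, BP = 1.0
# closest  -> r = 26, BP = exp(1 - 26/25), about 0.9608
```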
The following table lists the submissions received from the sites participating in the Progress Test. Scores are reported only for submissions marked "MT08 and MT09":
Site ID | Organization | Location | Progress Test Set: Arabic-to-English | Progress Test Set: Chinese-to-English
---|---|---|---|---
amsterdam | University of Amsterdam | Netherlands | MT09 only | MT09 only |
apptek | AppTek | USA | MT08 and MT09 | Different training condition |
bbn | BBN Technologies | USA | MT08 and MT09 | MT08 and MT09 |
buaa | Beihang University, Institute of Intelligent Information Processing, School of Computer Science and Engineering | China | - | MT09 only |
cas-ia | Chinese Academy of Sciences, Institute of Automation | China | - | MT09 only |
cas-ict | Chinese Academy of Sciences, Institute of Computing Technology | China | - | MT09 only |
ccid | China Center for Information Industry Development | China | - | MT09 only |
cmu-smt | Carnegie Mellon LTI interACT | USA | Different training condition | Different training condition |
cmu-statxfer | Carnegie Mellon StatXfer | USA | MT09 only | - |
dcu | Dublin City University | Ireland | - | MT09 only |
dfki | DFKI GmbH | Germany | - | MT09 only |
edinburgh | University of Edinburgh | UK | MT09 only | - |
fbk | Fondazione Bruno Kessler | Italy | MT09 only | - |
frdc | Fujitsu Research & Development Center Co., Ltd. | China | - | MT09 only |
hit-ltrc | Harbin Institute of Technology, Language Technology Research Center | China | - | MT09 only |
isi-lw | University of Southern California / Language Weaver Inc. | USA | MT08 and MT09 | MT08 and MT09 |
lium | Université du Maine (Le Mans) | France | - | MT09 only |
lium-systran | Université du Maine (Le Mans) / SYSTRAN | . | Late and/or debugged submission | - |
nju-nlp | Nanjing University NLP | China | - | MT09 only |
nrc | National Research Council Canada | Canada | - | MT08 and MT09 |
nthu | National Tsing Hua University, Department of Computer Science | Taiwan | - | MT09 only |
rwth | RWTH-Aachen University, Chair of Computer Sciences | Germany | MT09 only | MT09 only |
sakhr | Sakhr Software | Egypt | MT09 only | - |
sri | SRI International | USA | MT08 and MT09 | MT08 and MT09 |
systran-lium | SYSTRAN / Université du Maine (Le Mans) | . | - | MT09 only |
systran-nrc | SYSTRAN / National Research Council Canada | . | - | MT09 only |
tubitak-uekae | TUBITAK-UEKAE | Turkey | MT09 only | - |
umd | University of Maryland | USA | - | MT08 and MT09 |
upc-lsi | UPC-LSI (Universitat Politècnica de Catalunya, Llenguatges i Sistemes Informàtics) | Spain | MT08 and MT09 | - |
Arabic-to-English Progress Test, BLEU-4 (mteval-v13a) scores:

Site ID | System | Overall MT08 | Overall MT09 | Newswire MT08 | Newswire MT09 | Web MT08 | Web MT09
---|---|---|---|---|---|---|---
Constrained Data Track | | | | | | |
bbn | BBN-progress_a2e_cn_primary | 0.4186(1) | 0.4379 | 0.4655(1) | 0.4926 | 0.3566(1) | 0.3678 |
isi-lw | isi-lw_a2e_cn_primary | 0.4030(1) | 0.4296 | 0.4498(1) | 0.4748 | 0.3408(1) | 0.3621 |
sri | SRI_a2e_cn_primary | 0.4011(1) | 0.4087 | 0.4558(1) | 0.4499 | 0.3277(1) | 0.3551 |
upc-lsi | UPC.LSI_a2e_cn_primary | 0.2956 | 0.3303 | 0.3448 | 0.3769 | 0.2300 | 0.2676 |
UnConstrained Data Track | | | | | | |
apptek | AppTek_a2e_un_primary | 0.4195 | 0.3955 | 0.4245 | 0.3991 | 0.4131 | 0.3909 |
(1)The MT08 system was a combined system
Chinese-to-English Progress Test, BLEU-4 (mteval-v13a) scores:

Site ID | System | Overall MT08 | Overall MT09 | Newswire MT08 | Newswire MT09 | Web MT08 | Web MT09
---|---|---|---|---|---|---|---
Constrained Data Track | | | | | | |
isi-lw | isi-lw_c2e_cn_primary | 0.2990(1) | 0.3225 | 0.3516(1) | 0.3628 | 0.2237(1) | 0.2642 |
bbn | BBN-progress_c2e_cn_primary | 0.3055(1) | 0.3153 | 0.3447(1) | 0.3481 | 0.2509(1) | 0.2697 |
nrc | NRC_c2e_cn_primary | 0.2480 | 0.2811 | 0.2679 | 0.3130 | 0.2204 | 0.2357 |
sri | SRI_c2e_cn_primary | 0.2617(1) | 0.2790 | 0.3028(1) | 0.3132 | 0.2032(1) | 0.2303 |
umd | UMD_c2e_cn_primary | 0.2456 | 0.2500 | 0.2823 | 0.2781 | 0.1936 | 0.2108 |
(1)The MT08 system was a combined system
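Reading the progress tables amounts to comparing each site's MT08 and MT09 scores on the identical test set. A minimal sketch over the Overall BLEU-4 column of the Arabic-to-English table above:

```python
# Overall BLEU-4 (mteval-v13a) scores copied from the Arabic-to-English
# Progress Test table above: (MT08, MT09).
overall = {
    "bbn": (0.4186, 0.4379),
    "isi-lw": (0.4030, 0.4296),
    "sri": (0.4011, 0.4087),
    "upc-lsi": (0.2956, 0.3303),
    "apptek": (0.4195, 0.3955),  # unconstrained track
}

for site, (mt08, mt09) in overall.items():
    delta = mt09 - mt08
    print(f"{site:8s} {mt08:.4f} -> {mt09:.4f} ({delta:+.4f})")
```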
Informal System Combination was a diagnostic MT09 task offered after the official evaluation period. Output from several MT09 systems on the Arabic-to-English and Urdu-to-English Current tests was anonymized and provided for system combination purposes. Participants in this track produced new output based on those provided translations.
Scores reported here are limited to primary Informal System Combination submissions.
The Informal System Combination track used system output from the Arabic-to-English and Urdu-to-English Current tests. Approximately 30% of the test data was designated as a development set for system combination; the remainder was provided as the test set.
Language Pair | Data Genre | Development Set | Evaluation Set |
---|---|---|---|
Arabic-to-English | Newswire | 17 documents | 42 documents
Arabic-to-English | Web | 16 documents | 40 documents
Urdu-to-English | Newswire | 20 documents | 48 documents
Urdu-to-English | Web | 48 documents | 114 documents
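As a quick sanity check, the development fraction implied by the document counts above can be recomputed. A minimal sketch:

```python
# Document counts from the table above: (development, evaluation).
splits = {
    ("Arabic-to-English", "Newswire"): (17, 42),
    ("Arabic-to-English", "Web"): (16, 40),
    ("Urdu-to-English", "Newswire"): (20, 48),
    ("Urdu-to-English", "Web"): (48, 114),
}

for (pair, genre), (dev, evl) in splits.items():
    frac = dev / (dev + evl)
    print(f"{pair:18s} {genre:8s} dev fraction = {frac:.1%}")
# Every genre comes out just under the stated 30% development share.
```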
Arabic-to-English Informal System Combination results, all metrics: BLEU-4 (mteval-v13a), IBM BLEU (bleu-1.04), NIST (mteval-v13a), TER (tercom-0.7.25), and METEOR (meteor-0.7):

Site ID | System | BLEU-4 Overall | BLEU-4 Newswire | BLEU-4 Web | IBM BLEU Overall | IBM BLEU Newswire | IBM BLEU Web | NIST Overall | NIST Newswire | NIST Web | TER Overall | TER Newswire | TER Web | METEOR Overall | METEOR Newswire | METEOR Web
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
bbn | BBN_a2e_isc_primary | 0.5747 | 0.6440 | 0.4940 | 0.5747 | 0.6440 | 0.4938 | 11.82 | 11.84 | 10.41 | 0.3761 | 0.3220 | 0.4298 | 0.7043 | 0.7601 | 0.6469 |
sri | SRI_a2e_isc_primary | 0.5543 | 0.6292 | 0.4733 | 0.5542 | 0.6291 | 0.4732 | 11.68 | 11.79 | 10.26 | 0.3788 | 0.3244 | 0.4328 | 0.6989 | 0.7474 | 0.6493 |
cmu-statxfer | CMU-Stat-Xfer_a2e_isc_primary | 0.5530 | 0.6332 | 0.4663 | 0.5529 | 0.6330 | 0.4662 | 11.62 | 11.80 | 10.15 | 0.3854 | 0.3279 | 0.4427 | 0.7033 | 0.7518 | 0.6538 |
rwth | RWTH_a2e_isc_primary | 0.5515 | 0.6412 | 0.4523 | 0.5517 | 0.6411 | 0.4523 | 11.56 | 11.86 | 9.879 | 0.3923 | 0.3229 | 0.4613 | 0.6928 | 0.7568 | 0.6272 |
jhu | jhu_a2e_isc_primary | 0.5483 | 0.6294 | 0.4577 | 0.5481 | 0.6291 | 0.4574 | 11.55 | 11.73 | 10.01 | 0.3862 | 0.3272 | 0.4448 | 0.6919 | 0.7494 | 0.6330 |
hit-ltrc | HIT-LTRC_a2e_isc_primary | 0.5037 | 0.5997 | 0.3982 | 0.5038 | 0.6000 | 0.3981 | 10.65 | 11.48 | 8.406 | 0.4135 | 0.3472 | 0.4793 | 0.6596 | 0.7249 | 0.5922 |
tubitak-uekae | TUBITAK_a2e_isc_primary | 0.4603 | 0.5371 | 0.3779 | 0.4603 | 0.5371 | 0.3779 | 10.31 | 10.75 | 8.726 | 0.4525 | 0.3942 | 0.5105 | 0.6263 | 0.6882 | 0.5625 |
Highest individual system score in ISC test set (system with highest BLEU-4 score on Overall data set) | | | | | | | | | | | | | | | |
- | system08_unconstrained.xml | 0.5008 | 0.5719 | 0.4245 | 0.5007 | 0.5720 | 0.4243 | 11.04 | 11.28 | 9.598 | 0.4229 | 0.3641 | 0.4813 | 0.6694 | 0.7271 | 0.6104
Urdu-to-English Informal System Combination results, same metrics and column layout as the previous table:

Site ID | System | BLEU-4 Overall | BLEU-4 Newswire | BLEU-4 Web | IBM BLEU Overall | IBM BLEU Newswire | IBM BLEU Web | NIST Overall | NIST Newswire | NIST Web | TER Overall | TER Newswire | TER Web | METEOR Overall | METEOR Newswire | METEOR Web
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
rwth | RWTH_u2e_isc_primary(1) | 0.3232 | 0.3768 | 0.2737 | 0.3235 | 0.3767 | 0.2740 | 8.822 | 9.274 | 7.425 | 0.5630 | 0.5383 | 0.5833 | 0.5539 | 0.6105 | 0.5046 |
jhu | jhu_u2e_isc_primary | 0.3193 | 0.3796 | 0.2627 | 0.3191 | 0.3792 | 0.2627 | 8.736 | 9.197 | 7.418 | 0.5590 | 0.5317 | 0.5815 | 0.5512 | 0.6073 | 0.5022 |
cmu-statxfer | CMU-Stat-Xfer_u2e_isc_primary | 0.3188 | 0.3821 | 0.2602 | 0.3188 | 0.3821 | 0.2602 | 8.694 | 9.154 | 7.353 | 0.5741 | 0.5422 | 0.6004 | 0.5560 | 0.6170 | 0.5030 |
hit-ltrc | HIT-LTRC_u2e_isc_primary | 0.3103 | 0.3774 | 0.2453 | 0.3104 | 0.3773 | 0.2455 | 8.639 | 9.195 | 7.271 | 0.5820 | 0.5416 | 0.6152 | 0.5519 | 0.6184 | 0.4941 |
Highest individual system score in ISC test set (system with highest BLEU-4 score on Overall data set) | | | | | | | | | | | | | | | |
- | system09_constrained.xml | 0.3104 | 0.3774 | 0.2456 | 0.3104 | 0.3773 | 0.2456 | 8.640 | 9.196 | 7.276 | 0.5816 | 0.5414 | 0.6146 | 0.5522 | 0.6186 | 0.4945
(1)rescored