Journal of Zhejiang University SCIENCE C

ISSN 1869-1951 (Print), 1869-196X (Online), Monthly

Topic-aware pivot language approach for statistical machine translation

Abstract: The pivot language approach for statistical machine translation (SMT) is an effective way to break the resource bottleneck for certain language pairs. However, conventional implementations make little use of pivot-side context information, resulting in erroneous estimates of translation probabilities. In this study, we propose two topic-aware pivot language approaches that exploit different levels of pivot-side context. The first method takes advantage of document-level context by assuming that bridged phrase pairs should be similar in their document-level topic distributions. The second method focuses on the effect of local context; it rests on two assumptions: that phrase sense can be reflected by local context in the form of probabilistic topics, and that bridged phrase pairs should be compatible in their latent sense distributions. We then build an interpolated model that combines the two methods to further improve system performance. Experimental results on French-Spanish and French-German translation, using English as the pivot language, demonstrate the effectiveness of topic-based context in pivot-based SMT.

Key words: Natural language processing, Pivot-based statistical machine translation, Topical context information
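
To make the document-level idea concrete, the Python sketch below shows one possible form of topic-aware triangulation: a source-pivot and a pivot-target phrase table are bridged over shared pivot phrases, and each bridged pair is re-weighted by the similarity of the topic distributions attached to its two component pairs. This is a minimal illustration under assumed interfaces, not the authors' implementation; the function names (triangulate, cosine), the choice of cosine similarity, and the toy data are all assumptions.

import math
from collections import defaultdict

def cosine(p, q):
    """Cosine similarity between two topic distributions (equal-length lists)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm > 0.0 else 0.0

def triangulate(src_pivot, pivot_tgt, src_topics, tgt_topics):
    """Bridge a source-pivot and a pivot-target phrase table over shared pivots.

    src_pivot  : {(src, pvt): prob}    source-to-pivot phrase translation probabilities
    pivot_tgt  : {(pvt, tgt): prob}    pivot-to-target phrase translation probabilities
    src_topics : {(src, pvt): [float]} topic distribution attached to each source-pivot pair
    tgt_topics : {(pvt, tgt): [float]} topic distribution attached to each pivot-target pair
    Returns {(src, tgt): prob}, renormalized per source phrase.
    """
    bridged = defaultdict(float)
    for (src, pvt), p_sp in src_pivot.items():
        for (pvt2, tgt), p_pt in pivot_tgt.items():
            if pvt != pvt2:
                continue
            # Conventional triangulation sums p(pvt|src) * p(tgt|pvt) over shared
            # pivot phrases; here each term is also scaled by how similar the two
            # topic distributions are, so incompatible senses contribute less.
            sim = cosine(src_topics[(src, pvt)], tgt_topics[(pvt, tgt)])
            bridged[(src, tgt)] += p_sp * p_pt * sim

    # Renormalize so the probabilities for each source phrase sum to one.
    totals = defaultdict(float)
    for (src, tgt), prob in bridged.items():
        totals[src] += prob
    return {pair: prob / totals[pair[0]]
            for pair, prob in bridged.items() if totals[pair[0]] > 0.0}

# Toy example: the ambiguous English pivot "bank" can bridge French "banque" to
# either Spanish "banco" (finance) or "orilla" (riverbank); the topic
# distributions suppress the mismatched bridge.
if __name__ == "__main__":
    src_pivot = {("banque", "bank"): 1.0}
    pivot_tgt = {("bank", "banco"): 0.6, ("bank", "orilla"): 0.4}
    src_topics = {("banque", "bank"): [0.9, 0.1]}
    tgt_topics = {("bank", "banco"): [0.85, 0.15], ("bank", "orilla"): [0.1, 0.9]}
    print(triangulate(src_pivot, pivot_tgt, src_topics, tgt_topics))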

Chinese Summary: Topic-aware pivot language statistical machine translation

Objective: The pivot language approach is one way to overcome the shortage of bilingual training data in statistical machine translation. Conventional pivot language methods ignore the ambiguity of the pivot-language text, so the probability estimates of the resulting translation model are inaccurate. In this paper, we use topic models to capture context information at different levels and incorporate this information into the modeling process of pivot-based statistical machine translation, thereby improving the pivot-based translation model.
Novelty: Representing context with a conventional vector space model suffers from data sparsity. We instead use topic models to turn context information at different levels into probability distributions, so that the contextual information of the pivot-language text can be integrated into the probability estimation of the translation model and improve it.
Method: We exploit the strengths of topic models to obtain low-dimensional representations of context at different levels, and we modify the modeling formula of the conventional pivot language approach, treating context either as a latent variable or as a similarity measure and re-weighting the translation model probabilities accordingly (a formula sketch follows the key words below).
Conclusion: Experiments show that topic models represent context at different levels well, and that pivot-based statistical machine translation models that incorporate topic-model context outperform models built with the conventional pivot language approach.

Key words: Statistical machine translation; Pivot language; Topic model
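
For readers who want the "latent variable or similarity" remark spelled out, the following LaTeX sketch contrasts the conventional triangulation with two hedged readings of the topic-aware adjustment; the symbols \theta and the sim(., .) term are illustrative assumptions, not the paper's exact notation.

% Conventional pivot triangulation, marginalizing over pivot phrases p:
\phi(t \mid s) = \sum_{p} \phi(t \mid p)\, \phi(p \mid s)

% "Context as a similarity": each bridged pair is re-weighted by the similarity
% of the topic distributions \theta_{(s,p)} and \theta_{(p,t)} (assumed notation):
\phi_{\mathrm{topic}}(t \mid s) \propto \sum_{p} \phi(t \mid p)\, \phi(p \mid s)\,
    \mathrm{sim}\bigl(\theta_{(s,p)}, \theta_{(p,t)}\bigr)

% "Context as a latent variable": topics z enter the marginalization directly:
\phi_{\mathrm{latent}}(t \mid s) = \sum_{p} \sum_{z} \phi(t \mid p, z)\, \phi(p \mid s, z)\, P(z)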


DOI: 10.1631/jzus.C1300208
CLC number: TP391.1

On-line Access: 2014-04-10
Received: 2013-08-04
Revision Accepted: 2013-11-07
Crosschecked: 2014-02-19
