Journal of Zhejiang University SCIENCE C
ISSN 1869-1951 (Print), 1869-196X (Online), Monthly
2014 Vol.15 No.4 P.241-253
Topic-aware pivot language approach for statistical machine translation
Abstract: The pivot language approach for statistical machine translation (SMT) is an effective way to break the resource bottleneck for language pairs that lack parallel corpora. However, conventional implementations of this approach make little use of pivot-side context information, which leads to inaccurate estimates of translation probabilities. In this study, we propose two topic-aware pivot language approaches that exploit pivot-side context at different levels. The first method takes advantage of document-level context by assuming that bridged phrase pairs should have similar document-level topic distributions. The second method focuses on local context. It rests on two assumptions: that the sense of a phrase is reflected by its local context, represented as probabilistic topics, and that bridged phrase pairs should have compatible latent sense distributions. Finally, we build an interpolated model that combines the two methods to further improve system performance. Experimental results on French-Spanish and French-German translation, using English as the pivot language, demonstrate the effectiveness of topic-based context in pivot-based SMT.
Key words: Natural language processing, Pivot-based statistical machine translation, Topical context information
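The abstract does not state the model equations, so the following is only a minimal Python sketch of the general idea under explicit assumptions. It assumes that conventional triangulation computes p(t|s) = Σ_e p(t|e) p(e|s) over pivot phrases e, and that "similar document-level topic distributions" can be scored with the Bhattacharyya coefficient between the topic vectors of the two bridged phrase pairs. The toy tables, topic vectors, and the names `triangulate`, `topic_weight`, and `bhattacharyya` are hypothetical illustrations, not the authors' formulation or data.

```python
import math
from collections import defaultdict

# Hypothetical toy phrase tables: French (source) -> English (pivot) -> Spanish (target).
# src_to_piv[s][e] = p(e | s); piv_to_tgt[e][t] = p(t | e).
src_to_piv = {"banque": {"bank": 1.0}}
piv_to_tgt = {"bank": {"banco": 0.6, "orilla": 0.4}}

# Hypothetical document-level topic distributions over two topics
# (finance, nature) for each phrase pair, e.g. as inferred by a topic model.
sp_topics = {("banque", "bank"): [0.9, 0.1]}
pt_topics = {("bank", "banco"): [0.8, 0.2], ("bank", "orilla"): [0.1, 0.9]}


def bhattacharyya(a, b):
    """Similarity in [0, 1] between two discrete distributions (1 = identical)."""
    return sum(math.sqrt(x * y) for x, y in zip(a, b))


def topic_weight(s, e, t):
    """Topic similarity between the source-pivot pair and the pivot-target pair."""
    return bhattacharyya(sp_topics[(s, e)], pt_topics[(e, t)])


def triangulate(src_to_piv, piv_to_tgt, weight=None):
    """Bridge two phrase tables through pivot phrases.

    Without `weight`: plain triangulation, p(t|s) = sum_e p(t|e) * p(e|s).
    With `weight`: each bridged term is scaled by weight(s, e, t), and the
    scores are renormalized so they sum to 1 for each source phrase.
    """
    table = {}
    for s, pivots in src_to_piv.items():
        scores = defaultdict(float)
        for e, e_given_s in pivots.items():
            for t, t_given_e in piv_to_tgt.get(e, {}).items():
                w = 1.0 if weight is None else weight(s, e, t)
                scores[t] += w * t_given_e * e_given_s
        total = sum(scores.values())
        if weight is not None and total > 0:
            scores = {t: v / total for t, v in scores.items()}
        if scores:
            table[s] = dict(scores)
    return table


plain = triangulate(src_to_piv, piv_to_tgt)
topical = triangulate(src_to_piv, piv_to_tgt, weight=topic_weight)
print(plain)    # banco 0.6, orilla 0.4
print(topical)  # banco ~0.712, orilla ~0.288
```

In this toy case the ambiguous pivot "bank" wrongly passes probability mass from the financial "banque" to the riverbank sense "orilla". The topic weight shifts that mass back toward "banco" because the bridged pairs disagree in topic. The choice of similarity measure and the renormalization step are illustrative design choices, not details taken from the paper.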
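Innovation points: Representing context with the conventional vector space model suffers from data sparsity. This paper uses topic models to represent context at different levels probabilistically, so that the context of pivot-language text can be better incorporated into the probability computation of the translation model, thereby improving the translation model.
Methods: Exploiting the strengths of topic models, we use them to build reduced-dimension representations of context at different levels. We modify the modeling formulas of the conventional pivot language approach, treating context as a latent variable or as a similarity measure, and use it to re-estimate the translation model probabilities.
Important conclusions: Experiments show that topic models represent context at different levels well, and that pivot-based SMT models incorporating topic-model context perform better than models built with the conventional pivot language approach.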
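As a hedged illustration only, here is one possible reading of "treating context as a latent variable" and of the interpolated model. Both bridged tables are conditioned on a latent topic z inferred from the pivot-side local context c, and the document-level and local-context estimates are then combined by linear interpolation. The symbols e (pivot phrase), z, c, K (number of topics), λ, and the subscripted models are assumptions introduced here. The conditional-independence structure and the use of linear rather than log-linear interpolation are also assumptions, and the paper's actual equations may differ.

```latex
\begin{align*}
% conventional triangulation through pivot phrases e
p(t \mid s) &= \sum_{e} p(t \mid e)\, p(e \mid s) \\
% hypothetical local-context variant: latent topic z inferred from pivot-side context c
p_{\mathrm{loc}}(t \mid s, c) &= \sum_{z=1}^{K} p(z \mid c) \sum_{e} p(t \mid e, z)\, p(e \mid s, z) \\
% hypothetical linear interpolation of the document-level and local-context estimates
p_{\mathrm{int}}(t \mid s, c) &= \lambda\, p_{\mathrm{doc}}(t \mid s) + (1 - \lambda)\, p_{\mathrm{loc}}(t \mid s, c), \qquad 0 \le \lambda \le 1
\end{align*}
```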
DOI: 10.1631/jzus.C1300208
CLC number: TP391.1
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2014-02-19