Frontiers of Information Technology & Electronic Engineering

ISSN 2095-9184 (print), ISSN 2095-9230 (online)

Paradox of poetic intent in back-translation: evaluating the quality of large language models in Chinese translation

Abstract: Large language models (LLMs) excel in multilingual translation tasks, yet often struggle with culturally and semantically rich Chinese texts. This study introduces the framework of back-translation (BT) powered by LLMs, or LLM-BT, to evaluate Chinese → intermediate language → Chinese translation quality across five LLMs and three traditional systems. We construct a diverse corpus containing scientific abstracts, historical paradoxes, and literary metaphors, reflecting the complexity of Chinese at the lexical and semantic levels. Using our modular NLPMetrics system, including bilingual evaluation understudy (BLEU), character F-score (CHRF), translation edit rate (TER), and semantic similarity (SS), we find that LLMs outperform traditional tools in cultural and literary tasks. However, the results of this study uncover a high-dimensional behavioral phenomenon, the paradox of poetic intent, where surface fluency is preserved, but metaphorical or emotional depth is lost. Additionally, some models exhibit verbatim BT, suggesting a form of data-driven quasi-self-awareness, particularly under repeated or cross-model evaluation. To address BLEU’s limitations for Chinese, we propose a Jieba-segmentation BLEU variant that incorporates word-frequency and n-gram weighting, improving sensitivity to lexical segmentation and term consistency. Supplementary tests show that in certain semantic dimensions, LLM outputs approach the fidelity of human poetic translations, despite lacking a deeper metaphorical intent. Overall, this study reframes traditional fidelity vs. fluency evaluation into a richer, multi-layered analysis of LLM behavior, offering a transparent framework that contributes to explainable artificial intelligence and identifies new research pathways in cultural natural language processing and multilingual LLM alignment.

Key words: Back-translation; Chinese natural language processing; Large language model-based back-translation (LLM-BT); Paradox of poetic intent; Quasi-self-awareness; Verbatim back-translation
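The Jieba-segmented BLEU variant mentioned above scores Chinese at the word level rather than the character level. As a rough sketch only (not the paper's implementation: the word-frequency and n-gram weighting it describes are omitted, and the `jieba` segmenter is referenced only in a comment rather than bundled), a plain word-level BLEU over pre-segmented tokens might look like:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def word_bleu(reference, hypothesis, max_n=4):
    """Word-level BLEU over segmented Chinese tokens: clipped n-gram
    precisions combined as a geometric mean, with a brevity penalty."""
    if not reference or not hypothesis:
        return 0.0
    max_n = min(max_n, len(reference), len(hypothesis))
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        ref, hyp = ngrams(reference, n), ngrams(hypothesis, n)
        overlap = sum(min(count, ref[gram]) for gram, count in hyp.items())
        # Smooth zero overlaps so the geometric mean stays defined.
        log_precision_sum += math.log(max(overlap, 1e-9) / sum(hyp.values()))
    brevity = min(1.0, math.exp(1 - len(reference) / len(hypothesis)))
    return brevity * math.exp(log_precision_sum / max_n)

# In a real pipeline the tokens would come from jieba.cut(text) (jieba is an
# assumption, not imported here); segmentation is hard-coded for illustration.
reference  = ["明月", "几时", "有"]   # original line
hypothesis = ["明月", "几时", "有"]   # verbatim back-translation
print(round(word_bleu(reference, hypothesis), 3))  # → 1.0
```

A verbatim back-translation scores exactly 1.0, which is how a segmentation-aware metric can surface the "verbatim BT" behavior the abstract describes, while paraphrased back-translations score strictly below 1.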

Chinese Summary (translated): Paradox of poetic intent in back-translation: evaluating the quality of large language models in Chinese translation

Li WEIGANG^1, Pedro Carvalho BROM^2
^1 Department of Computer Science, University of Brasília, Brasília 70919-900, Brazil
^2 Department of Mathematics, Federal Institute of Brasília, Brasília 71200-020, Brazil

Abstract: Large language models (LLMs) have achieved remarkable success in multilingual translation tasks, but face challenges when handling culturally rich, semantically complex Chinese texts. This paper proposes the LLM-based back-translation (LLM-BT) framework, which evaluates translation quality through a "Chinese → intermediate language → Chinese" pipeline. The study covers five mainstream LLMs and three traditional translation tools, and constructs a diverse corpus of scientific abstracts, historical paradoxes, and literary metaphors to reflect the lexical and semantic complexity of Chinese. An NLPMetrics evaluation system is built, covering bilingual evaluation understudy (BLEU), character F-score (CHRF), translation edit rate (TER), and semantic similarity (SS). Experimental results show that LLMs generally outperform traditional tools on literary tasks. They also reveal a high-dimensional behavioral phenomenon, the paradox of poetic intent: models often preserve surface fluency in translation while weakening metaphorical and emotional depth. In addition, some models exhibit a tendency toward verbatim back-translation, displaying a data-driven "quasi-self-awareness" under repeated or cross-model testing. To mitigate BLEU's limitations for Chinese evaluation, this paper proposes an improved BLEU that integrates Jieba segmentation with word-frequency weighting, effectively improving sensitivity to lexical segmentation and term consistency. Supplementary experiments show that, in some semantic dimensions, LLM outputs approach the fidelity of human poetic translation, yet still lack deep metaphorical expression. This paper extends the traditional fidelity–fluency evaluation into a multi-dimensional analysis of LLM behavior, provides a transparent framework that advances explainable artificial intelligence, and points to new research directions in cultural natural language processing and multilingual LLM alignment.

Key words: Back-translation; Chinese natural language processing; LLM-based back-translation (LLM-BT); Paradox of poetic intent; Quasi-self-awareness; Verbatim back-translation



DOI: 10.1631/FITEE.2500298
CLC number: TP391.1
On-line access: 2026-01-08
Received: 2025-05-08
Revision accepted: 2025-09-04
Crosschecked: 2026-01-08

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952276; Fax: +86-571-87952331; E-mail: jzus@zju.edu.cn
Copyright © 2000– Journal of Zhejiang University-SCIENCE