
CLC number: TP391.1
On-line Access: 2026-01-08
Received: 2025-05-08
Revision Accepted: 2025-09-04
Crosschecked: 2026-01-08
Li WEIGANG, Pedro Carvalho BROM. Paradox of poetic intent in back-translation: evaluating the quality of large language models in Chinese translation[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.2500298
Paradox of poetic intent in back-translation: evaluating the quality of large language models in Chinese translation
1. Department of Computer Science, University of Brasília, Brasília 70919-900, Brazil
2. Department of Mathematics, Federal Institute of Brasília, Brasília 71200-020, Brazil
Abstract: Large language models (LLMs) have achieved remarkable success in multilingual translation tasks, but they face challenges when processing Chinese, a language of deep connotation and semantic complexity. This paper proposes an LLM-based back-translation (LLM-BT) framework that evaluates translation quality through a "Chinese → intermediate language → Chinese" pipeline. The study covers five mainstream LLMs and three traditional translation tools, and constructs a diverse corpus of scientific abstracts, historical paradoxes, and literary metaphors to reflect the lexical and semantic complexity of Chinese. An NLPMetrics evaluation suite is built around the bilingual evaluation understudy (BLEU), character F-score (CHRF), translation edit rate (TER), and semantic similarity (SS) metrics. Experimental results show that LLMs generally outperform traditional tools on literary tasks. They also reveal a high-dimensional behavioral phenomenon, the poetic paradox: models often preserve surface fluency in translation while weakening metaphorical and emotional depth. In addition, some models show a tendency toward verbatim back-translation and exhibit a data-driven "quasi-self-awareness" under repeated or cross-model tests. To address the limitations of BLEU for Chinese evaluation, this paper proposes an improved BLEU that integrates Jieba word segmentation with word-frequency weighting, effectively increasing sensitivity to word segmentation and terminology consistency. Supplementary experiments show that, on some semantic dimensions, LLM output already approaches the fidelity of human poetry translation but still lacks deep metaphorical expression. This work extends the traditional fidelity-fluency evaluation into a multidimensional analysis of LLM behavior, provides a transparent framework that promotes explainable artificial intelligence, and points to new research directions in cultural natural language processing and multilingual LLM alignment.
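The back-translation loop and the segmentation-aware scoring described above can be illustrated with a short Python sketch. This is not the authors' released code: the translate() stub stands in for whichever LLM or translation tool is under test, NLTK's sentence-level BLEU stands in for the paper's metric suite, and the inverse-frequency reweighting of unigram matches is only an assumed reading of the "word-frequency weighting" mentioned in the abstract.

```python
# Minimal sketch of an LLM-BT style evaluation: translate Chinese to an
# intermediate language and back, then score the round trip against the
# original with Jieba-segmented BLEU and a frequency-weighted overlap.
from collections import Counter

import jieba
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction


def translate(text: str, src: str, tgt: str) -> str:
    """Hypothetical stand-in for the LLM or MT tool under test."""
    raise NotImplementedError("plug in the translation system being evaluated")


def segmented_bleu(reference: str, hypothesis: str) -> float:
    """BLEU computed over Jieba word segments rather than raw characters."""
    ref_tokens = jieba.lcut(reference)
    hyp_tokens = jieba.lcut(hypothesis)
    smooth = SmoothingFunction().method1
    return sentence_bleu([ref_tokens], hyp_tokens, smoothing_function=smooth)


def weighted_unigram_overlap(reference: str, hypothesis: str) -> float:
    """Illustrative weighting: rarer reference terms count more (assumed scheme)."""
    ref_counts = Counter(jieba.lcut(reference))
    hyp_counts = Counter(jieba.lcut(hypothesis))
    weights = {tok: 1.0 / cnt for tok, cnt in ref_counts.items()}
    matched = sum(weights[t] * min(ref_counts[t], hyp_counts[t])
                  for t in ref_counts if t in hyp_counts)
    total = sum(weights[t] * ref_counts[t] for t in ref_counts)
    return matched / total if total else 0.0


def evaluate_back_translation(source_zh: str, pivot: str = "en") -> dict:
    """Run the zh -> pivot -> zh loop and score the back-translation."""
    intermediate = translate(source_zh, "zh", pivot)
    back = translate(intermediate, pivot, "zh")
    return {
        "bleu_jieba": segmented_bleu(source_zh, back),
        "weighted_overlap": weighted_unigram_overlap(source_zh, back),
    }
```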

