
On-line Access: 2025-10-20
Received: 2025-05-08
Revision Accepted: 2025-09-04
Li WEIGANG, Pedro Carvalho BROM. Paradox of poetic intent in back-translation: evaluating the quality of large language models in Chinese translation[J]. Frontiers of Information Technology & Electronic Engineering, 2025 (in press). https://doi.org/10.1631/FITEE.2500298
@article{WeigangBrom2025,
title="Paradox of poetic intent in back-translation: evaluating the quality of large language models in Chinese translation",
author="Li WEIGANG and Pedro Carvalho BROM",
journal="Frontiers of Information Technology & Electronic Engineering",
year="2025",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.2500298"
}
%0 Journal Article
%T Paradox of poetic intent in back-translation: evaluating the quality of large language models in Chinese translation
%A Li WEIGANG
%A Pedro Carvalho BROM
%J Frontiers of Information Technology & Electronic Engineering
%@ 2095-9184
%D 2025
%I Zhejiang University Press & Springer
%R 10.1631/FITEE.2500298
TY - JOUR
T1 - Paradox of poetic intent in back-translation: evaluating the quality of large language models in Chinese translation
A1 - Li WEIGANG
A1 - Pedro Carvalho BROM
JO - Frontiers of Information Technology & Electronic Engineering
SN - 2095-9184
Y1 - 2025
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.2500298
ER -
Abstract: Large language models (LLMs) excel in multilingual translation tasks, yet often struggle with culturally and semantically rich Chinese texts. This study introduces the LLM-BT framework, back-translation (BT) powered by LLMs, to evaluate Chinese → intermediate language → Chinese translation quality across five LLMs and three traditional systems. We construct a diverse corpus containing scientific abstracts, historical paradoxes, and literary metaphors, reflecting the complexity of Chinese at the lexical and semantic levels. Using our modular NLPMetrics system (including bilingual evaluation understudy [BLEU], character n-gram F-score [chrF], translation edit rate [TER], and semantic similarity [SS]), we find that LLMs outperform traditional tools in cultural and literary tasks. However, the results also uncover a high-dimensional behavioral phenomenon, the paradox of poetic intent, in which surface fluency is preserved but metaphorical or emotional depth is lost. Additionally, some models exhibit verbatim back-translation, suggesting a form of data-driven quasi-self-awareness, particularly under repeated or cross-model evaluation. To address BLEU's limitations for Chinese, we propose a Jieba-segmentation BLEU variant that incorporates word-frequency and n-gram weighting, improving sensitivity to lexical segmentation and term consistency. Supplementary tests show that in certain semantic dimensions, LLM outputs approach the fidelity of human poetic translations, despite lacking deeper metaphorical intent. Overall, this study reframes the traditional fidelity vs. fluency evaluation into a richer, multi-layered analysis of LLM behavior, offering a transparent framework that contributes to Explainable AI (XAI) and identifies new research pathways in cultural natural language processing (NLP) and multilingual LLM alignment.
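To illustrate the Jieba-segmentation idea mentioned in the abstract, the sketch below (plain Python, not the authors' NLPMetrics code) segments a Chinese source sentence and its back-translation with Jieba before scoring them with a standard sentence-level BLEU. It omits the paper's word-frequency and n-gram weighting, and the function name and example sentences are hypothetical.

# A minimal sketch, assuming the "jieba" and "nltk" packages are installed.
# It only shows Jieba word segmentation applied before sentence-level BLEU;
# it is not the NLPMetrics implementation described in the paper.
import jieba
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def jieba_bleu(source_zh: str, back_translation_zh: str) -> float:
    """Segment both strings with Jieba, then score the back-translation with BLEU."""
    reference = [jieba.lcut(source_zh)]        # BLEU expects a list of reference token lists
    hypothesis = jieba.lcut(back_translation_zh)
    return sentence_bleu(
        reference,
        hypothesis,
        weights=(0.25, 0.25, 0.25, 0.25),      # standard uniform 1- to 4-gram weights
        smoothing_function=SmoothingFunction().method1,
    )

if __name__ == "__main__":
    src = "两岸猿声啼不住，轻舟已过万重山。"                      # hypothetical source line
    bt = "两岸的猿声不停地啼叫，轻快的小船已经驶过万重山。"        # hypothetical back-translation
    print(f"Jieba-segmented BLEU: {jieba_bleu(src, bt):.4f}")

A higher score indicates closer word-level overlap between the source and its back-translation; segmenting first matters because Chinese has no whitespace word boundaries, so character-level BLEU would otherwise conflate segmentation errors with translation errors.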