Journal of Zhejiang University

Frontiers of Information Technology & Electronic Engineering 2025 Vol.26 No.7 P.1066-1082

Shared-weight multimodal translation model for recognizing Chinese variant characters

Author(s): Yuankang SUN, Bing LI, Lexiang LI, Peng YANG, Dongmei YANG
Affiliation(s): 1. School of Computer Science and Engineering, Southeast University, Nanjing 210000, China
Corresponding email(s): syk@seu.edu.cn, libing@seu.edu.cn, lexiangli@seu.edu.cn, pengyang@seu.edu.cn
Key Words: Chinese variant characters, Multimodal model, Translation model, Phonology and morphology

Yuankang SUN, Bing LI, Lexiang LI, Peng YANG, Dongmei YANG. Shared-weight multimodal translation model for recognizing Chinese variant characters[J]. Frontiers of Information Technology & Electronic Engineering, 2025, 26(7): 1066-1082.

@article{title="Shared-weight multimodal translation model for recognizing Chinese variant characters",
author="Yuankang SUN, Bing LI, Lexiang LI, Peng YANG, Dongmei YANG",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="26",
number="7",
pages="1066-1082",
year="2025",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.2400504"
}

%0 Journal Article
%T Shared-weight multimodal translation model for recognizing Chinese variant characters
%A Yuankang SUN
%A Bing LI
%A Lexiang LI
%A Peng YANG
%A Dongmei YANG
%J Frontiers of Information Technology & Electronic Engineering
%V 26
%N 7
%P 1066-1082
%@ 2095-9184
%D 2025
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.2400504

TY - JOUR
T1 - Shared-weight multimodal translation model for recognizing Chinese variant characters
A1 - Yuankang SUN
A1 - Bing LI
A1 - Lexiang LI
A1 - Peng YANG
A1 - Dongmei YANG
JO - Frontiers of Information Technology & Electronic Engineering
VL - 26
IS - 7
SP - 1066
EP - 1082
SN - 2095-9184
Y1 - 2025
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.2400504
ER -

Abstract: Recognizing Chinese variant characters addresses the semantic ambiguity and confusion that such characters introduce, which pose risks to the security of Web content and complicate the governance of sensitive words. Most existing approaches focus on acquiring contextual knowledge from Chinese corpora and vocabularies during pretraining, and often overlook the inherent phonological and morphological characteristics of the Chinese language. To address these issues, we propose a shared-weight multimodal translation model (SMTM) that integrates the phonology of Pinyin and the morphology of fonts into each Chinese character token to learn the deeper semantics of variant text. Specifically, we encode the Pinyin features of Chinese characters with an embedding layer and extract their font features directly with convolutional neural networks. Considering the multimodal similarity between the source and target sentences in the Chinese variant-character-recognition task, we design a shared-weight embedding mechanism that uses heuristic information from the source sentences to generate target sentences during training. Experimental results show that SMTM achieves 89.550% on bilingual evaluation understudy (BLEU) and 79.480% on F1, a significant improvement over state-of-the-art baseline models.
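
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `SharedMultimodalEmbedding`, the toy 3x3 glyph bitmaps, the toneless Pinyin strings, the 2x2 kernels, the width `DIM = 4`, and fusion by element-wise sum are all assumptions made for illustration. The only ideas taken from the abstract are that Pinyin is encoded by an embedding lookup, that font features come from a convolution, and that source and target sentences are embedded with the same weights.

```python
import random

DIM = 4  # hypothetical embedding width, chosen only for readability


def make_table(keys, dim, rng):
    """Randomly initialised lookup table mapping each key to a dim-length vector."""
    return {key: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for key in keys}


def glyph_features(bitmap, kernels):
    """One feature per kernel: valid 2-D convolution, ReLU, global max pooling."""
    features = []
    for kernel in kernels:
        kh, kw = len(kernel), len(kernel[0])
        best = 0.0  # starting at 0 applies the ReLU floor
        for i in range(len(bitmap) - kh + 1):
            for j in range(len(bitmap[0]) - kw + 1):
                response = sum(kernel[a][b] * bitmap[i + a][j + b]
                               for a in range(kh) for b in range(kw))
                best = max(best, response)
        features.append(best)
    return features


class SharedMultimodalEmbedding:
    """One set of weights that embeds both source and target characters."""

    def __init__(self, pinyin_of, bitmap_of, dim=DIM, seed=0):
        rng = random.Random(seed)
        self.pinyin_of = pinyin_of  # character -> Pinyin string
        self.bitmap_of = bitmap_of  # character -> toy glyph bitmap
        self.char_table = make_table(sorted(pinyin_of), dim, rng)
        self.pinyin_table = make_table(sorted(set(pinyin_of.values())), dim, rng)
        self.kernels = [[[rng.uniform(-1.0, 1.0) for _ in range(2)]
                         for _ in range(2)] for _ in range(dim)]

    def embed(self, char):
        c = self.char_table[char]
        p = self.pinyin_table[self.pinyin_of[char]]
        g = glyph_features(self.bitmap_of[char], self.kernels)
        # Fusion by element-wise sum is an assumption of this sketch.
        return [ci + pi + gi for ci, pi, gi in zip(c, p, g)]


# Hypothetical toy data: a homophone variant and its standard form.
PINYIN_OF = {"微": "wei", "薇": "wei", "信": "xin"}
BITMAP_OF = {
    "微": [[1, 0, 1], [0, 1, 0], [1, 0, 1]],
    "薇": [[1, 1, 1], [0, 1, 0], [1, 0, 1]],
    "信": [[0, 1, 0], [1, 1, 1], [0, 1, 0]],
}

shared = SharedMultimodalEmbedding(PINYIN_OF, BITMAP_OF)
source_vectors = [shared.embed(ch) for ch in "薇信"]  # variant text
target_vectors = [shared.embed(ch) for ch in "微信"]  # standard text
```

Because both sentences pass through the same `shared` object, a character that appears unchanged on both sides (here 信) receives an identical vector, and the homophones 薇 and 微 share the same Pinyin component. That overlap is the kind of source-side hint a shared-weight embedding could pass to the decoder.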

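Shared-weight multimodal translation model for recognizing Chinese variant characters

Yuankang SUN^1,2, Bing LI^1,2, Lexiang LI^1,2, Peng YANG^1,2, Dongmei YANG^3
^1School of Computer Science and Engineering, Southeast University, Nanjing 210000, China
^2Key Laboratory of Computer Network and Information Integration (Southeast University), Ministry of Education, Nanjing 210000, China
^3School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
Summary: The task of recognizing Chinese variant characters aims to resolve the semantic ambiguity and confusion found in Chinese characters, which pose potential risks to the security of Web content and increase the complexity of managing sensitive words. Most existing methods focus on acquiring contextual semantics from Chinese corpora and vocabularies during pretraining and tend to overlook the inherent phonological and morphological features of Chinese. To address these problems, this paper proposes a shared-weight multimodal translation model for recognizing Chinese variant characters. The model integrates the phonological features of Pinyin and the morphological features of fonts into each Chinese token to learn the deep semantic features of variant text. Specifically, the phonological features of Pinyin are encoded through an embedding layer, and the morphological features of fonts are learned with convolutional neural networks. Considering the similarity of multimodal features between the source and target sentences in this task, a shared-weight embedding mechanism is designed that uses heuristic information from the source sentences to generate target sentences during training. Experimental results show that the proposed model reaches 89.550% on bilingual evaluation understudy (BLEU) and 79.480% on F1, a significant improvement over current state-of-the-art baseline models.

Key words: Chinese variant characters; Multimodal model; Translation model; Phonology and morphology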
