JZUS - Journal of Zhejiang University SCIENCE

Frontiers of Information Technology & Electronic Engineering 2025 Vol.26 No.3 P.354-366

DRMSpell: dynamically reweighting multimodality for Chinese spelling correction

Author(s): Yinghao LI, Heyan HUANG, Baojun WANG, Yang GAO
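Affiliation(s): School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China; Southeast Institute of Information Technology, Beijing Institute of Technology, Putian 351100, China; Huawei Noah's Ark Lab, Shenzhen 518129, China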
Corresponding email(s): yhli@bit.edu.cn, hhy63@bit.edu.cn, puking.w@huawei.com, gyang@bit.edu.cn
Key Words: Chinese spelling correction, Multimodality, Masking strategy

Yinghao LI, Heyan HUANG, Baojun WANG, Yang GAO. DRMSpell: dynamically reweighting multimodality for Chinese spelling correction[J]. Frontiers of Information Technology & Electronic Engineering, 2025, 26(3): 354-366.

@article{FITEE.2300816,
title="DRMSpell: dynamically reweighting multimodality for Chinese spelling correction",
author="Yinghao LI and Heyan HUANG and Baojun WANG and Yang GAO",
journal="Frontiers of Information Technology \& Electronic Engineering",
volume="26",
number="3",
pages="354--366",
year="2025",
publisher="Zhejiang University Press \& Springer",
doi="10.1631/FITEE.2300816"
}

%0 Journal Article
%T DRMSpell: dynamically reweighting multimodality for Chinese spelling correction
%A Yinghao LI
%A Heyan HUANG
%A Baojun WANG
%A Yang GAO
%J Frontiers of Information Technology & Electronic Engineering
%V 26
%N 3
%P 354-366
%@ 2095-9184
%D 2025
%I Zhejiang University Press & Springer
%R 10.1631/FITEE.2300816

TY  - JOUR
T1  - DRMSpell: dynamically reweighting multimodality for Chinese spelling correction
A1  - Yinghao LI
A1  - Heyan HUANG
A1  - Baojun WANG
A1  - Yang GAO
JO  - Frontiers of Information Technology & Electronic Engineering
VL  - 26
IS  - 3
SP  - 354
EP  - 366
SN  - 2095-9184
Y1  - 2025
PB  - Zhejiang University Press & Springer
DO  - 10.1631/FITEE.2300816
ER  - 

Abstract: Chinese spelling correction (CSC) is the task of detecting and correcting spelling errors in Chinese text. Chinese is highly complex: its phonetic representations, known as pinyin, carry distinct tonal variations that can correspond to various characters. This complexity makes CSC essential for ensuring the accuracy and clarity of written communication. Recent research has incorporated external knowledge into models through phonological and visual modalities. However, these methods do not use modality information in a way that targets the different types of errors. In this paper, we propose DRMSpell, a multimodal pretrained language model for CSC that takes the interaction between modalities into account. A dynamically reweighting multimodality (DRM) module is introduced to reweight the modalities and obtain more multimodal information. To make full use of this information and further strengthen the model, an independent-modality masking strategy (IMS) is proposed that independently masks the three modalities of a token in the pretraining stage. Our method achieves state-of-the-art performance on most metrics of widely used benchmarks. The experimental results demonstrate that our method can model the interactive information between modalities and is robust to incorrect modal information.
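The abstract names two mechanisms, the DRM module and IMS, without giving their formulation. The following is a minimal sketch only, under stated assumptions: reweighting is illustrated as a softmax gate over three per-token modality vectors, and independent masking as one separate coin flip per modality. The function names, the gate vectors, the dot-product scoring, the mask probability, and the `[MASK]` symbol are all hypothetical and are not taken from the paper.

```python
import math
import random

# The three modalities of a token; the names are illustrative assumptions.
MODALITIES = ("semantic", "phonetic", "visual")


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reweight_modalities(embeddings, gate_vectors):
    """Fuse one token's per-modality vectors with input-dependent weights.

    embeddings:   dict modality -> list[float], one vector per modality
    gate_vectors: dict modality -> list[float], hypothetical learned gates
    Returns (fused_vector, weights_by_modality).
    """
    # Assumption: each modality is scored by a dot product with its gate.
    scores = [
        sum(e * g for e, g in zip(embeddings[m], gate_vectors[m]))
        for m in MODALITIES
    ]
    weights = softmax(scores)
    dim = len(embeddings[MODALITIES[0]])
    # The fused vector is the weighted sum of the modality vectors.
    fused = [
        sum(w * embeddings[m][i] for w, m in zip(weights, MODALITIES))
        for i in range(dim)
    ]
    return fused, dict(zip(MODALITIES, weights))


def mask_modalities_independently(token, mask_prob, rng, mask_symbol="[MASK]"):
    """Mask each modality of one token with its own independent coin flip.

    token: dict modality -> input symbol (e.g. character, pinyin, glyph id)
    """
    return {
        m: (mask_symbol if rng.random() < mask_prob else token[m])
        for m in MODALITIES
    }


if __name__ == "__main__":
    rng = random.Random(0)
    token = {"semantic": "天", "phonetic": "tian1", "visual": "glyph:天"}
    print(mask_modalities_independently(token, mask_prob=0.15, rng=rng))

    emb = {"semantic": [1.0, 0.0], "phonetic": [0.0, 1.0], "visual": [1.0, 1.0]}
    gates = {"semantic": [0.5, 0.0], "phonetic": [0.0, 0.2], "visual": [0.1, 0.1]}
    print(reweight_modalities(emb, gates))
```

Because each modality draws its own random number, a token can end up with, say, its character masked while its pinyin and glyph remain visible, which is the property the abstract attributes to IMS. The gate here depends on the input vectors, so the weights change per token rather than being fixed constants.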

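DRMSpell: dynamic multimodal reweighting technique for Chinese spelling correction

Yinghao LI¹, Heyan HUANG^1,2, Baojun WANG³, Yang GAO^1,2
¹School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
²Southeast Institute of Information Technology, Beijing Institute of Technology, Putian 351100, China
³Huawei Noah's Ark Lab, Shenzhen 518129, China
Abstract: The Chinese spelling correction task aims to detect and correct spelling errors that may occur in Chinese text. Chinese, however, is highly complex: its pinyin representations carry multiple tonal variations, and these variations can correspond to different characters. Given this complexity, the Chinese spelling correction task is essential for ensuring the accuracy and clarity of written communication, and recent research has introduced external knowledge into models through phonological and visual modalities. However, these methods fail to use modality information effectively to target different types of spelling errors. In this paper, we propose a multimodal pretrained language model named DRMSpell for Chinese spelling correction, which takes the interaction between modalities into account. We introduce a dynamic multimodal reweighting module that reweights the modalities to obtain more multimodal information. To make full use of the multimodal information obtained and further strengthen the model, we propose an independent-modality masking strategy that independently masks the three modalities of a token during pretraining. Our method achieves state-of-the-art performance on most metrics of widely used benchmarks, and the experimental results show that it can model the interactive information between modalities and is robust even to incorrect modality information.

Key words: Chinese spelling correction; Multimodality; Masking strategy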
