CLC number: TP391.1
On-line Access: 2025-04-03
Received: 2023-12-01
Revision Accepted: 2024-05-06
Crosschecked: 2025-04-07
Yinghao LI, Heyan HUANG, Baojun WANG, Yang GAO. DRMSpell: dynamically reweighting multimodality for Chinese spelling correction[J]. Frontiers of Information Technology & Electronic Engineering, 2025, 26(3): 354-366.
@article{title="DRMSpell: dynamically reweighting multimodality for Chinese spelling correction",
author="Yinghao LI and Heyan HUANG and Baojun WANG and Yang GAO",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="26",
number="3",
pages="354-366",
year="2025",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.2300816"
}
%0 Journal Article
%T DRMSpell: dynamically reweighting multimodality for Chinese spelling correction
%A Yinghao LI
%A Heyan HUANG
%A Baojun WANG
%A Yang GAO
%J Frontiers of Information Technology & Electronic Engineering
%V 26
%N 3
%P 354-366
%@ 2095-9184
%D 2025
%I Zhejiang University Press & Springer
%R 10.1631/FITEE.2300816
TY - JOUR
T1 - DRMSpell: dynamically reweighting multimodality for Chinese spelling correction
A1 - Yinghao LI
A1 - Heyan HUANG
A1 - Baojun WANG
A1 - Yang GAO
JO - Frontiers of Information Technology & Electronic Engineering
VL - 26
IS - 3
SP - 354
EP - 366
SN - 2095-9184
Y1 - 2025
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.2300816
ER -
Abstract: Chinese spelling correction (CSC) is the task of detecting and correcting spelling errors in Chinese text. Chinese is complex in this respect: its phonetic representations, known as pinyin, carry distinct tones, and a single pinyin can correspond to many characters. This makes CSC important for ensuring the accuracy and clarity of written communication. Recent work has incorporated external knowledge into models through phonological and visual modalities. However, these methods do not effectively target the use of modality information at different types of errors. In this paper, we propose DRMSpell, a multimodal pretrained language model for CSC that takes the interaction between modalities into account. A dynamically reweighting multimodality (DRM) module is introduced to reweight the modalities and obtain more multimodal information. To make full use of this information and further strengthen the model, an independent-modality masking strategy (IMS) is proposed, which independently masks the three modalities of a token in the pretraining stage. Our method achieves state-of-the-art performance on most metrics of widely used benchmarks. The experimental results demonstrate that our method can model the interactive information between modalities and is robust to incorrect modal information.
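As an illustration of masking the three modalities of a token independently, the following is a minimal sketch and not the authors' implementation. The token layout (a dict with "char", "pinyin", and "glyph" entries), the function name independent_modality_mask, the string placeholders for glyphs, and the 15% masking rate are all assumptions made for this example; the abstract does not specify them.

```python
import random

MASK = "[MASK]"  # assumed mask symbol, shared by all modalities for simplicity


def independent_modality_mask(tokens, mask_prob=0.15, rng=None):
    """Mask each modality of each token with an independent random draw.

    tokens: list of dicts, one per token, mapping a modality name
            (hypothetically "char", "pinyin", "glyph") to its value.
    mask_prob: assumed per-modality masking probability.
    Returns a new list; the input tokens are not modified.
    """
    if rng is None:
        rng = random.Random()
    masked = []
    for tok in tokens:
        new_tok = {}
        for modality, value in tok.items():
            # One draw per modality, so a token may lose its character
            # while keeping its pinyin and glyph, or any other combination.
            new_tok[modality] = MASK if rng.random() < mask_prob else value
        masked.append(new_tok)
    return masked


# Hypothetical three-token input; glyphs are stand-in strings here,
# where a real model would use image or stroke features.
sentence = [
    {"char": "我", "pinyin": "wo3", "glyph": "glyph(我)"},
    {"char": "很", "pinyin": "hen3", "glyph": "glyph(很)"},
    {"char": "好", "pinyin": "hao3", "glyph": "glyph(好)"},
]

if __name__ == "__main__":
    for tok in independent_modality_mask(sentence, mask_prob=0.5, rng=random.Random(0)):
        print(tok)
```

Because each modality is drawn separately, the model sometimes has to recover a masked character from its pinyin and glyph alone, and sometimes sees a character with one of its other modalities missing, which is one plausible reading of how such a strategy could encourage robustness to incorrect modal information.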