JZUS - Journal of Zhejiang University SCIENCE

Frontiers of Information Technology & Electronic Engineering

ISSN 2095-9184 (print), ISSN 2095-9230 (online)

2025 Vol.26 No.3 P.354-366

DRMSpell: dynamically reweighting multimodality for Chinese spelling correction

Yinghao LI, Heyan HUANG, Baojun WANG, Yang GAO

School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China; Southeast Academy of Information Technology, Beijing Institute of Technology, Putian 351100, China; Huawei Noah's Ark Lab, Shenzhen 518129, China

yhli@bit.edu.cn, hhy63@bit.edu.cn, puking.w@huawei.com, gyang@bit.edu.cn

Abstract: Chinese spelling correction (CSC) aims to detect and correct spelling errors in Chinese texts. The task is complicated by the phonetic structure of Chinese: characters are represented in pinyin with distinct tones, and a single pinyin form can correspond to many different characters. Given this complexity, CSC is essential for ensuring the accuracy and clarity of written communication. Recent research has incorporated external knowledge into models through phonological and visual modalities. However, these methods do not effectively exploit modality information to target different types of errors. In this paper, we propose DRMSpell, a multimodal pretrained language model for CSC that takes into account the interaction between modalities. A dynamically reweighting multimodality (DRM) module is introduced to reweight the various modalities so that the model obtains more multimodal information. To make full use of this multimodal information and further strengthen the model, we propose an independent-modality masking strategy (IMS) that masks the three modalities of a token independently in the pretraining stage. Our method achieves state-of-the-art performance on most metrics of widely used benchmarks. Experimental results demonstrate that our method can model the interactive information between modalities and is robust to incorrect modal information.

Key words: Chinese spelling correction; Multimodality; Masking strategy
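The abstract does not describe the internals of the DRM module or of IMS. As a rough illustration only, the sketch below assumes that each token carries three inputs (a character input, a pinyin input, and a glyph/visual input; this particular triple is an assumption) and shows two generic ideas: an IMS-style routine that, for each token selected for masking, flips an independent coin for each modality, and a softmax-gated fusion in which an input-dependent query vector scores each modality embedding. All names (`independent_modality_mask`, `dynamic_reweight`, `select_prob`, `modality_prob`, the query vector) are hypothetical; this is not the authors' implementation.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

# Hypothetical per-token input: (character, pinyin, glyph id). The abstract only
# says a token has three modalities; this particular triple is an assumption.
Token = Tuple[str, str, str]
MASK = "[MASK]"


def independent_modality_mask(
    tokens: Sequence[Token],
    select_prob: float = 0.15,
    modality_prob: float = 0.5,
    rng: Optional[random.Random] = None,
) -> List[Token]:
    """IMS-style sketch: choose tokens, then mask each modality with its own coin flip.

    A selected token may end up with zero, one, two, or all three modalities masked.
    """
    if rng is None:
        rng = random.Random()
    masked: List[Token] = []
    for char, pinyin, glyph in tokens:
        if rng.random() < select_prob:
            # Independent decisions per modality (hypothetical probabilities).
            char = MASK if rng.random() < modality_prob else char
            pinyin = MASK if rng.random() < modality_prob else pinyin
            glyph = MASK if rng.random() < modality_prob else glyph
        masked.append((char, pinyin, glyph))
    return masked


def dynamic_reweight(
    modality_embs: Sequence[Sequence[float]],
    query: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Generic softmax-gated fusion (a stand-in, not the paper's DRM module).

    Each modality embedding is scored against an input-dependent query vector
    (e.g., a contextual hidden state), the scores are softmax-normalized, and
    the embeddings are summed with those weights.
    """
    d = len(query)
    scores = [sum(q * x for q, x in zip(query, emb)) / math.sqrt(d) for emb in modality_embs]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    fused = [sum(w * emb[i] for w, emb in zip(weights, modality_embs)) for i in range(d)]
    return fused, weights


if __name__ == "__main__":
    sentence = [("天", "tian1", "glyph:天"), ("气", "qi4", "glyph:气")]
    print(independent_modality_mask(sentence, select_prob=0.5, rng=random.Random(0)))
    fused, weights = dynamic_reweight([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], query=[2.0, 0.0])
    print(weights, fused)
```

Flipping a separate coin per modality means the model sometimes sees a token's pinyin while its character is hidden (or the reverse), which is one plausible way to push it to rely on the remaining modalities. Whether the paper guarantees at least one masked modality per selected token, or how its reweighting scores are computed, is not stated in the abstract.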

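Chinese Summary: DRMSpell: a dynamic multimodal reweighting technique for Chinese spelling correction

Yinghao LI¹, Heyan HUANG¹,², Baojun WANG³, Yang GAO¹,²
¹School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
²Southeast Academy of Information Technology, Beijing Institute of Technology, Putian 351100, China
³Huawei Noah's Ark Lab, Shenzhen 518129, China
Abstract: The Chinese spelling correction task aims to detect and correct spelling errors that may occur in Chinese texts. However, Chinese exhibits a high degree of complexity: its pinyin representations carry multiple tonal variations, and these variations can correspond to different characters. Given this complexity, Chinese spelling correction is essential for ensuring the accuracy and clarity of written communication. Recent research has introduced external knowledge into models through phonetic and visual modalities. However, these methods fail to use modality information effectively to address different types of spelling errors in a targeted way. In this paper, we propose DRMSpell, a multimodal pretrained language model for Chinese spelling correction that considers the interaction between modalities. We introduce a dynamic multimodal reweighting module that reweights the various modalities to obtain more multimodal information. To make full use of the obtained multimodal information and further strengthen the model, we propose an independent-modality masking strategy that independently masks the three modalities of a token in the pretraining stage. Our method achieves state-of-the-art performance on most metrics of widely used benchmarks. Experimental results show that our method can model the interactive information between modalities and is robust even to incorrect modality information.

Key words: Chinese spelling correction; Multimodality; Masking strategy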


DOI: 10.1631/FITEE.2300816

CLC number: TP391.1


On-line Access: 2025-04-03

Received: 2023-12-01

Revision Accepted: 2024-05-06

Crosschecked: 2025-04-07

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952276; Fax: +86-571-87952331; E-mail: jzus@zju.edu.cn
Copyright © 2000~ Journal of Zhejiang University-SCIENCE
