On-line Access: 2024-12-16

Received: 2004-06-11

Revision Accepted: 2024-10-10

Journal of Zhejiang University SCIENCE C 1998 Vol.-1 No.-1 P.

http://doi.org/10.1631/FITEE.2400504


Shared-weight multimodal translation model for recognizing Chinese variant characters


Author(s):  Yuankang SUN, Bing LI, Lexiang LI, Peng YANG, Dongmei YANG

Affiliation(s):  School of Computer Science and Engineering, Southeast University, Nanjing 210000, China

Corresponding email(s):   syk@seu.edu.cn, libing@seu.edu.cn, lexiangli@seu.edu.cn, pengyang@seu.edu.cn

Key Words:  Chinese variant characters, Multimodal model, Translation model, Phonology and morphology


Yuankang SUN, Bing LI, Lexiang LI, Peng YANG, Dongmei YANG. Shared-weight multimodal translation model for recognizing Chinese variant characters[J]. Frontiers of Information Technology & Electronic Engineering, 1998, -1(-1): .

@article{FITEE.2400504,
title="Shared-weight multimodal translation model for recognizing Chinese variant characters",
author="Yuankang SUN, Bing LI, Lexiang LI, Peng YANG, Dongmei YANG",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="-1",
number="-1",
pages="",
year="1998",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.2400504"
}

%0 Journal Article
%T Shared-weight multimodal translation model for recognizing Chinese variant characters
%A Yuankang SUN
%A Bing LI
%A Lexiang LI
%A Peng YANG
%A Dongmei YANG
%J Journal of Zhejiang University SCIENCE C
%V -1
%N -1
%P
%@ 2095-9184
%D 1998
%I Zhejiang University Press & Springer
%R 10.1631/FITEE.2400504

TY  - JOUR
T1  - Shared-weight multimodal translation model for recognizing Chinese variant characters
A1  - Yuankang SUN
A1  - Bing LI
A1  - Lexiang LI
A1  - Peng YANG
A1  - Dongmei YANG
JO  - Journal of Zhejiang University Science C
VL  - -1
IS  - -1
SP  - 
EP  - 
SN  - 2095-9184
Y1  - 1998
PB  - Zhejiang University Press & Springer
DO  - 10.1631/FITEE.2400504
ER  - 


Abstract: 
The task of recognizing Chinese variant characters aims to address the challenges of semantic ambiguity and confusion, which can pose risks to the security of Web content and complicate the governance of sensitive words. Most existing approaches focus on acquiring contextual knowledge from Chinese corpora and vocabularies during pretraining, often overlooking the inherent phonological and morphological characteristics of the Chinese language. To address these issues, we propose a shared-weight multimodal translation model (SMTM) based on the multimodal information of Chinese characters, which integrates the phonology of Pinyin and the morphology of fonts into each Chinese character token to learn the deeper semantics of variant texts. Specifically, we encode the Pinyin features of Chinese characters with an embedding layer and extract their font features directly with convolutional neural networks. Considering the multimodal similarity between the source and target sentences of the Chinese variant character recognition task, we design a shared-weight embedding mechanism that uses heuristic information from the source sentences to generate the target sentences during training. Experimental results show that SMTM achieves 89.550% on bilingual evaluation understudy-1 (BLEU-1) and 79.480% on F1, which are significant improvements of 4.344% and 3.088%, respectively, over the state-of-the-art baseline model.
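To make the idea of per-token multimodal embeddings with shared weights more concrete, here is a toy, forward-pass-only sketch in plain Python. It is not the authors' SMTM implementation. The abstract does not specify how the modalities are fused, the CNN architecture, or exactly what "shared-weight" covers, so this sketch assumes: (1) element-wise summation of a character vector, a Pinyin vector, and CNN glyph features; (2) a single valid convolution layer with global max pooling; and (3) "shared-weight" meaning that the source (encoder) and target (decoder) sides use one embedding module. The class name `SharedMultimodalEmbedding`, the tables `PINYIN` and `GLYPHS`, and the 4x4 bitmaps are hypothetical placeholders, not real font renderings.

```python
import random
from typing import Dict, List

Vector = List[float]
Bitmap = List[List[int]]


class SharedMultimodalEmbedding:
    """Hypothetical toy embedding: character ID + Pinyin + glyph (font) features."""

    def __init__(self, pinyin: Dict[str, str], glyphs: Dict[str, Bitmap],
                 dim: int = 4, kernel_size: int = 2, seed: int = 0) -> None:
        rng = random.Random(seed)

        def rand_vec(n: int) -> Vector:
            return [rng.uniform(-1.0, 1.0) for _ in range(n)]

        self.pinyin = pinyin
        self.glyphs = glyphs
        self.k = kernel_size
        # Ordinary character-ID embedding table.
        self.char_table = {c: rand_vec(dim) for c in pinyin}
        # Phonology: one vector per Pinyin syllable, so homophones share it.
        self.pinyin_table = {p: rand_vec(dim) for p in sorted(set(pinyin.values()))}
        # Morphology: `dim` k x k convolution kernels applied to the glyph bitmap.
        self.kernels = [[rand_vec(kernel_size) for _ in range(kernel_size)]
                        for _ in range(dim)]

    def font_features(self, bitmap: Bitmap) -> Vector:
        """Valid 2-D convolution per kernel, then global max pooling."""
        rows, cols = len(bitmap), len(bitmap[0])
        feats = []
        for kern in self.kernels:
            best = float("-inf")
            for i in range(rows - self.k + 1):
                for j in range(cols - self.k + 1):
                    s = sum(kern[a][b] * bitmap[i + a][j + b]
                            for a in range(self.k) for b in range(self.k))
                    best = max(best, s)
            feats.append(best)
        return feats

    def embed(self, char: str) -> Vector:
        """Fuse the three modalities by element-wise summation (an assumption)."""
        parts = (self.char_table[char],
                 self.pinyin_table[self.pinyin[char]],
                 self.font_features(self.glyphs[char]))
        return [sum(vals) for vals in zip(*parts)]

    def embed_sentence(self, sentence: str) -> List[Vector]:
        return [self.embed(c) for c in sentence]


# Hypothetical toy data: 微 and 薇 are homophones (wei1); bitmaps are made up.
PINYIN = {"微": "wei1", "薇": "wei1", "信": "xin4"}
GLYPHS = {
    "微": [[1, 0, 1, 1], [1, 1, 0, 1], [0, 1, 1, 0], [1, 0, 1, 1]],
    "薇": [[1, 1, 1, 1], [1, 0, 0, 1], [0, 1, 1, 0], [1, 0, 1, 1]],
    "信": [[0, 1, 1, 1], [1, 0, 0, 0], [1, 1, 1, 1], [1, 0, 1, 0]],
}

shared = SharedMultimodalEmbedding(PINYIN, GLYPHS)
encoder_embed = decoder_embed = shared       # one set of weights for both sides
src = encoder_embed.embed_sentence("薇信")   # variant (source) sentence
tgt = decoder_embed.embed_sentence("微信")   # standard (target) sentence
```

Because both sides read from the same tables, a character that appears unchanged in the source and target (信 here) gets an identical vector, and homophone variants such as 薇/微 share their Pinyin component. In a trainable framework, these tables and kernels would be learned parameters rather than random constants.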
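The abstract reports results in BLEU-1, which measures only unigram overlap between the model output and the reference. The sketch below uses the standard BLEU definition restricted to unigrams: clipped unigram precision multiplied by a brevity penalty. The abstract states neither the tokenization nor whether scores are computed per sentence or over the whole corpus, so this sketch assumes character-level tokens and a single reference per sentence. The function name `bleu1` is illustrative, and the reported 89.550% corresponds to a score of this kind multiplied by 100.

```python
import math
from collections import Counter
from typing import Sequence


def bleu1(candidate: Sequence[str], reference: Sequence[str]) -> float:
    """Sentence-level BLEU-1 against one reference.

    BLEU-1 = BP * p1, where p1 is the clipped unigram precision and
    BP = 1 if c > r else exp(1 - r / c), with c, r the candidate/reference lengths.
    """
    c, r = len(candidate), len(reference)
    if c == 0:
        return 0.0
    ref_counts = Counter(reference)
    # Clip each candidate token's count by how often it occurs in the reference.
    clipped = sum(min(n, ref_counts[tok]) for tok, n in Counter(candidate).items())
    p1 = clipped / c
    bp = 1.0 if c > r else math.exp(1.0 - r / c)
    return bp * p1


# Character-level tokens (an assumption; the tokenizer is not stated in the abstract).
reference = list("微信支付")
print(bleu1(list("薇信支付"), reference))  # one variant character left unrecovered → 0.75
print(bleu1(list("微信"), reference))      # too short: p1 = 1 but BP = e^-1 ≈ 0.368
```

Clipping stops a model from inflating precision by repeating a correct character, and the brevity penalty stops it from scoring well by emitting only the few characters it is sure about.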

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952783; E-mail: cjzhang@zju.edu.cn
Copyright © 2000 - 2024 Journal of Zhejiang University-SCIENCE