Frontiers of Information Technology & Electronic Engineering

ISSN 2095-9184 (print), ISSN 2095-9230 (online)

Syntactic word embedding based on dependency syntax and polysemous analysis

Abstract: Most word embedding models suffer from the following problems: (1) models based on bag-of-words contexts completely neglect the structural relations of sentences; (2) each word has a single embedding, so the model cannot discriminate between the senses of a polysemous word; (3) word embeddings tend to capture the similarity of the sentences' contextual structure. To solve these problems, we propose an easy-to-use representation algorithm for syntactic word embedding (SWE). The main steps are: (1) a polysemous tagging algorithm based on latent Dirichlet allocation (LDA) is used to represent polysemous words; (2) the symbols '+' and '−' are adopted to indicate the directions of dependency relations; (3) stopwords and their dependencies are deleted; (4) dependency skip is applied to connect indirect dependencies; (5) the dependency-based contexts are fed into a word2vec model. Experimental results show that our model generates desirable word embeddings in word similarity evaluation tasks. In addition, semantic and syntactic features can be captured from dependency-based syntactic contexts, which exhibit less topical similarity and more syntactic similarity. We conclude that SWE outperforms single-embedding learning models.
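The context-construction steps listed above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: the token format, the `word#topic` sense tag, the tiny stopword list, the direction convention ('+' for a context that is a dependent of the word, '-' for a context that is its head, with ASCII '-' standing in for '−'), and the relation-chain label on skip arcs are all hypothetical choices. The set of polysemous words and the LDA parameters `theta` (document-topic weights) and `phi` (topic-word probabilities) are assumed to come from an already trained model; each polysemous token is tagged with the topic k that maximizes theta[k] · phi[k][w].

```python
from typing import Dict, List, Sequence, Set, Tuple

# One parsed token: (index, word, head_index, relation). Indices start at 1 and
# head_index 0 denotes the artificial ROOT. This input format is an assumption.
Token = Tuple[int, str, int, str]

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS: Set[str] = {"the", "a", "an", "of", "with", "to", "is"}


def lda_sense(word: str, theta: Sequence[float], phi: Sequence[Dict[str, float]]) -> int:
    """Most probable topic for `word` in a document: argmax_k theta[k] * phi[k][word]."""
    scores = [theta[k] * phi[k].get(word, 0.0) for k in range(len(theta))]
    return max(range(len(scores)), key=scores.__getitem__)


def dependency_contexts(
    tokens: List[Token],
    polysemous: Set[str],
    theta: Sequence[float],
    phi: Sequence[Dict[str, float]],
) -> List[Tuple[str, str]]:
    """Build (word, context) pairs from one parsed sentence (hypothetical scheme).

    '+' marks a context that is a dependent of the word, '-' marks a context
    that is the word's head. Stopwords are dropped together with their arcs,
    and a word whose head is a stopword is linked to its nearest non-stopword
    ancestor by a "skip" arc whose label joins the relations passed through.
    """
    by_index = {idx: (word, head, rel) for idx, word, head, rel in tokens}

    def label(idx: int) -> str:
        word = by_index[idx][0]
        # Tag polysemous words with their LDA topic, e.g. "bank#1".
        return f"{word}#{lda_sense(word, theta, phi)}" if word in polysemous else word

    pairs: List[Tuple[str, str]] = []
    for idx, (word, head, rel) in by_index.items():
        if word in STOPWORDS:
            continue  # delete the stopword and its own arcs
        relations = [rel]
        # Dependency skip: climb past stopword heads to the nearest content word.
        while head != 0 and by_index[head][0] in STOPWORDS:
            _, head, up_rel = by_index[head]
            relations.append(up_rel)
        if head == 0:
            continue  # attached only to ROOT, so there is no head context
        arc = "_".join(relations)
        dep, gov = label(idx), label(head)
        pairs.append((gov, f"{dep}/{arc}+"))  # the head sees its dependent
        pairs.append((dep, f"{gov}/{arc}-"))  # the dependent sees its head
    return pairs


if __name__ == "__main__":
    # "the bank of the river flooded", with a hand-written parse.
    sentence = [
        (1, "the", 2, "det"), (2, "bank", 6, "nsubj"), (3, "of", 2, "prep"),
        (4, "the", 5, "det"), (5, "river", 3, "pobj"), (6, "flooded", 0, "root"),
    ]
    topic_word = [{"bank": 0.05, "loan": 0.04}, {"bank": 0.06, "river": 0.05}]
    for pair in dependency_contexts(sentence, {"bank"}, [0.1, 0.9], topic_word):
        print(pair)
```

Because each context string carries its relation and direction, the same neighbouring word yields different contexts as a head and as a dependent, and "river" reaches "bank" through the deleted stopword "of" via a skip arc. The resulting pairs could then be passed to a skip-gram trainer that accepts arbitrary (word, context) pairs.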

Key words: Dependency-based context, Polysemous word representation, Representation learning, Syntactic word embedding

Chinese Summary (English translation): Syntactic word embedding based on dependency relations and polysemous word analysis

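Summary: Most existing word embedding learning models have the following problems: (1) models based on bag-of-words contexts completely ignore the syntactic structural relations of sentences; (2) each word uses a single embedding vector, so all senses of a polysemous word share one vector; (3) word embeddings tend toward the contextual commonality of sentences. To solve these problems, we propose a syntactic word embedding (SWE) based on dependency relations and polysemous word analysis. The algorithm mainly: (1) proposes a polysemous word identification algorithm based on a topic model; (2) uses the symbols "+" and "−" to indicate the direction of dependency relations; (3) deletes stopwords and their dependency relations; (4) introduces "skip" dependencies to represent indirect relations between dependencies; (5) feeds the dependency-based contexts into a Word2Vec model to train the language model. Experimental results show that the SWE model performs excellently in word similarity evaluation tasks. Dependency-based syntactic contexts capture the semantic and syntactic features of words, so that words exhibit less contextual topical similarity and more syntactic and semantic similarity. In summary, the SWE model, which incorporates more information, outperforms single word embedding learning models.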

Key words: Dependency-based context; Polysemous word representation; Representation learning; Syntactic word vector

DOI: 10.1631/FITEE.1601846
CLC number: TP391

On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2018-04-12

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952276; Fax: +86-571-87952331; E-mail: jzus@zju.edu.cn
Copyright © 2000~ Journal of Zhejiang University-SCIENCE