Frontiers of Information Technology & Electronic Engineering

ISSN 2095-9184 (print), ISSN 2095-9230 (online)

Words alignment based on association rules for cross-domain sentiment classification

Abstract: Automatic classification of sentiment data (e.g., reviews, blogs) has many applications in enterprise user management systems, and can help us understand people's attitudes toward products or services. However, it is difficult to train an accurate sentiment classifier for different domains. One of the major reasons is that people often use different words to express the same sentiment in different domains, and we cannot easily find a direct mapping between these words that would reduce the differences between domains. As a result, the accuracy of a sentiment classifier declines sharply when a classifier trained in one domain is applied to other domains. In this paper, we propose a novel approach called words alignment based on association rules (WAAR) for cross-domain sentiment classification. WAAR establishes an indirect mapping between domain-specific words in different domains by learning strong association rules between domain-shared words and domain-specific words within the same domain. In this way, the differences between the source and target domains can be reduced to some extent, and a more accurate cross-domain classifier can be trained. Experimental results on Amazon® datasets show the effectiveness of our approach in improving the performance of cross-domain sentiment classification.
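The abstract does not spell out the mining procedure, so the following is only a minimal Python sketch of what "strong association rules between domain-shared words and domain-specific words in the same domain" can look like. It assumes each document is a bag of words and that a rule has the simple form shared word ⇒ specific word, scored with standard support and confidence. The function name `mine_rules`, the thresholds, and the toy word lists are hypothetical illustrations, not WAAR's actual settings.

```python
from collections import Counter
from itertools import product


def mine_rules(docs, shared, specific, min_support=0.1, min_confidence=0.5):
    """Hypothetical sketch: mine strong rules `shared word => specific word`
    inside ONE domain. Returns {(shared_word, specific_word): confidence}."""
    n_docs = len(docs)
    shared_count = Counter()  # number of documents containing each shared word
    pair_count = Counter()    # number of documents containing both words of a pair
    for doc in docs:
        tokens = set(doc)
        shared_here = tokens & shared
        specific_here = tokens & specific
        shared_count.update(shared_here)
        pair_count.update(product(shared_here, specific_here))

    rules = {}
    for (s, w), both in pair_count.items():
        support = both / n_docs              # P(s and w in the same document)
        confidence = both / shared_count[s]  # P(w | s)
        if support >= min_support and confidence >= min_confidence:
            rules[(s, w)] = confidence
    return rules


# Toy "books" domain; words and thresholds are invented for illustration.
books = [
    {"great", "readable"},
    {"great", "readable"},
    {"great", "dull"},
    {"boring", "dull"},
    {"boring", "dull"},
]
rules = mine_rules(books, {"great", "boring"}, {"readable", "dull"},
                   min_support=0.3, min_confidence=0.6)
print(sorted(rules))  # [('boring', 'dull'), ('great', 'readable')]
```

Here the pair ("great", "dull") is discarded because it appears in only one of five documents (support 0.2), while the two remaining pairs pass both thresholds. Restricting rules to length-2 itemsets keeps the sketch small; a full Apriori-style miner would also consider larger itemsets.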

Key words: Sentiment classification, Cross-domain, Association rules

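Chinese Summary: Cross-domain sentiment classification algorithm based on word alignment with association rules

Summary: Text sentiment classification is applied in enterprise user management systems. By automatically analyzing sentiment-bearing text such as reviews and blogs, it helps merchants better understand users' attitudes toward products or services. However, reviews and blogs often come from different application domains, and training a classifier that accurately predicts sentiment polarity for each domain is very difficult. The main reason is that people in different domains usually use different feature words to express the same sentiment, and it is hard to find a direct mapping function that links feature words across domains and thereby eliminates the differences between them. Consequently, when a classifier trained in one domain is applied directly to another domain, its accuracy drops sharply because of these domain differences. This paper proposes a new cross-domain sentiment classification algorithm that aligns feature words based on association rules. Within each domain, the algorithm mines pairs of domain-shared and domain-specific words that are strongly associated, establishing direct mappings; then, using the domain-shared words as a bridge, it establishes indirect mappings between the domain-specific feature words of different domains. This reduces the differences between the source and target domains to some extent and effectively improves the accuracy of cross-domain sentiment classification. Experimental results on Amazon datasets show that the algorithm improves cross-domain sentiment classification performance.

Key words: Sentiment classification; Cross-domain; Association rules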
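The bridging step, where domain-shared words link the domain-specific words of two domains, can be illustrated with a second hypothetical Python sketch. Given strong rules mined separately in the source and target domains, each target-specific word is paired with the source-specific word it is most strongly connected to through shared bridge words. Scoring a pair by summing the products of the two rule confidences over all shared bridges is an assumption made here for illustration; `align_words` and the example rules are invented.

```python
from collections import defaultdict


def align_words(source_rules, target_rules):
    """Hypothetical sketch: map each target-specific word to the
    source-specific word it is most strongly linked to via shared words.
    Both inputs are {(shared_word, specific_word): confidence}."""
    # Group target rules by their shared (bridge) word.
    by_bridge = defaultdict(list)
    for (shared, tgt_word), tgt_conf in target_rules.items():
        by_bridge[shared].append((tgt_word, tgt_conf))

    # Assumed score: sum over bridges of source_conf * target_conf.
    scores = defaultdict(float)
    for (shared, src_word), src_conf in source_rules.items():
        for tgt_word, tgt_conf in by_bridge.get(shared, []):
            scores[(tgt_word, src_word)] += src_conf * tgt_conf

    # Keep the highest-scoring source word for each target word.
    best = {}
    for (tgt_word, src_word), score in scores.items():
        if tgt_word not in best or score > best[tgt_word][1]:
            best[tgt_word] = (src_word, score)
    return {tgt: src for tgt, (src, _) in best.items()}


# Invented rules for a source (books) and a target (electronics) domain.
books_rules = {("great", "readable"): 0.8, ("boring", "dull"): 0.9}
electronics_rules = {("great", "durable"): 0.7, ("boring", "flimsy"): 0.6}
alignment = align_words(books_rules, electronics_rules)
print(alignment)  # {'durable': 'readable', 'flimsy': 'dull'}
```

No direct rule connects "durable" and "readable"; they are linked only because both are strongly associated with the shared word "great", which is the kind of indirect mapping the summary describes. Target words with no shared bridge to the source domain are simply left unaligned in this sketch.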


DOI: 10.1631/FITEE.1601679

CLC number: TP391.1

On-line Access: 2018-04-09

Received: 2016-11-02

Revision Accepted: 2017-03-09

Crosschecked: 2018-02-15

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952276; Fax: +86-571-87952331; E-mail: jzus@zju.edu.cn
Copyright © 2000~ Journal of Zhejiang University-SCIENCE