Journal of Zhejiang University

Frontiers of Information Technology & Electronic Engineering 2020 Vol.21 No.9 P.1321-1333

http://doi.org/10.1631/FITEE.1900618

NLWSNet: a weakly supervised network for visual sentiment analysis in mislabeled web images

Author(s): Luo-yang Xue, Qi-rong Mao, Xiao-hua Huang, Jie Chen
Affiliation(s): 1. School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212013, China
Corresponding email(s): mao_qr@ujs.edu.cn
Key Words: Visual sentiment analysis, Weakly supervised learning, Mislabeled samples, Significant sentiment regions


Luo-yang Xue, Qi-rong Mao, Xiao-hua Huang, Jie Chen. NLWSNet: a weakly supervised network for visual sentiment analysis in mislabeled web images[J]. Frontiers of Information Technology & Electronic Engineering, 2020, 21(9): 1321-1333.

@article{title="NLWSNet: a weakly supervised network for visual sentiment analysis in mislabeled web images",
author="Luo-yang Xue, Qi-rong Mao, Xiao-hua Huang, Jie Chen",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="21",
number="9",
pages="1321-1333",
year="2020",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.1900618"
}

%0 Journal Article
%T NLWSNet: a weakly supervised network for visual sentiment analysis in mislabeled web images
%A Luo-yang Xue
%A Qi-rong Mao
%A Xiao-hua Huang
%A Jie Chen
%J Frontiers of Information Technology & Electronic Engineering
%V 21
%N 9
%P 1321-1333
%@ 2095-9184
%D 2020
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.1900618

TY - JOUR
T1 - NLWSNet: a weakly supervised network for visual sentiment analysis in mislabeled web images
A1 - Luo-yang Xue
A1 - Qi-rong Mao
A1 - Xiao-hua Huang
A1 - Jie Chen
JO - Frontiers of Information Technology & Electronic Engineering
VL - 21
IS - 9
SP - 1321
EP - 1333
SN - 2095-9184
Y1 - 2020
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.1900618
ER -


Abstract: Large-scale datasets are driving the rapid development of deep convolutional neural networks for visual sentiment analysis. However, annotating large-scale datasets is expensive and time-consuming. By contrast, weakly labeled web images are easy to obtain from the Internet, but their noisy labels seriously degrade performance when the images are used directly to train networks. To address this drawback, we propose an end-to-end weakly supervised learning network that is robust to mislabeled web images. Specifically, the proposed attention module automatically suppresses the distraction caused by samples with incorrect labels by reducing their attention scores during training. In addition, the special-class activation map module is designed to guide the network to focus on the significant sentiment regions of samples with correct labels in a weakly supervised manner. Beyond feature learning, regularization is applied to the classifier to minimize the distance between samples of the same class and maximize the distance between the centroids of different classes. Quantitative and qualitative evaluations on well-labeled and mislabeled web image datasets demonstrate that the proposed algorithm outperforms related methods.
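The abstract does not give the exact formulation of either module, so the following is only a minimal sketch under stated assumptions. It shows (1) a hypothetical sample-level attention weighting, where each sample's cross-entropy loss is scaled by a softmax-normalized attention score so that low-scored samples (for example, suspected mislabeled web images) contribute little, and (2) standard class activation mapping (CAM, Zhou et al., 2016), used here as a stand-in for the paper's special-class activation map module. The names `attention_logits`, `attention_weighted_loss`, and `class_activation_map` are illustrative assumptions, not the authors' code.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a 1-D sequence of scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(class_logits: Sequence[float], label: int) -> float:
    """Per-sample cross-entropy: -log softmax(class_logits)[label]."""
    return -math.log(softmax(class_logits)[label])


def attention_weighted_loss(
    batch_logits: Sequence[Sequence[float]],
    labels: Sequence[int],
    attention_logits: Sequence[float],
) -> float:
    """Weight each sample's loss by its attention score (softmax over the batch).

    ASSUMPTION: attention_logits is a hypothetical stand-in for whatever
    per-sample score the attention module produces. A sample with a low score
    (e.g. a suspected mislabeled image) contributes little to the total loss.
    """
    scores = softmax(attention_logits)
    losses = [cross_entropy(lg, y) for lg, y in zip(batch_logits, labels)]
    return sum(s * loss for s, loss in zip(scores, losses))


def class_activation_map(
    feature_maps: Sequence[Sequence[Sequence[float]]],
    class_weights: Sequence[float],
) -> List[List[float]]:
    """Standard CAM: M_c(x, y) = sum_k w_k^c * f_k(x, y).

    feature_maps: K feature maps from the last conv layer, each H x W.
    class_weights: the K classifier weights of target class c (CAM assumes the
    classifier sits directly on top of global average pooling).
    """
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for weight, fmap in zip(class_weights, feature_maps):
        for i in range(h):
            for j in range(w):
                cam[i][j] += weight * fmap[i][j]
    return cam


if __name__ == "__main__":
    logits = [[2.0, 0.0], [2.0, 0.0]]
    labels = [0, 1]  # the second label disagrees with the confident prediction
    print(attention_weighted_loss(logits, labels, [0.0, 0.0]))   # equal attention
    print(attention_weighted_loss(logits, labels, [0.0, -5.0]))  # second sample down-weighted
    print(class_activation_map([[[1, 0], [0, 1]], [[0, 2], [2, 0]]], [2.0, 0.5]))
```

With equal attention, the loss is the plain batch mean; lowering the second sample's attention logit shrinks its share of the loss, which is the behavior the abstract attributes to the attention module. In practice the CAM is usually also upsampled to the input resolution and normalized before it is used to highlight regions.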

NLWSNet: sentiment analysis of noisy-labeled web images based on weakly supervised learning

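Luo-yang Xue¹, Qi-rong Mao¹,², Xiao-hua Huang³,⁴, Jie Chen¹
¹School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212013, China
²Jiangsu Key Laboratory of Industrial Network Security Technology, Zhenjiang 212013, China
³School of Computer Engineering, Nanjing Institute of Technology, Nanjing 211167, China
⁴Center for Machine Vision and Signal Analysis, University of Oulu, Oulu 8000, Finland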

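Abstract: Large-scale datasets have driven the rapid development of sentiment analysis based on deep convolutional neural networks. However, annotating large-scale datasets is both expensive and time-consuming. By contrast, weakly labeled web images are easy to obtain from the Internet. When web images are used directly to train deep learning models, noisy labels cause a sharp drop in performance. To address this, we propose an end-to-end weakly supervised learning architecture that is robust to weakly labeled web images. Specifically, the attention module automatically suppresses the negative influence of samples with incorrect labels by lowering their attention scores during training. In addition, within the weakly supervised learning approach, the class activation map module promotes network learning by focusing on the sentiment regions of correctly labeled samples. Beyond the feature learning process, regularization is applied to the classifier to minimize the distance between samples of the same class and maximize the distance between the centroids of different classes. Quantitative and qualitative evaluations on correctly and incorrectly labeled web image datasets show that the proposed algorithm outperforms existing methods.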

Key words: image sentiment analysis; weakly supervised learning; noisy-labeled samples; significant sentiment regions
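The classifier regularization described in the abstract (pull samples of the same class together, push class centroids apart) can be illustrated with a small sketch. The paper's actual formula is not given here, so `centroid_regularizer`, its per-batch centroids, and the `margin` hinge are hypothetical choices for illustration only.

```python
from typing import Dict, List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def class_centroids(
    features: Sequence[Sequence[float]], labels: Sequence[int]
) -> Dict[int, List[float]]:
    """Mean feature vector of each class present in the batch."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for f, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(f)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], f)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def centroid_regularizer(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    margin: float = 4.0,  # hypothetical hyperparameter
) -> float:
    """Hypothetical intra/inter-class regularizer.

    intra: mean squared distance from each feature to its class centroid
           (pulls samples of the same class together).
    inter: hinge penalty max(0, margin - d^2) for each pair of centroids whose
           squared distance is below `margin` (pushes class centroids apart).
    """
    cents = class_centroids(features, labels)
    intra = sum(
        squared_distance(f, cents[y]) for f, y in zip(features, labels)
    ) / len(features)
    keys = sorted(cents)
    inter = 0.0
    for i, ci in enumerate(keys):
        for cj in keys[i + 1:]:
            inter += max(0.0, margin - squared_distance(cents[ci], cents[cj]))
    return intra + inter


if __name__ == "__main__":
    compact = [[0.0, 0.0], [0.0, 0.0], [3.0, 0.0], [3.0, 0.0]]
    scattered = [[0.0, 0.0], [1.0, 0.0], [0.5, 0.0], [1.5, 0.0]]
    print(centroid_regularizer(compact, [0, 0, 1, 1]))    # tight, well separated
    print(centroid_regularizer(scattered, [0, 0, 1, 1]))  # loose, overlapping
```

The hinge keeps the inter-class term bounded: simply subtracting the centroid distance would let the loss decrease without limit as centroids drift apart. How strongly such a term is weighted against the classification loss is a further hyperparameter that the abstract does not specify.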


