CLC number: TP391.4
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2020-07-20
Cited: 0
Clicked: 5736
Citations: Bibtex RefMan EndNote GB/T7714
Luo-yang Xue, Qi-rong Mao, Xiao-hua Huang, Jie Chen. NLWSNet: a weakly supervised network for visual sentiment analysis in mislabeled web images[J]. Frontiers of Information Technology & Electronic Engineering, 2020, 21(9): 1321-1333.
@article{title="NLWSNet: a weakly supervised network for visual sentiment analysis in mislabeled web images",
author="Luo-yang Xue, Qi-rong Mao, Xiao-hua Huang, Jie Chen",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="21",
number="9",
pages="1321-1333",
year="2020",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.1900618"
}
%0 Journal Article
%T NLWSNet: a weakly supervised network for visual sentiment analysis in mislabeled web images
%A Luo-yang Xue
%A Qi-rong Mao
%A Xiao-hua Huang
%A Jie Chen
%J Frontiers of Information Technology & Electronic Engineering
%V 21
%N 9
%P 1321-1333
%@ 2095-9184
%D 2020
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.1900618
TY - JOUR
T1 - NLWSNet: a weakly supervised network for visual sentiment analysis in mislabeled web images
A1 - Luo-yang Xue
A1 - Qi-rong Mao
A1 - Xiao-hua Huang
A1 - Jie Chen
J0 - Frontiers of Information Technology & Electronic Engineering
VL - 21
IS - 9
SP - 1321
EP - 1333
%@ 2095-9184
Y1 - 2020
PB - Zhejiang University Press & Springer
ER -
DOI - 10.1631/FITEE.1900618
Abstract: Large-scale datasets are driving the rapid developments of deep convolutional neural networks for visual sentiment analysis. However, the annotation of large-scale datasets is expensive and time consuming. Instead, it is easy to obtain weakly labeled web images from the Internet. However, noisy labels still lead to seriously degraded performance when we use images directly from the web for training networks. To address this drawback, we propose an end-to-end weakly supervised learning network, which is robust to mislabeled web images. Specifically, the proposed attention module automatically eliminates the distraction of those samples with incorrect labels by reducing their attention scores in the training process. On the other hand, the special-class activation map module is designed to stimulate the network by focusing on the significant regions from the samples with correct labels in a weakly supervised learning approach. Besides the process of feature learning, applying regularization to the classifier is considered to minimize the distance of those samples within the same class and maximize the distance between different class centroids. Quantitative and qualitative evaluations on well- and mislabeled web image datasets demonstrate that the proposed algorithm outperforms the related methods.
[1]Borth D, Ji RR, Chen T, et al., 2013. Large-scale visual sentiment ontology and detectors using adjective noun pairs. Proc 21st ACM Int Conf on Multimedia, p.223-232.
[2]Campos V, Salvador A, Giró-i-Nieto X, et al., 2015. Diving deep into sentiment: understanding fine-tuned CNNs for visual sentiment prediction. https://arxiv.org/abs/1508.05056
[3]Campos V, Jou B, Giró-i-Nieto X, 2017. From pixels to sentiment: fine-tuning CNNs for visual sentiment prediction. Image Vis Comput, 65:15-22.
[4]Chang CC, Lin CJ, 2011. LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol, 2(3):27.
[5]Chen L, Zhang HW, Xiao J, et al., 2017. SCA-CNN: spatial and channel-wise attention in convolutional networks for image captioning. Proc IEEE Conf on Computer Vision and Pattern Recognition, p.6298-6306.
[6]Chen SX, Zhang CJ, Dong M, et al., 2017. Using ranking-CNN for age estimation. Proc IEEE Conf on Computer Vision and Pattern Recognition, p.742-751.
[7]Chen SX, Zhang CJ, Dong M, 2018a. Coupled end-to-end transfer learning with generalized Fisher information. Proc IEEE/CVF Conf on Computer Vision and Pattern Recognition, p.4329-4338.
[8]Chen SX, Zhang CJ, Dong M, 2018b. Deep age estimation: from classification to ranking. IEEE Trans Multim, 20(8):2209-2222.
[9]Chen T, Borth D, Darrell T, et al., 2014a. DeepSentiBank: visual sentiment concept classification with deep convolutional neural networks. https://arxiv.org/abs/1410.8586
[10]Chen T, Yu FX, Chen JW, et al., 2014b. Object-based visual sentiment concept analysis and application. Proc 22nd ACM Int Conf on Multimedia, p.367-376.
[11]Corbetta M, Shulman GL, 2002. Control of goal-directed and stimulus-driven attention in the brain. Nat Rev Neurosci, 3(3):201-205.
[12]Deng J, Dong W, Socher R, et al., 2009. ImageNet: a large-scale hierarchical image database. Proc IEEE Conf on Computer Vision and Pattern Recognition, p.248-255.
[13]Durand T, Mordan T, Thome N, et al., 2017. WILDCAT: weakly supervised learning of deep ConvNets for image classification, pointwise localization and segmentation. Proc IEEE Conf on Computer Vision and Pattern Recognition, p.5957-5966.
[14]Fang Y, Tan H, Zhang J, 2018. Multi-strategy sentiment analysis of consumer reviews based on semantic fuzziness. IEEE Access, 6:20625-20631.
[15]Girshick R, Donahue J, Darrell T, et al., 2014. Rich feature hierarchies for accurate object detection and semantic segmentation. Proc IEEE Conf on Computer Vision and Pattern Recognition, p.580-587.
[16]He XT, Peng YX, 2017. Weakly supervised learning of part selection model with spatial constraints for fine-grained image classification. Proc 21st AAAI Conf on Artificial Intelligence, p.4075-4081.
[17]He XT, Peng YX, Zhao JJ, 2019. Fast fine-grained image classification via weakly supervised discriminative localization. IEEE Trans Circ Syst Video Technol, 29(5):1394-1407.
[18]Hinton GE, 2008. Visualizing high-dimensional data using t-SNE. Vigil Christ, 9(2):2579-2605.
[19]Hu J, Shen L, Sun G, 2018. Squeeze-and-excitation networks. Proc IEEE/CVF Conf on Computer Vision and Pattern Recognition, p.7132-7141.
[20]Huang G, Liu Z, van der Maaten L, et al., 2017. Densely connected convolutional networks. Proc IEEE Conf on Computer Vision and Pattern Recognition, p.2261-2269.
[21]Itti L, Koch C, Niebur E, 1998. A model of saliency-based visual attention for rapid scene analysis. IEEE Trans Patt Anal Mach Intell, 20(11):1254-1259.
[22]Jia XB, Jin Y, Li N, et al., 2018. Words alignment based on association rules for cross-domain sentiment classification. Front Inform Technol Electron Eng, 19(2):260-272.
[23]Katsurai M, Satoh S, 2016. Image sentiment analysis using latent correlations among visual, textual, and sentiment views. Proc IEEE Int Conf on Acoustics, Speech and Signal Processing, p.2837-2841.
[24]Krizhevsky A, 2009. Learning Multiple Layers of Features from Tiny Images. Technical Report TR-2009, University of Toronto, Toronto, Canada.
[25]Krizhevsky A, Sutskever I, Hinton GE, 2017. ImageNet classification with deep convolutional neural networks. Commun ACM, 60(6):84-90.
[26]LeCun Y, Boser B, Denker JS, et al., 1989. Backpropagation applied to handwritten zip code recognition. Neur Comput, 1(4):541-551.
[27]Li ZH, Fan YY, Liu WH, et al., 2018. Image sentiment prediction based on textual descriptions with adjective noun pairs. Multim Tools Appl, 77(1):1115-1132.
[28]Liu GL, Reda FA, Shih KJ, et al., 2018. Image inpainting for irregular holes using partial convolutions. Proc 15th European Conf on Computer Vision, p.89-105.
[29]Machajdik J, Hanbury A, 2010. Affective image classification using features inspired by psychology and art theory. Proc 18th ACM Int Conf on Multimedia, p.83-92.
[30]Mikels JA, Fredrickson BL, Larkin GR, et al., 2005. Emotional category data on images from the international affective picture system. Behav Res Methods, 37(4):626-630.
[31]Oquab M, Bottou L, Laptev I, et al., 2015. Is object localization for free?—Weakly-supervised learning with convolutional neural networks. Proc IEEE Conf on Computer Vision and Pattern Recognition, p.685-694.
[32]Ou WH, Luan X, Gou JP, et al., 2018. Robust discriminative nonnegative dictionary learning for occluded face recognition. Patt Recogn Lett, 107:41-49.
[33]Park J, Woo S, Lee JY, et al., 2018. BAM: bottleneck attention module. Proc British Machine Vision Conf, Article 147.
[34]Peng KC, Sadovnik A, Gallagher A, et al., 2016. Where do emotions come from? Predicting the emotion stimuli map. Proc IEEE Int Conf on Image Processing, p.614-618.
[35]Peng YX, Qi JW, Zhuo YK, 2019a. MAVA: multi-level adaptive visual-textual alignment by cross-media bi-attention mechanism. IEEE Trans Image Process, 29:2728-2741.
[36]Peng YX, Zhao YZ, Zhang JC, 2019b. Two-stream collaborative learning with spatial-temporal attention for video classification. IEEE Trans Circ Syst Video Technol, 29(3):773-786.
[37]Rohrbach A, Rohrbach M, Hu RH, et al., 2016. Grounding of textual phrases in images by reconstruction. Proc 14th European Conf on Computer Vision, p.817-834.
[38]Simonyan K, Zisserman A, 2014. Very deep convolutional networks for large-scale image recognition. https://arxiv.org/abs/1409.1556
[39]Sun M, Yang JF, Wang K, et al., 2016. Discovering affective regions in deep convolutional neural networks for visual sentiment prediction. Proc IEEE Int Conf on Multimedia and Expo, p.1-6.
[40]Szegedy C, Vanhoucke V, Ioffe S, et al., 2016. Rethinking the inception architecture for computer vision. Proc IEEE Conf on Computer Vision and Pattern Recognition, p.2818-2826.
[41]Wang F, Jiang MQ, Qian C, et al., 2017. Residual attention network for image classification. Proc IEEE Conf on Computer Vision and Pattern Recognition, p.6450-6458.
[42]Woo S, Park J, Lee JY, et al., 2018. CBAM: convolutional block attention module. Proc 15th European Conf on Computer Vision, p.3-19.
[43]Xiao FY, Sigal L, Lee YJ, 2017. Weakly-supervised visual grounding of phrases with linguistic structures. Proc IEEE Conf on Computer Vision and Pattern Recognition, p.5253-5262.
[44]Yang JF, She DY, Sun M, 2017a. Joint image emotion classification and distribution learning via deep convolutional neural network. Proc 26th Int Joint Conf on Artificial Intelligence, p.3266-3272.
[45]Yang JF, Sun M, Sun XX, 2017b. Learning visual sentiment distributions via augmented conditional probability neural network. Proc 31st AAAI Conf on Artificial Intelligence, p.224-230.
[46]Yang JF, She DY, Sun M, et al., 2018a. Visual sentiment prediction based on automatic discovery of affective regions. IEEE Trans Multim, 20(9):2513-2525.
[47]Yang JF, She DY, Lai YK, et al., 2018b. Weakly supervised coupled networks for visual sentiment analysis. Proc IEEE/CVF Conf on Computer Vision and Pattern Recognition, p.7584-7592.
[48]You QZ, Luo JB, Jin HL, et al., 2015. Robust image sentiment analysis using progressively trained and domain transferred deep networks. Proc 29th AAAI Conf on Artificial Intelligence, p.381-388.
[49]You QZ, Luo JB, Jin HL, et al., 2016. Building a large scale dataset for image emotion recognition: the fine print and the benchmark. Proc 30th AAAI Conf on Artificial Intelligence, p.308-314.
[50]You QZ, Jin HL, Luo JB, 2017. Visual sentiment analysis by attending on local image regions. Proc 31st AAAI Conf on Artificial Intelligence, p.231-237.
[51]Yu JH, Lin Z, Yang JM, et al., 2018. Generative image inpainting with contextual attention. Proc IEEE/CVF Conf on Computer Vision and Pattern Recognition, p.5505-5514.
[52]Yuan JB, Mcdonough S, You QZ, et al., 2013. Sentribute: image sentiment analysis from a mid-level perspective. Proc 2nd Int Workshop on Issues of Sentiment Discovery and Opinion Mining, p.1-8.
[53]Zagoruyko S, Komodakis N, 2017. Paying more attention to attention: improving the performance of convolutional neural networks via attention transfer. https://arxiv.org/abs/1612.03928
[54]Zeng SN, Gou JP, Yang X, 2018. Improving sparsity of coefficients for robust sparse and collaborative representation-based image classification. Neur Comput Appl, 30(10):2965-2978.
[55]Zhang FF, Mao QR, Shen XJ, et al., 2018a. Spatially coherent feature learning for pose-invariant facial expression recognition. ACM Trans Multim Comput Commun Appl, 14(1s):27.
[56]Zhang FF, Zhang TZ, Mao QR, et al., 2018b. Joint pose and expression modeling for facial expression recognition. Proc IEEE/CVF Conf on Computer Vision and Pattern Recognition, p.3359-3368.
[57]Zhang N, Donahue J, Girshick R, et al., 2014. Part-based R-CNNs for fine-grained category detection. Proc 13th European Conf on Computer Vision, p.834-849.
[58]Zhang QS, Zhu SC, 2018. Visual interpretability for deep learning: a survey. Front Inform Technol Electron Eng, 19(1):27-39.
[59]Zhao SC, Gao Y, Jiang XL, et al., 2014. Exploring principles-of-art features for image emotion recognition. Proc 22nd ACM Int Conf on Multimedia, p.47-56.
[60]Zhao SC, Yao HX, Gao Y, et al., 2016. Predicting personalized emotion perceptions of social images. Proc 24th ACM Int Conf on Multimedia, p.1385-1394.
[61]Zhao SC, Ding GG, Gao Y, et al., 2017. Approximating discrete probability distribution of image emotions by multi-modal features fusion. Proc 26th Int Joint Conf on Artificial Intelligence, p.4669-4675.
[62]Zhou BL, Khosla A, Lapedriza A, et al., 2016. Learning deep features for discriminative localization. Proc IEEE Conf on Computer Vision and Pattern Recognition, p.2921-2929.
[63]Zhu Y, Zhou YZ, Ye QX, et al., 2017. Soft proposal networks for weakly supervised object localization. Proc IEEE Int Conf on Computer Vision, p.1859-1868.
[64]Zhu YK, Groth O, Bernstein M, et al., 2016. Visual7W: grounded question answering in images. Proc IEEE Conf on Computer Vision and Pattern Recognition, p.4995-5004.
[65]Zhuang BH, Liu LQ, Li Y, et al., 2017. Attend in groups: a weakly-supervised deep learning framework for learning from web data. Proc IEEE Conf on Computer Vision and Pattern Recognition, p.2915-2924.
Open peer comments: Debate/Discuss/Question/Opinion
<1>