CLC number: TP391
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2020-05-18
Citations: Bibtex RefMan EndNote GB/T7714
Si-yue Yu, Jian Pu. Aggregated context network for crowd counting[J]. Frontiers of Information Technology & Electronic Engineering, 2020, 21(11): 1626-1638.
@article{yu2020aggregated,
title="Aggregated context network for crowd counting",
author="Si-yue Yu and Jian Pu",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="21",
number="11",
pages="1626-1638",
year="2020",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.1900481"
}
%0 Journal Article
%T Aggregated context network for crowd counting
%A Si-yue Yu
%A Jian Pu
%J Frontiers of Information Technology & Electronic Engineering
%V 21
%N 11
%P 1626-1638
%@ 2095-9184
%D 2020
%I Zhejiang University Press & Springer
%R 10.1631/FITEE.1900481
TY - JOUR
T1 - Aggregated context network for crowd counting
A1 - Si-yue Yu
A1 - Jian Pu
JO - Frontiers of Information Technology & Electronic Engineering
VL - 21
IS - 11
SP - 1626
EP - 1638
SN - 2095-9184
Y1 - 2020
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.1900481
ER -
Abstract: Crowd counting is used in video surveillance, traffic monitoring, assembly control, and other public safety applications. Handling contextual factors such as perspective distortion and background interference is crucial to achieving high counting performance. Whereas traditional methods address only one such factor, in this study we aggregate sufficient context information into the crowd counting network to tackle these problems simultaneously. We build a fully convolutional network with two tasks: main density map estimation and auxiliary semantic segmentation. The main task extracts multi-scale and spatial context information to learn the density map. The auxiliary semantic segmentation task provides a comprehensive view of background and foreground information, which is incorporated into the main task by late fusion. On three challenging datasets, our network achieves more accurate estimation and higher robustness than state-of-the-art methods.
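As a rough illustration of the two ideas named in the abstract (a density map whose sum gives the crowd count, and foreground/background information fused in at a late stage), the following is a minimal pure-Python sketch. It is not the authors' network: the Gaussian ground-truth construction is the common convention in density-based counting, and the element-wise masking in `late_fusion` is an assumed, hypothetical fusion operator, since the abstract does not specify one. All function names and parameters (`gaussian_density_map`, `late_fusion`, `count`, `sigma`) are illustrative.

```python
import math


def gaussian_density_map(heads, height, width, sigma=1.5):
    """Build a density map with one unit-mass Gaussian per head (row, col)."""
    density = [[0.0] * width for _ in range(height)]
    for r0, c0 in heads:
        # Evaluate the kernel over the whole grid, then normalise it so
        # that each annotated head contributes exactly 1 to the total.
        weights = [
            [math.exp(-((r - r0) ** 2 + (c - c0) ** 2) / (2 * sigma ** 2))
             for c in range(width)]
            for r in range(height)
        ]
        total = sum(map(sum, weights))
        for r in range(height):
            for c in range(width):
                density[r][c] += weights[r][c] / total
    return density


def late_fusion(density, foreground_prob):
    """Assumed fusion: weight each density value by a foreground probability."""
    return [
        [d * p for d, p in zip(d_row, p_row)]
        for d_row, p_row in zip(density, foreground_prob)
    ]


def count(density):
    """The estimated crowd count is the sum over the density map."""
    return sum(map(sum, density))


# Two annotated heads on an 8x8 grid.
heads = [(2, 2), (5, 6)]
gt = gaussian_density_map(heads, height=8, width=8)

# Simulate a prediction with a spurious response on a background pixel.
noisy = [row[:] for row in gt]
noisy[0][7] += 0.5

# A foreground mask (as a segmentation branch might provide) that marks
# the spurious pixel as background.
mask = [[1.0] * 8 for _ in range(8)]
mask[0][7] = 0.0

fused = late_fusion(noisy, mask)
```

Here `count(gt)` equals the number of annotated heads up to floating-point error, while `count(fused)` is closer to that value than `count(noisy)` because the background response has been suppressed.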