CLC number: TP391.4
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2020-06-11
Citations: GB/T7714, BibTeX, EndNote, RefMan
Jie-hao Huang, Xiao-guang Di, Jun-de Wu, Ai-yue Chen. A novel convolutional neural network method for crowd counting[J]. Frontiers of Information Technology & Electronic Engineering, 2020, 21(8): 1150-1160.
@article{title="A novel convolutional neural network method for crowd counting",
author="Jie-hao Huang and Xiao-guang Di and Jun-de Wu and Ai-yue Chen",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="21",
number="8",
pages="1150-1160",
year="2020",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.1900282"
}
%0 Journal Article
%T A novel convolutional neural network method for crowd counting
%A Jie-hao Huang
%A Xiao-guang Di
%A Jun-de Wu
%A Ai-yue Chen
%J Frontiers of Information Technology & Electronic Engineering
%V 21
%N 8
%P 1150-1160
%@ 2095-9184
%D 2020
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.1900282
TY  - JOUR
T1  - A novel convolutional neural network method for crowd counting
A1  - Jie-hao Huang
A1  - Xiao-guang Di
A1  - Jun-de Wu
A1  - Ai-yue Chen
JO  - Frontiers of Information Technology & Electronic Engineering
VL  - 21
IS  - 8
SP  - 1150
EP  - 1160
SN  - 2095-9184
Y1  - 2020
PB  - Zhejiang University Press & Springer
DO  - 10.1631/FITEE.1900282
ER  -
Abstract: Crowd density estimation is a challenging task because head sizes vary widely within a crowd. Existing methods typically use a multi-column convolutional neural network (MCNN) to adapt to this variation, which averages the response over areas with different densities and introduces considerable noise into the density map. To address this problem, we propose a new method called the segmentation-aware prior network (SAPNet), which uses a coarse head-segmentation map to generate a high-quality density map free of background noise. SAPNet is composed of two networks: a foreground-segmentation convolutional neural network (FS-CNN) as the front end and a crowd-regression convolutional neural network (CR-CNN) as the back end. Using only single-dot annotations, we generate ground-truth head segmentation masks. Trained on this ground truth, FS-CNN outputs a coarse head-segmentation map, which helps eliminate noise in regions of the density map that contain no people. Taking the head-segmentation map generated by the front end as input, CR-CNN performs accurate crowd counting and generates a high-quality density map. We evaluate SAPNet on four datasets (ShanghaiTech, UCF-CC-50, WorldExpo’10, and UCSD) and achieve state-of-the-art performance on the ShanghaiTech part B and UCF-CC-50 datasets.
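As a rough illustration of the idea in the abstract (not the authors' implementation), the following Python sketch builds a density map and a coarse head mask from dot annotations, then shows how multiplying a noisy density map by the mask suppresses the response in regions without people. The Gaussian kernel, the disc-shaped mask, the uniform noise offset, and the names density_from_dots, mask_from_dots, apply_mask, sigma, and radius are all assumptions made for illustration. The abstract does not specify how the masks are derived, and SAPNet learns both maps with CNNs rather than computing them directly.

```python
import math


def density_from_dots(dots, height, width, sigma=2.0):
    """Sum one normalized 2D Gaussian per annotated head (Gaussian choice is an assumption)."""
    density = [[0.0] * width for _ in range(height)]
    for cy, cx in dots:
        kernel = [
            [math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2 * sigma ** 2))
             for x in range(width)]
            for y in range(height)
        ]
        total = sum(map(sum, kernel))
        # Normalize on the grid so that each head contributes 1 to the count.
        for y in range(height):
            for x in range(width):
                density[y][x] += kernel[y][x] / total
    return density


def mask_from_dots(dots, height, width, radius=4):
    """Coarse head mask: 1 inside a disc around each dot (disc shape is an assumption)."""
    return [
        [1 if any((y - cy) ** 2 + (x - cx) ** 2 <= radius ** 2 for cy, cx in dots) else 0
         for x in range(width)]
        for y in range(height)
    ]


def apply_mask(density, mask):
    """Zero the density wherever the segmentation mask marks background."""
    return [[d * m for d, m in zip(d_row, m_row)] for d_row, m_row in zip(density, mask)]


def count(density):
    """The estimated crowd count is the sum over the density map."""
    return sum(map(sum, density))


# Toy example: two annotated heads at (row, col) positions.
height, width = 32, 48
dots = [(5, 5), (20, 30)]
density = density_from_dots(dots, height, width)
mask = mask_from_dots(dots, height, width)

# Simulate background noise as a small uniform offset over the whole image.
noisy = [[d + 0.01 for d in row] for row in density]
masked = apply_mask(noisy, mask)

print(count(density), count(noisy), count(masked))
```

In this toy setup the uniform offset inflates the unmasked count well above the two annotated heads, while the masked count stays much closer to it, at the cost of clipping part of each Gaussian's tail.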