CLC number: TP391
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2022-12-26
Citations: Bibtex RefMan EndNote GB/T7714
Jiaqi GAO, Jingqi LI, Hongming SHAN, Yanyun QU, James Z. WANG, Fei-Yue WANG, Junping ZHANG. Forget less, count better: a domain-incremental self-distillation learning benchmark for lifelong crowd counting[J]. Frontiers of Information Technology & Electronic Engineering, 2023, 24(2): 187-202.
@article{title="Forget less, count better: a domain-incremental self-distillation learning benchmark for lifelong crowd counting",
author="Jiaqi GAO, Jingqi LI, Hongming SHAN, Yanyun QU, James Z. WANG, Fei-Yue WANG, Junping ZHANG",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="24",
number="2",
pages="187-202",
year="2023",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.2200380"
}
%0 Journal Article
%T Forget less, count better: a domain-incremental self-distillation learning benchmark for lifelong crowd counting
%A Jiaqi GAO
%A Jingqi LI
%A Hongming SHAN
%A Yanyun QU
%A James Z. WANG
%A Fei-Yue WANG
%A Junping ZHANG
%J Frontiers of Information Technology & Electronic Engineering
%V 24
%N 2
%P 187-202
%@ 2095-9184
%D 2023
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.2200380
TY - JOUR
T1 - Forget less, count better: a domain-incremental self-distillation learning benchmark for lifelong crowd counting
A1 - Jiaqi GAO
A1 - Jingqi LI
A1 - Hongming SHAN
A1 - Yanyun QU
A1 - James Z. WANG
A1 - Fei-Yue WANG
A1 - Junping ZHANG
JO - Frontiers of Information Technology & Electronic Engineering
VL - 24
IS - 2
SP - 187
EP - 202
SN - 2095-9184
Y1 - 2023
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.2200380
ER -
Abstract: Crowd counting has important applications in public safety and pandemic control. A robust and practical crowd counting system must be able to learn continuously from newly arriving domain data in real-world scenarios rather than fit a single domain. Off-the-shelf methods have three drawbacks when handling multiple domains: (1) after training on images from new domains, the models perform poorly (or even degrade dramatically) on old domains because of discrepancies among the intrinsic data distributions of different domains, a phenomenon called catastrophic forgetting; (2) a model well trained on a specific domain performs imperfectly on unseen domains because of domain shift; (3) storage overhead grows linearly, because either all the data must be mixed for training or a separate model must be trained for each domain as new ones become available. To overcome these issues, we investigate a new crowd counting task in the incremental-domain training setting, called lifelong crowd counting. Its goal is to alleviate catastrophic forgetting and improve generalization ability using a single model updated by the incremental domains. Specifically, we propose a self-distillation learning framework as a benchmark (forget less, count better, or FLCB) for lifelong crowd counting, which helps the model sustainably leverage previous meaningful knowledge for better crowd counting and mitigates forgetting when new data arrive. A new quantitative metric, normalized Backward Transfer (nBwT), is developed to evaluate the degree of forgetting of the model in the lifelong learning process. Extensive experimental results demonstrate the superiority of our proposed benchmark in achieving a low degree of catastrophic forgetting and strong generalization ability.
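The abstract names nBwT but does not define it. As a rough illustration only, the sketch below shows one plausible way a normalized backward-transfer score could be computed from per-domain counting errors. The formula is an assumption: it takes the relative change in error on each earlier domain between the moment that domain was learned and the end of training, then averages over those domains. The function name, the layout of the `errors` table, and the numbers in the usage example are all hypothetical and are not taken from the paper.

```python
from typing import Sequence


def normalized_backward_transfer(errors: Sequence[Sequence[float]]) -> float:
    """Assumed nBwT-style score; lower means less forgetting.

    errors[t][i] is the counting error (e.g. MAE) on domain i measured
    after the model has been trained sequentially on domains 0..t.
    Only entries with i <= t are read, so ragged rows are accepted.
    """
    num_domains = len(errors)
    if num_domains < 2:
        raise ValueError("need at least two domains to measure forgetting")

    total = 0.0
    for i in range(num_domains - 1):
        error_when_learned = errors[i][i]            # right after training on domain i
        error_at_end = errors[num_domains - 1][i]    # after the final domain
        # Relative degradation, so domains with large raw errors do not dominate.
        total += (error_at_end - error_when_learned) / error_when_learned
    return total / (num_domains - 1)


# Made-up errors for three domains trained in sequence (illustrative only).
example_errors = [
    [100.0],
    [110.0, 50.0],
    [120.0, 60.0, 20.0],
]
score = normalized_backward_transfer(example_errors)
```

Under this assumed definition, a score of zero means that errors on earlier domains did not change after later training, and a positive score means that they grew.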