JZUS - Journal of Zhejiang University SCIENCE

Frontiers of Information Technology & Electronic Engineering 2021 Vol.22 No.9 P.1194-1206

http://doi.org/10.1631/FITEE.2000272

Associative affinity network learning for multi-object tracking

Author(s): Liang Ma, Qiaoyong Zhong, Yingying Zhang, Di Xie, Shiliang Pu
Affiliation(s): Hikvision Research Institute, Hangzhou 310000, China
Corresponding email(s): maliang6@hikvision.com, zhongqiaoyong@hikvision.com, zhangyingying7@hikvision.com, xiedi@hikvision.com, pushiliang.hri@hikvision.com
Key Words: Multi-object tracking, Deep neural network, Affinity learning


Liang Ma, Qiaoyong Zhong, Yingying Zhang, Di Xie, Shiliang Pu. Associative affinity network learning for multi-object tracking[J]. Frontiers of Information Technology & Electronic Engineering, 2021, 22(9): 1194-1206.

@article{Ma2021,
title="Associative affinity network learning for multi-object tracking",
author="Liang Ma and Qiaoyong Zhong and Yingying Zhang and Di Xie and Shiliang Pu",
journal="Frontiers of Information Technology \& Electronic Engineering",
volume="22",
number="9",
pages="1194--1206",
year="2021",
publisher="Zhejiang University Press \& Springer",
doi="10.1631/FITEE.2000272"
}

%0 Journal Article
%T Associative affinity network learning for multi-object tracking
%A Liang Ma
%A Qiaoyong Zhong
%A Yingying Zhang
%A Di Xie
%A Shiliang Pu
%J Frontiers of Information Technology & Electronic Engineering
%V 22
%N 9
%P 1194-1206
%@ 2095-9184
%D 2021
%I Zhejiang University Press & Springer
%R 10.1631/FITEE.2000272

TY  - JOUR
T1  - Associative affinity network learning for multi-object tracking
A1  - Liang Ma
A1  - Qiaoyong Zhong
A1  - Yingying Zhang
A1  - Di Xie
A1  - Shiliang Pu
JO  - Frontiers of Information Technology & Electronic Engineering
VL  - 22
IS  - 9
SP  - 1194
EP  - 1206
SN  - 2095-9184
Y1  - 2021
PB  - Zhejiang University Press & Springer
DO  - 10.1631/FITEE.2000272
ER  -


Abstract: We propose a joint feature and metric learning deep neural network architecture, called the associative affinity network (AAN), as an affinity model for multi-object tracking (MOT) in videos. The AAN learns the associative affinity between tracks and detections across frames in an end-to-end manner. Considering flawed detections, the AAN jointly learns bounding box regression, classification, and affinity regression via the proposed multi-task loss. Contrary to networks that are trained with ranking loss, we directly train a binary classifier to learn the associative affinity of each track-detection pair and use a matching cardinality loss to capture information among candidate pairs. The AAN learns a discriminative affinity model for data association to tackle MOT, and can also perform single-object tracking. Based on the AAN, we propose a simple multi-object tracker that achieves competitive performance on the public MOT16 and MOT17 test datasets.
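
As a rough illustration of the idea in the abstract, the sketch below scores every track-detection pair with an affinity in [0, 1], trains that score with a binary cross-entropy loss, and then associates tracks with detections. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `pairwise_bce` and `greedy_associate`, the 0.5 threshold, and the greedy matching step are all hypothetical choices made for this example, and the affinity values are hand-written rather than produced by a network. The matching cardinality loss is not reproduced, since the abstract does not define it.

```python
import math


def pairwise_bce(affinity, labels, eps=1e-7):
    """Mean binary cross-entropy over all track-detection pairs.

    affinity[t][d] is a predicted match probability for track t and
    detection d; labels[t][d] is 1 for a true match and 0 otherwise.
    """
    total, count = 0.0, 0
    for aff_row, lab_row in zip(affinity, labels):
        for p, y in zip(aff_row, lab_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    return total / count


def greedy_associate(affinity, threshold=0.5):
    """One-to-one matching, highest affinity first.

    Assumption: greedy matching stands in for whatever assignment step
    the tracker actually uses; the abstract does not specify it.
    """
    pairs = sorted(
        ((score, t, d)
         for t, row in enumerate(affinity)
         for d, score in enumerate(row)),
        reverse=True,
    )
    used_tracks, used_dets, matches = set(), set(), []
    for score, t, d in pairs:
        if score < threshold:
            break  # every remaining pair scores below the threshold
        if t in used_tracks or d in used_dets:
            continue
        used_tracks.add(t)
        used_dets.add(d)
        matches.append((t, d))
    return sorted(matches)


# Two existing tracks, three detections in the new frame (made-up scores).
affinity = [[0.9, 0.2, 0.1],
            [0.3, 0.8, 0.4]]
labels = [[1, 0, 0],
          [0, 1, 0]]

loss = pairwise_bce(affinity, labels)
matches = greedy_associate(affinity)
print(f"loss={loss:.4f}, matches={matches}")
```

In this toy case the third detection stays unmatched, which a tracker would typically treat as a candidate for a new track.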

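Chinese Summary (English translation)

Associative affinity network learning for multi-object tracking

Liang Ma, Qiaoyong Zhong, Yingying Zhang, Di Xie, Shiliang Pu
Hangzhou Hikvision Digital Technology Co., Ltd., Hangzhou 310000, China
Abstract: To address multi-object tracking in videos, we propose a deep neural network architecture that jointly learns features and metrics, called the associative affinity network. The associative affinity network learns the associative affinity between tracks and detections in an end-to-end manner. To cope with flawed detections, it simultaneously learns three tasks: bounding box regression, object classification, and affinity regression. Unlike existing methods based on contrastive ranking, we directly train a binary classifier to learn the associative affinity between tracks and detections, and design a loss function that constrains the number of elements in the matching set. Thanks to this design, the associative affinity network can solve the matching problem in multi-object tracking and can also perform single-object tracking. Based on the proposed network, we design a simple multi-object tracking algorithm; experimental results on the MOT16 and MOT17 test sets demonstrate its effectiveness.

Key words: Multi-object tracking; Deep neural network; Affinity learning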


