
CLC number: TP391.4
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2022-01-25
Citation (GB/T 7714):
Yue LU, Xingyu CHEN, Zhengxing WU, Junzhi YU, Li WEN. A novel robotic visual perception framework for underwater operation[J]. Frontiers of Information Technology & Electronic Engineering, 2022, 23(11): 1602-1619.
@article{Lu2022,
title="A novel robotic visual perception framework for underwater operation",
author="Yue LU and Xingyu CHEN and Zhengxing WU and Junzhi YU and Li WEN",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="23",
number="11",
pages="1602-1619",
year="2022",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.2100366"
}
%0 Journal Article
%T A novel robotic visual perception framework for underwater operation
%A Yue LU
%A Xingyu CHEN
%A Zhengxing WU
%A Junzhi YU
%A Li WEN
%J Frontiers of Information Technology & Electronic Engineering
%V 23
%N 11
%P 1602-1619
%@ 2095-9184
%D 2022
%I Zhejiang University Press & Springer
%R 10.1631/FITEE.2100366
TY  - JOUR
T1  - A novel robotic visual perception framework for underwater operation
A1  - Yue LU
A1  - Xingyu CHEN
A1  - Zhengxing WU
A1  - Junzhi YU
A1  - Li WEN
JO  - Frontiers of Information Technology & Electronic Engineering
VL  - 23
IS  - 11
SP  - 1602
EP  - 1619
SN  - 2095-9184
Y1  - 2022
PB  - Zhejiang University Press & Springer
DO  - 10.1631/FITEE.2100366
ER  - 
Abstract: Underwater robotic operation usually requires visual perception (e.g., object detection and tracking), but underwater scenes have poor visual quality and constitute a special domain, both of which can degrade perception accuracy. In addition, detection continuity and stability are important for robotic perception, yet the commonly used static, accuracy-based evaluation (i.e., average precision) does not reflect detector performance over time. To address these two problems, we present a novel robotic visual perception framework. First, we investigate how quality-diverse data domains and visual restoration relate to detection performance. We find that although domain quality has a negligible effect on within-domain detection accuracy, visual restoration benefits detection in real sea scenarios by reducing the domain shift. Second, we propose non-reference assessments of detection continuity and stability based on object tracklets, and develop online tracklet refinement to improve the temporal performance of detectors. Finally, combined with visual restoration, these components form an accurate and stable underwater robotic visual perception framework. We also propose small-overlap suppression to extend video object detection (VID) methods to a single-object tracking task, which makes it possible to switch flexibly between detection and tracking. Extensive experiments on the ImageNet VID dataset and real-world robotic tasks verify the correctness of our analysis and the superiority of the proposed approaches. The code is available at https://github.com/yrqs/VisPerception.