Journal of Zhejiang University

Frontiers of Information Technology & Electronic Engineering 2016 Vol.17 No.1 P.41-54

Extracting hand articulations from monocular depth images using curvature scale space descriptors

Author(s): Shao-fan Wang, Chun Li, De-hui Kong, Bao-cai Yin
Affiliation(s): 1. Beijing Key Laboratory of Multimedia and Intelligent Software Technology, College of Metropolitan Transportation, Beijing University of Technology, Beijing 100124, China
Corresponding email(s): wangshaofan@bjut.edu.cn, kdh@bjut.edu.cn
Key Words: Curvature scale space (CSS), Hand articulation, Convex hull, Hand contour


Shao-fan Wang, Chun Li, De-hui Kong, Bao-cai Yin. Extracting hand articulations from monocular depth images using curvature scale space descriptors[J]. Frontiers of Information Technology & Electronic Engineering, 2016, 17(1): 41-54.

@article{wang2016extracting,
title="Extracting hand articulations from monocular depth images using curvature scale space descriptors",
author="Shao-fan Wang and Chun Li and De-hui Kong and Bao-cai Yin",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="17",
number="1",
pages="41-54",
year="2016",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.1500126"
}

%0 Journal Article
%T Extracting hand articulations from monocular depth images using curvature scale space descriptors
%A Shao-fan Wang
%A Chun Li
%A De-hui Kong
%A Bao-cai Yin
%J Frontiers of Information Technology & Electronic Engineering
%V 17
%N 1
%P 41-54
%@ 2095-9184
%D 2016
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.1500126

TY  - JOUR
T1  - Extracting hand articulations from monocular depth images using curvature scale space descriptors
A1  - Shao-fan Wang
A1  - Chun Li
A1  - De-hui Kong
A1  - Bao-cai Yin
JO  - Frontiers of Information Technology & Electronic Engineering
VL  - 17
IS  - 1
SP  - 41
EP  - 54
SN  - 2095-9184
Y1  - 2016
PB  - Zhejiang University Press & Springer
DO  - 10.1631/FITEE.1500126
ER  -


Abstract: We propose a framework for hand articulation detection from a monocular depth image using curvature scale space (CSS) descriptors. We extract the hand contour from an input depth image and obtain the fingertips and finger-valleys of the contour from the local extrema of a modified CSS map of the contour. Then we recover the undetected fingertips according to local depth changes at points in the interior of the contour. Compared with traditional appearance-based approaches using either angle detectors or convex hull detectors, the modified CSS descriptor extracts the fingertips and finger-valleys more precisely since it is more robust to noisy or corrupted data; moreover, the local extrema of depths recover the fingertips of bent fingers well, whereas traditional appearance-based approaches hardly work without matching hand models. Experimental results show that our method captures hand articulations more precisely than three state-of-the-art appearance-based approaches.
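The fingertip and finger-valley step rests on contour curvature. The following is a minimal single-scale sketch of that idea, not the authors' modified CSS map (its scale selection and thresholds are not described on this page): smooth the closed contour with a Gaussian of width sigma, compute curvature, and keep local extrema whose magnitude exceeds a threshold. The function names, sigma, threshold, and the assumption of a counterclockwise contour in a y-up frame are all illustrative. A full CSS map would repeat the computation over increasing sigma and track how the extrema persist.

```python
import numpy as np


def gaussian_kernel(sigma: float) -> np.ndarray:
    """Normalized 1D Gaussian kernel truncated at 3 sigma."""
    radius = max(1, int(np.ceil(3 * sigma)))
    t = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (t / sigma) ** 2)
    return k / k.sum()


def smooth_closed(signal: np.ndarray, sigma: float) -> np.ndarray:
    """Gaussian smoothing with wrap-around, since a hand contour is a closed curve."""
    k = gaussian_kernel(sigma)
    radius = len(k) // 2
    out = np.zeros(len(signal))
    for offset, w in zip(range(-radius, radius + 1), k):
        out += w * np.roll(signal, -offset)  # np.roll(s, -o)[i] == s[i + o]
    return out


def css_curvature(x: np.ndarray, y: np.ndarray, sigma: float) -> np.ndarray:
    """Curvature of the contour (x(u), y(u)) after smoothing at scale sigma."""
    xs, ys = smooth_closed(x, sigma), smooth_closed(y, sigma)
    # Central differences on a closed curve.
    dx = (np.roll(xs, -1) - np.roll(xs, 1)) / 2.0
    dy = (np.roll(ys, -1) - np.roll(ys, 1)) / 2.0
    ddx = np.roll(xs, -1) - 2.0 * xs + np.roll(xs, 1)
    ddy = np.roll(ys, -1) - 2.0 * ys + np.roll(ys, 1)
    speed = np.maximum(dx ** 2 + dy ** 2, 1e-12)  # guard against repeated points
    return (dx * ddy - dy * ddx) / speed ** 1.5


def curvature_extrema(kappa: np.ndarray, threshold: float):
    """Circular local extrema of curvature with |kappa| above threshold.

    Assumes a counterclockwise contour with the y axis pointing up (flip y for
    image coordinates): positive peaks are convex points (fingertip candidates),
    negative troughs are concave points (finger-valley candidates).
    """
    prev, nxt = np.roll(kappa, 1), np.roll(kappa, -1)
    tips = np.nonzero((kappa > prev) & (kappa >= nxt) & (kappa > threshold))[0]
    valleys = np.nonzero((kappa < prev) & (kappa <= nxt) & (kappa < -threshold))[0]
    return tips, valleys
```

For example, a synthetic five-lobed contour r(θ) = 1 + 0.3 cos 5θ sampled counterclockwise at 400 points yields five convex peaks at the lobe tips and five concave troughs midway between them.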

Reviewer comment: This paper proposes a framework for hand articulation detection from a monocular depth image using curvature scale space (CSS) descriptors. The authors extract the hand contour from an input depth image and obtain the fingertips and finger-valleys of the contour from the local extrema of a modified CSS map of the contour; this is the main contribution of the work. They also recover undetected fingertips according to local depth changes at points in the interior of the contour. Compared with traditional appearance-based approaches using either angle detectors or convex hull detectors, the modified CSS descriptor extracts the fingertips and finger-valleys more precisely since it is more robust to noisy or corrupted data; moreover, the local extrema of depths recover the fingertips of bent fingers well, whereas traditional appearance-based approaches hardly work without matching hand models. Overall, this paper offers a practical solution to the hand articulation detection problem using depth data only.
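For contrast with the convex hull detectors that the CSS descriptor is compared against, here is a minimal sketch of a generic convex-hull baseline, not any specific published implementation: vertices of the contour's convex hull serve as fingertip candidates, and the contour point deepest below each hull edge (a convexity defect) serves as a finger-valley candidate. The function names and the min_depth parameter are illustrative; the hull is Andrew's monotone chain, and the contour is assumed to be ordered counterclockwise.

```python
import numpy as np


def convex_hull_indices(points):
    """Andrew's monotone chain; returns indices of hull vertices (counterclockwise)."""
    order = sorted(range(len(points)), key=lambda i: (points[i][0], points[i][1]))

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); <= 0 means a clockwise turn or collinear.
        return ((points[a][0] - points[o][0]) * (points[b][1] - points[o][1])
                - (points[a][1] - points[o][1]) * (points[b][0] - points[o][0]))

    lower, upper = [], []
    for i in order:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], i) <= 0:
            lower.pop()
        lower.append(i)
    for i in reversed(order):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], i) <= 0:
            upper.pop()
        upper.append(i)
    return lower[:-1] + upper[:-1]


def convexity_defects(points, hull_idx, min_depth):
    """For each hull edge, the in-between contour point farthest from the edge,
    kept only if its distance is at least min_depth (finger-valley candidates)."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    hull = sorted(hull_idx)  # contour order; the wrap-around edge is included below
    defects = []
    for a, b in zip(hull, hull[1:] + hull[:1]):
        span = [(a + k) % n for k in range(1, (b - a) % n)]
        if not span:
            continue
        pa, chord = pts[a], pts[b] - pts[a]
        # Perpendicular distance of each in-between point to the hull edge a-b.
        d = np.abs(chord[0] * (pts[span, 1] - pa[1])
                   - chord[1] * (pts[span, 0] - pa[0])) / np.hypot(*chord)
        k = int(np.argmax(d))
        if d[k] >= min_depth:
            defects.append(span[k])
    return defects
```

On a rounded fingertip the hull usually contains a cluster of neighbouring vertices rather than a single point, so a baseline like this needs extra merging or angle filtering before it reports one fingertip per finger.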

Chinese summary: Hand feature extraction from single-view depth images based on curvature scale space

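Objective: Extracting hand features (such as fingertips, finger roots, finger joints, and hand contours) from depth and color images is an important research topic in human-computer interaction and virtual reality. Because the hand has many degrees of freedom, is strongly affected by ambient lighting and noise, and is prone to self-occlusion, hand feature extraction remains an open problem. Data gloves and the Microsoft Kinect motion-sensing device have partly addressed this problem, but the former requires the user to wear a device, while the latter offers limited accuracy. This paper proposes a hand feature point localization method based on curvature scale space descriptors, yielding a robust algorithm for obtaining hand feature points from a single-view depth image.
Innovation points: A modified curvature scale space descriptor is proposed to extract the fingertip and finger-valley points from the hand contour. The undetected fingertips of the four fingers are computed from angular regions together with the hand contour and hand depth differences. The undetected thumb fingertip is computed from a heptagon formed by the five finger-root points and the starting points of the hand contour.
Method: The hand region is segmented from a single depth image using OpenNI, and the hand contour points are extracted. The traditional curvature scale space descriptor is modified into a feature point extraction algorithm operating within an appropriate threshold range, which extracts the fingertip and finger-valley points from the hand contour. For undetected fingertips, finger bending is judged with an angle threshold, and the undetected hand feature points are computed one by one from angular regions, the hand contour, and hand depth differences.
Conclusion: Compared with traditional methods based on angle thresholds or contour convex hulls, the modified curvature scale space descriptor is more robust and well suited to extracting fingertip and finger-valley points from the hand contour. On this basis, undetected hand feature points can be computed one by one using angular regions, the hand contour, and hand depth differences.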
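The recovery of undetected fingertips can be illustrated with a hypothetical sketch; the paper's exact angular regions, thresholds, and thumb heptagon construction are not given on this page, so everything below is an assumption. An unusually large angular gap between consecutive detected fingertips (measured about the palm center) flags a possibly bent finger, and the in-hand pixel closest to the camera inside that sector is accepted as the recovered fingertip if it is sufficiently nearer than the palm. The function names, the max_gap and min_gap parameters, and the millimetre depth units are illustrative.

```python
import numpy as np


def missing_finger_sectors(tip_angles, max_gap):
    """Hypothetical angle-threshold check.

    tip_angles: angles (radians, in (-pi, pi]) of detected fingertips about the
    palm center. Returns (start, end) sectors between consecutive fingertips whose
    gap exceeds max_gap. The wrap-around gap (last to first, over the wrist side)
    is deliberately not considered.
    """
    a = sorted(tip_angles)
    return [(lo, hi) for lo, hi in zip(a, a[1:]) if hi - lo > max_gap]


def recover_tip_in_sector(depth, mask, center, sector, min_gap):
    """Return (row, col) of the in-hand pixel nearest the camera inside the sector,
    or None if it is not at least min_gap nearer than the palm center.

    depth: 2D array of depths (assumed millimetres, smaller = closer).
    mask: boolean hand mask of the same shape. center: (row, col) of the palm.
    Assumes the sector does not straddle the +/-pi angle boundary.
    """
    rows, cols = np.nonzero(mask)
    cy, cx = center
    ang = np.arctan2(cy - rows, cols - cx)  # image rows grow downward, so flip y
    lo, hi = sector
    inside = (ang > lo) & (ang < hi)
    if not np.any(inside):
        return None
    r, c = rows[inside], cols[inside]
    d = depth[r, c]
    k = int(np.argmin(d))
    # A bent finger folds toward the camera, so its tip should be clearly nearer.
    if depth[cy, cx] - d[k] < min_gap:
        return None
    return int(r[k]), int(c[k])
```

Choosing the nearest pixel is the simplest reading of "local change of depths"; a more careful variant might require a local depth minimum relative to its neighbours rather than a global minimum over the sector.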

Keywords: Curvature scale space; Hand articulation; Convex hull; Hand contour


