CLC number: TP391
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2015-10-16
Cited: 3
Ying Cai, Meng-long Yang, Jun Li. Multiclass classification based on a deep convolutional network for head pose estimation[J]. Frontiers of Information Technology & Electronic Engineering, 2015, 16(11): 930-939.
@article{title="Multiclass classification based on a deep convolutional network for head pose estimation",
author="Ying Cai and Meng-long Yang and Jun Li",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="16",
number="11",
pages="930-939",
year="2015",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.1500125"
}
%0 Journal Article
%T Multiclass classification based on a deep convolutional network for head pose estimation
%A Ying Cai
%A Meng-long Yang
%A Jun Li
%J Frontiers of Information Technology & Electronic Engineering
%V 16
%N 11
%P 930-939
%@ 2095-9184
%D 2015
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.1500125
TY - JOUR
T1 - Multiclass classification based on a deep convolutional network for head pose estimation
A1 - Ying Cai
A1 - Meng-long Yang
A1 - Jun Li
JO - Frontiers of Information Technology & Electronic Engineering
VL - 16
IS - 11
SP - 930
EP - 939
SN - 2095-9184
Y1 - 2015
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.1500125
ER -
Abstract: Head pose estimation is an important and challenging task in computer vision. In this paper we propose a novel method for estimating head pose from 2D face images, based on a deep convolutional neural network (DCNN). We design a simple and effective method to roughly crop the face from the input image while maintaining the relative proportions of each individual's facial features. The cropping method works across various poses. Two convolutional neural networks are then set up to train head pose classifiers and are compared with each other. The simpler one has six layers; it performs well on seven yaw poses but is somewhat unsatisfactory when two pitch poses are mixed in. The other has eight layers and more pixels in its input layer; it performs better with more poses and more training samples. Before training, two strategies, shift and zoom, are applied to prepare the training samples. Finally, the feature extraction filters are optimized jointly with the weights of the classification component during training, to minimize the classification error. Our method has been evaluated on the CAS-PEAL-R1, CMU PIE, and CUBIC FacePix databases, where it outperforms state-of-the-art methods for head pose estimation.
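The shift and zoom strategies mentioned in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give shift distances, zoom factors, interpolation, or border handling, so the values and helper names below (`shift_image`, `zoom_image`, `augment`) are assumptions made for illustration only and are not the authors' implementation.

```python
import numpy as np


def shift_image(img, dx, dy):
    """Translate a 2D grayscale image by (dx, dy) pixels, zero-filling the exposed border."""
    h, w = img.shape
    out = np.zeros_like(img)
    if abs(dx) >= w or abs(dy) >= h:
        return out  # shifted completely out of view
    # Source window that remains visible after the shift.
    src_x0, src_x1 = max(0, -dx), min(w, w - dx)
    src_y0, src_y1 = max(0, -dy), min(h, h - dy)
    dst_x0, dst_y0 = max(0, dx), max(0, dy)
    out[dst_y0:dst_y0 + (src_y1 - src_y0),
        dst_x0:dst_x0 + (src_x1 - src_x0)] = img[src_y0:src_y1, src_x0:src_x1]
    return out


def zoom_image(img, factor):
    """Zoom about the image centre, keeping the output size (nearest-neighbour sampling).

    When factor < 1 the border pixels are replicated, because source
    coordinates are clamped to the image bounds.
    """
    h, w = img.shape
    cy, cx = (h - 1) / 2, (w - 1) / 2
    # Map every output coordinate back to the source coordinate it samples.
    ys = np.clip(np.round((np.arange(h) - cy) / factor + cy).astype(int), 0, h - 1)
    xs = np.clip(np.round((np.arange(w) - cx) / factor + cx).astype(int), 0, w - 1)
    return img[np.ix_(ys, xs)]


def augment(img, shifts=(-2, 0, 2), zooms=(0.9, 1.0, 1.1)):
    """Return every zoom/shift combination of one cropped face (hypothetical settings)."""
    samples = []
    for z in zooms:
        zoomed = zoom_image(img, z)
        for dy in shifts:
            for dx in shifts:
                samples.append(shift_image(zoomed, dx, dy))
    return samples
```

With these assumed defaults, each cropped face yields 3 zooms × 3 × 3 shifts = 27 training samples that all share the pose label of the original image.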
This paper uses convolutional neural networks for head pose estimation. The method is evaluated on the CAS-PEAL-R1, CMU PIE, and CUBIC FacePix databases. The idea is simple, but the experimental results suggest that it works.