JZUS - Journal of Zhejiang University SCIENCE

Frontiers of Information Technology & Electronic Engineering 2015 Vol.16 No.11 P.917-929

View-invariant human action recognition via robust locally adaptive multi-view learning

Author(s): Jia-geng Feng, Jun Xiao
Affiliation(s): Institute of Artificial Intelligence, College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
Corresponding email(s): fengjiageng@126.com
Key Words: View-invariant, Action recognition, Multi-view learning, L1-norm, Local learning


Jia-geng Feng, Jun Xiao. View-invariant human action recognition via robust locally adaptive multi-view learning[J]. Frontiers of Information Technology & Electronic Engineering, 2015, 16(11): 917-929.

@article{title="View-invariant human action recognition via robust locally adaptive multi-view learning",
author="Jia-geng Feng and Jun Xiao",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="16",
number="11",
pages="917-929",
year="2015",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.1500080"
}

%0 Journal Article
%T View-invariant human action recognition via robust locally adaptive multi-view learning
%A Jia-geng Feng
%A Jun Xiao
%J Frontiers of Information Technology & Electronic Engineering
%V 16
%N 11
%P 917-929
%@ 2095-9184
%D 2015
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.1500080

TY - JOUR
T1 - View-invariant human action recognition via robust locally adaptive multi-view learning
A1 - Jia-geng Feng
A1 - Jun Xiao
JO - Frontiers of Information Technology & Electronic Engineering
VL - 16
IS - 11
SP - 917
EP - 929
SN - 2095-9184
Y1 - 2015
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.1500080
ER -


Abstract: Human action recognition is currently one of the most active research areas in computer vision. It has been widely used in applications such as intelligent surveillance, perceptual interfaces, and content-based video retrieval. However, some extrinsic factors hinder the development of action recognition; e.g., human actions may be observed from arbitrary camera viewpoints in realistic scenes. View-invariant analysis has therefore become important for action recognition algorithms, and many researchers have paid attention to this issue. In this paper, we present a multi-view learning approach to recognizing human actions from different views. Because most existing multi-view learning algorithms lack data adaptiveness when constructing the nearest-neighbor graph, we propose a robust locally adaptive multi-view learning algorithm based on learning multiple local L1-graphs. Moreover, we propose an efficient iterative optimization method to solve the resulting objective function. Experiments on three public view-invariant action recognition datasets, i.e., ViHASi, IXMAS, and WVU, demonstrate the data adaptiveness, effectiveness, and efficiency of our algorithm. More importantly, when the feature dimension is appropriately selected (i.e., >60), the proposed algorithm consistently outperforms state-of-the-art counterparts and obtains about 6% improvement in recognition accuracy on the three datasets.

Reviewer comment: This paper proposes a multi-view learning method for recognizing human actions from different views. Its basic motivation is to construct multiple local L1-graphs adaptively. The method is technically sound in general, and the experimental results indicate that it is effective relative to the compared baseline methods.

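View-invariant human action recognition based on robust locally adaptive multi-view learning

Objective: Vision-based human action recognition is a very active research area. It has broad application prospects in intelligent surveillance, perceptual interfaces, and content-based video retrieval. However, some real-world scenarios still hinder the development of action recognition; for example, actions in real scenes are often captured from arbitrary angles. View-invariant action recognition is therefore very important, and many researchers have begun to work on the view invariance of action recognition. This paper proposes a view-invariant human action recognition method based on multi-view learning.
Innovation: To address the lack of data adaptiveness in existing multi-view learning algorithms when they construct the nearest-neighbor graph, this paper proposes an adaptive multi-view learning algorithm. In addition, an iterative optimization method is proposed to solve the resulting objective function.
Methods: For all sample feature data under a single view, an L1-graph is constructed for that view. Once the sparse graph structure of the data is obtained, an optimal dimensionality reduction is sought for the data of each single view, one that reduces the dimensionality of the original data while preserving its intrinsic local structure as much as possible. Across views, a non-negative weight vector measures the importance of each view. All views are then combined into a single objective function. Finally, the objective is solved by iterative optimization, and a support vector machine (SVM) is used for classification.
Conclusions: The proposed algorithm was applied to view-invariant action recognition. The experimental results show that it adaptively selects the number of neighbors and the weights of the different features, and that it achieves higher classification accuracy than the other algorithms compared.

Key words: View-invariant; Action recognition; Multi-view learning; L1-norm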
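
The Methods step above builds one L1-graph per view by representing each sample as a sparse combination of the other samples in that view. The following is a minimal Python sketch of that single step, not the authors' implementation: it assumes a Lasso-style formulation solved by coordinate descent, and the function names (`soft_threshold`, `sparse_code`, `l1_graph`), the penalty `lam`, and the toy data are hypothetical choices made for illustration. The view weighting, iterative optimization, and SVM stages are not shown.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal operator of the L1 norm)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sparse_code(target, atoms, lam=0.1, sweeps=200):
    """Coordinate descent for
    min_a 0.5 * ||target - sum_j a[j] * atoms[j]||^2 + lam * ||a||_1.
    """
    coeffs = [0.0] * len(atoms)
    residual = list(target)  # target minus current reconstruction (zero at start)
    norms = [sum(v * v for v in atom) for atom in atoms]
    for _ in range(sweeps):
        for j, atom in enumerate(atoms):
            if norms[j] == 0.0:
                continue  # an all-zero atom cannot contribute
            # Correlation of atom j with the residual computed without atom j.
            rho = sum(a * r for a, r in zip(atom, residual)) + coeffs[j] * norms[j]
            new = soft_threshold(rho, lam) / norms[j]
            delta = new - coeffs[j]
            if delta != 0.0:
                residual = [r - delta * a for r, a in zip(residual, atom)]
                coeffs[j] = new
    return coeffs


def l1_graph(samples, lam=0.1):
    """Return an n x n weight matrix for one view.

    Row i holds the absolute sparse coefficients that reconstruct sample i
    from all other samples; the diagonal stays zero.
    """
    n = len(samples)
    weights = [[0.0] * n for _ in range(n)]
    for i, target in enumerate(samples):
        others = [k for k in range(n) if k != i]
        coeffs = sparse_code(target, [samples[k] for k in others], lam)
        for k, c in zip(others, coeffs):
            weights[i][k] = abs(c)
    return weights


if __name__ == "__main__":
    # Hypothetical toy view: two tight pairs of 2-D feature vectors.
    view = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    for row in l1_graph(view, lam=0.1):
        print([round(w, 3) for w in row])
```

Because the sparse code itself decides how many coefficients are nonzero, each sample ends up with its own neighborhood size, which matches the adaptive selection of the number of neighbors described in the Conclusions; a fixed k-nearest-neighbor graph would need k chosen by hand.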


