CLC number: TP181

On-line Access: 2010-01-10

Received: 2010-01-09

Revision Accepted: 2010-04-06

Crosschecked: 2010-12-06


Journal of Zhejiang University SCIENCE C 2011 Vol.12 No.1 P.17-24

http://doi.org/10.1631/jzus.C1010010


Convergence analysis of an incremental approach to online inverse reinforcement learning


Author(s):  Zhuo-jun Jin, Hui Qian, Shen-yi Chen, Miao-liang Zhu

Affiliation(s):  School of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China

Corresponding email(s):   jinzhuojun@zju.edu.cn, qianhui@zju.edu.cn

Key Words:  Incremental approach, Reward recovering, Online learning, Inverse reinforcement learning, Markov decision process


Zhuo-jun Jin, Hui Qian, Shen-yi Chen, Miao-liang Zhu. Convergence analysis of an incremental approach to online inverse reinforcement learning[J]. Journal of Zhejiang University Science C, 2011, 12(1): 17-24.



Abstract: 
Interest in inverse reinforcement learning (IRL), the problem of recovering the reward function underlying a Markov decision process (MDP) given the dynamics of the system and the behavior of an expert, has recently increased. This paper deals with an incremental approach to online IRL. First, the convergence property of the incremental method for the IRL problem was investigated, and bounds on both the number of mistakes made during learning and the regret were established with a detailed proof. Then an online algorithm based on incremental error correction was derived to deal with the IRL problem. The key idea is to add an increment to the current reward estimate each time an action mismatch occurs, so that the estimate approaches a target optimal value. The proposed method was tested in a driving simulation experiment and found to recover an adequate reward function efficiently.
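The incremental error-correcting idea can be illustrated with a short sketch. The code below is not the paper's exact algorithm; it is a minimal, perceptron-style illustration that assumes a linear reward w·phi(s, a) and hypothetical caller-supplied callables phi, expert_action, and greedy_action. Whenever the action preferred under the current reward estimate disagrees with the expert's demonstrated action, the weight vector receives an increment toward the expert's choice, and the mismatch is counted as a mistake.

import numpy as np

def incremental_irl(phi, expert_action, greedy_action, n_features, stream, eta=0.1):
    """Illustrative incremental error-correcting reward estimation (assumed interface).

    phi(s, a)           -> feature vector (np.ndarray) of a state-action pair
    expert_action(s)    -> action the expert takes in state s
    greedy_action(s, w) -> action that is optimal under the reward w . phi
    stream              -> iterable of observed states
    """
    w = np.zeros(n_features)          # current reward-weight estimate
    mistakes = 0                      # number of action mismatches seen so far
    for s in stream:
        a_exp = expert_action(s)      # expert's demonstrated action
        a_est = greedy_action(s, w)   # action preferred by the current estimate
        if a_est != a_exp:            # action mismatch: apply an increment
            w += eta * (phi(s, a_exp) - phi(s, a_est))
            mistakes += 1
    return w, mistakes

Under the usual online-learning view, the number of such increments (mistakes) is what the paper's analysis bounds; the returned weight vector plays the role of the recovered reward estimate.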


