
CLC number: TP181

On-line Access: 2010-09-07

Received: 2009-08-07

Revision Accepted: 2009-10-19

Crosschecked: 2010-05-31



Journal of Zhejiang University SCIENCE C 2010 Vol.11 No.9 P.718-723

http://doi.org/10.1631/jzus.C0910486


Modified reward function on abstract features in inverse reinforcement learning


Author(s):  Shen-yi Chen, Hui Qian, Jia Fan, Zhuo-jun Jin, Miao-liang Zhu

Affiliation(s):  School of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China

Corresponding email(s):   charles_csy@zju.edu.cn

Key Words:  Importance rating, Abstract feature, Feature extraction, Inverse reinforcement learning (IRL), Markov decision process (MDP)


Shen-yi Chen, Hui Qian, Jia Fan, Zhuo-jun Jin, Miao-liang Zhu. Modified reward function on abstract features in inverse reinforcement learning[J]. Journal of Zhejiang University Science C, 2010, 11(9): 718-723.

@article{title="Modified reward function on abstract features in inverse reinforcement learning",
author="Shen-yi Chen, Hui Qian, Jia Fan, Zhuo-jun Jin, Miao-liang Zhu",
journal="Journal of Zhejiang University Science C",
volume="11",
number="9",
pages="718-723",
year="2010",
publisher="Zhejiang University Press & Springer",
doi="10.1631/jzus.C0910486"
}



Abstract: 
We improve inverse reinforcement learning (IRL) by applying dimension reduction methods that automatically extract abstract features from human-demonstrated policies, to handle cases where features are either unknown or numerous. The importance rating of each abstract feature is incorporated into the reward function. Simulations are performed on a five-lane highway driving task, in which the controlled car has the highest fixed speed among all the cars. With importance ratings, performance is on average almost 10.6% better than without them.
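The abstract describes a reward built as an importance-weighted combination of abstract features obtained by dimension reduction. A minimal sketch of that idea, assuming PCA as the dimension-reduction step and a linear reward form (the function names, weights, and importance values below are illustrative, not the paper's exact method):

```python
import numpy as np

def extract_abstract_features(demo_features, k):
    """Reduce raw demonstration feature vectors to k abstract features
    via PCA (one possible dimension-reduction choice)."""
    X = demo_features - demo_features.mean(axis=0)
    # Principal directions are the top-k right singular vectors.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    basis = vt[:k]                       # (k, d) projection matrix
    return demo_features @ basis.T, basis

def reward(state_features, weights, importance):
    """Linear reward on abstract features, with each feature's weight
    scaled by its importance rating (hypothetical reward form)."""
    return float(np.dot(weights * importance, state_features))

# Toy usage: 100 demonstration states with 6 raw features -> 2 abstract.
rng = np.random.default_rng(0)
demos = rng.normal(size=(100, 6))
abstract, basis = extract_abstract_features(demos, k=2)
w = np.array([0.5, -0.3])    # learned reward weights (illustrative)
imp = np.array([1.0, 0.4])   # importance ratings (illustrative)
r = reward(abstract[0], w, imp)
```

Under this form, an abstract feature with a low importance rating contributes proportionally less to the reward, regardless of its learned weight.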




Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952783; E-mail: cjzhang@zju.edu.cn
Copyright © 2000 - 2022 Journal of Zhejiang University-SCIENCE