CLC number: TP242.6
On-line Access: 2013-03-05
Received: 2012-07-19
Revision Accepted: 2012-10-12
Crosschecked: 2013-01-04
Cited: 8
Clicked: 8884
Xin Ma, Ya Xu, Guo-qiang Sun, Li-xia Deng, Yi-bin Li. State-chain sequential feedback reinforcement learning for path planning of autonomous mobile robots[J]. Journal of Zhejiang University Science C, 2013, 14(3): 167-178.
@article{title="State-chain sequential feedback reinforcement learning for path planning of autonomous mobile robots",
author="Xin Ma, Ya Xu, Guo-qiang Sun, Li-xia Deng, Yi-bin Li",
journal="Journal of Zhejiang University Science C",
volume="14",
number="3",
pages="167-178",
year="2013",
publisher="Zhejiang University Press & Springer",
doi="10.1631/jzus.C1200226"
}
%0 Journal Article
%T State-chain sequential feedback reinforcement learning for path planning of autonomous mobile robots
%A Xin Ma
%A Ya Xu
%A Guo-qiang Sun
%A Li-xia Deng
%A Yi-bin Li
%J Journal of Zhejiang University SCIENCE C
%V 14
%N 3
%P 167-178
%@ 1869-1951
%D 2013
%I Zhejiang University Press & Springer
%DOI 10.1631/jzus.C1200226
TY - JOUR
T1 - State-chain sequential feedback reinforcement learning for path planning of autonomous mobile robots
A1 - Xin Ma
A1 - Ya Xu
A1 - Guo-qiang Sun
A1 - Li-xia Deng
A1 - Yi-bin Li
J0 - Journal of Zhejiang University Science C
VL - 14
IS - 3
SP - 167
EP - 178
%@ 1869-1951
Y1 - 2013
PB - Zhejiang University Press & Springer
ER -
DOI - 10.1631/jzus.C1200226
Abstract: This paper deals with a new approach based on Q-learning for solving the problem of mobile robot path planning in complex unknown static environments. As a computational approach to learning through interaction with the environment, reinforcement learning algorithms have been widely used for intelligent robot control, especially in the field of autonomous mobile robots. However, the learning process is slow and cumbersome. For practical applications, rapid rates of convergence are required. Aiming at the problem of slow convergence and long learning time for Q-learning based mobile robot path planning, a state-chain sequential feedback Q-learning algorithm is proposed for quickly searching for the optimal path of mobile robots in complex unknown static environments. The state chain is built during the searching process. After one action is chosen and the reward is received, the Q-values of the state-action pairs on the previously built state chain are sequentially updated with one-step Q-learning. With the increasing number of Q-values updated after one action, the number of actual steps for convergence decreases and thus, the learning time decreases, where a step is a state transition. Extensive simulations validate the efficiency of the newly proposed approach for mobile robot path planning in complex environments. The results show that the new approach has a high convergence speed and that the robot can find the collision-free optimal path in complex unknown static environments with much shorter time, compared with the one-step Q-learning algorithm and the Q(λ)-learning algorithm.
[1]Agirrebeitia, J., Aviles, R., de Bustos, I.F., Ajuria, G., 2005. A new APF strategy for path planning in environments with obstacles. Mech. Mach. Theory, 40(6):645-658.
[2]Alexopoulos, C., Griffin, P.M., 1992. Path planning for a mobile robot. IEEE Trans. Syst. Man Cybern., 22(2):318-322.
[3]Al-Taharwa, I., Sheta, A., Al-Weshah, M., 2008. A mobile robot path planning using genetic algorithm in static environment. J. Comput. Sci., 4(4):341-344.
[4]Barraquand, J., Langlois, B., Latombe, J.C., 1992. Numerical potential field techniques for robot path planning. IEEE Trans. Syst. Man Cybern., 22(2):224-241.
[5]Cao, Q., Huang, Y., Zhou, J., 2006. An Evolutionary Artificial Potential Field Algorithm for Dynamic Path Planning of Mobile Robot. Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, p.3331-3336.
[6]Castillo, O., Trujillo, L., Melin, P., 2007. Multiple objective genetic algorithms for path-planning optimization in autonomous mobile robots. Soft Comput., 11(3):269-279.
[7]Dearden, R., Friedman, N., Russell, S., 1998. Bayesian Q-Learning. Proc. National Conf. on Artificial Intelligence, p.761-768.
[8]Dolgov, D., Thrun, S., Montemerlo, M., Diebel, J., 2010. Path planning for autonomous vehicles in unknown semi-structured environments. Int. J. Robot. Res., 29(5):485-501.
[9]Framling, K., 2007. Guiding exploration by pre-existing knowledge without modifying reward. Neur. Networks, 20(6):736-747.
[10]Garcia, M.A., Montiel, O., Castillo, O., Sepulveda, R., Melin, P., 2009. Path planning for autonomous mobile robot navigation with ant colony optimization and fuzzy cost function evaluation. Appl. Soft Comput., 9(3):1102-1110.
[11]Ge, S.S., Cui, Y.J., 2002. Dynamic motion planning for mobile robots using potential field method. Auton. Robots, 13(3):207-222.
[12]Ghatee, M., Mohades, A., 2009. Motion planning in order to optimize the length and clearance applying a hopfield neural network. Expert Syst. Appl., 36(3):4688-4695.
[13]Gong, D., Lu, L., Li, M., 2009. Robot Path Planning in Uncertain Environments Based on Particle Swarm Optimization. Proc. IEEE Congress on Evolutionary Computation, p.2127-2134.
[14]Guo, M., Liu, Y., Malec, J., 2004. A new Q-learning algorithm based on the metropolis criterion. IEEE Trans. Syst. Man Cybern. B, 34(5):2140-2143.
[15]Hachour, O., 2009. The proposed fuzzy logic navigation approach of autonomous mobile robots in unknown environments. Int. J. Math. Models Methods Appl. Sci., 3(3):204-218.
[16]Hamagami, T., Hirata, H., 2003. An Adjustment Method of the Number of States of Q-Learning Segmenting State Space Adaptively. Proc. IEEE Int. Conf. on Systems, Man and Cybernetics, 4:3062-3067.
[17]Hart, P.E., Nilsson, N.J., Raphael, B., 1968. A formal basis for the heuristic determination of minimum cost paths. IEEE Trans. Syst. Sci. Cybern., 4(2):100-107.
[18]Hwang, H.J., Viet, H.H., Chung, T., 2011. Q(λ) based vector direction for path planning problem of autonomous mobile robots. Lect. Notes Electr. Eng., 107(Part 4):433-442.
[19]Jaradat, M.A.K., Al-Rousan, M., Quadan, L., 2011. Reinforcement based mobile robot navigation in dynamic environment. Robot. Comput.-Integr. Manuf., 27(1):135-149.
[20]Jin, Z., Liu, W., Jin, J., 2009. Partitioning the State Space by Critical States. Proc. 4th Int. Conf. on Bio-Inspired Computing, p.1-7.
[21]Kala, R., Shukla, A., Tiwari, R., 2010. Fusion of probabilistic A* algorithm and fuzzy inference system for robotic path planning. Artif. Intell. Rev., 33(4):307-327.
[22]Koenig, S., Simmons, R.G., 1996. The effect of representation and knowledge on goal-directed exploration with reinforcement-learning algorithms. Mach. Learn., 22(1-3):227-250.
[23]Lampton, A., Valasek, J., 2009. Multiresolution State-Space Discretization Method for Q-Learning. Proc. American Control Conf., p.1646-1651.
[24]Latombe, J.C., 1991. Robot Motion Planning. Kluwer Academic Publishers.
[25]Oh, C.H., Nakashima, T., Ishibuchi, H., 1998. Initialization of Q-Values by Fuzzy Rules for Accelerating QLearning. Proc. IEEE World Congress on Computational Intelligence and IEEE Int. Joint Conf. on Neural Networks, 3:2051-2056.
[26]Peng, J., Williams, R.J., 1996. Incremental multi-step Qlearning. Mach. Learn., 22(1-3):283-290.
[27]Poty, A., Melchior, P., Oustaloup, A., 2004. Dynamic Path Planning for Mobile Robots Using Fractional Potential Field. Proc. 1st Int. Symp. on Control, Communications and Signal Processing, p.557-561.
[28]Saab, Y., VanPutte, M., 1999. Shortest path planning on topographical maps. IEEE Trans. Syst. Man Cybern. A, 29(1):139-150.
[29]Senda, K., Mano, S., Fujii, S., 2003. A Reinforcement Learning Accelerated by State Space Reduction. SICE Annual Conf., 2:1992-1997.
[30]Song, Y., Li, Y., Li, C., Zhang, G., 2012. An efficient initialization approach of Q-learning for mobile robots. Int. J. Control Autom. Syst., 10(1):166-172.
[31]Still, S., Precup, D., 2012. An information-theoretic approach to curiosity-driven reinforcement learning. Theory Biosci., 131(3):139-148.
[32]Sutton, R.S., Barto, A.G., 1998. Reinforcement Learning: an Introduction. MIT Press, Cambriage, MA.
[33]Tsai, C.C., Huang, H.C., Chan, C.K., 2011. Parallel elite genetic algorithm and its application to global path planning for autonomous robot navigation. IEEE Trans. Ind. Electron., 58(10):4813-4821.
[34]Wang, Z., Zhu, X., Han, Q., 2011. Mobile robot path planning based on parameter optimization ant colony algorithm. Proc. Eng., 15:2738-2741.
[35]Watkins, C.J.C.H., 1989. Learning from Delayed Rewards. University of Cambridge, Cambridge, UK.
[36]Watkins, C.J.C.H., Dayan, P., 1992. Q-learning. Mach. Learn., 8(3-4):279-292.
[37]Wiewiora, E., 2003. Potential-based shaping and Q-value initialization are equivalent. Artif. Intell. Res., 19:205-208.
[38]Yang, S.X., Meng, M., 2000. An efficient neural network approach to dynamic robot motion planning. Neur. Networks, 13(2):143-148.
[39]Yang, X., Moallem, M., Patel, R.V., 2005. A layered goal-oriented fuzzy motion planning strategy for mobile robot navigation. IEEE Trans. Syst. Man Cybern. B, 35(6):1214-1224.
[40]Yun, S.C., Ganapathy, V., Chong, L.O., 2010. Improved Genetic Algorithms Based Optimum Path Planning for Mobile Robot. Proc. 11th Int. Conf. on Control, Automation, Robotics and Vision, p.1565-1570.
Open peer comments: Debate/Discuss/Question/Opinion
<1>