Publishing Service

Polishing & Checking

Journal of Zhejiang University SCIENCE C

ISSN 1869-1951(Print), 1869-196x(Online), Monthly

Greedy feature replacement for online value function approximation

Abstract: Reinforcement learning (RL) in real-world problems requires function approximations that depend on selecting the appropriate feature representations. Representational expansion techniques can make linear approximators represent value functions more effectively; however, most of these techniques function well only for low dimensional problems. In this paper, we present the greedy feature replacement (GFR), a novel online expansion technique, for value-based RL algorithms that use binary features. Given a simple initial representation, the feature representation is expanded incrementally. New feature dependencies are added automatically to the current representation and conjunctive features are used to replace current features greedily. The virtual temporal difference (TD) error is recorded for each conjunctive feature to judge whether the replacement can improve the approximation. Correctness guarantees and computational complexity analysis are provided for GFR. Experimental results in two domains show that GFR achieves much faster learning and has the capability to handle large-scale problems.

Key words: Reinforcement learning, Function approximation, Feature dependency, Online expansion, Feature replacement

Chinese Summary  <63> 用于在线值函数近似的贪婪特征替换方法

研究目的:强化学习需要使用值函数近似来处理真实世界中的高维度状态空间问题,而值函数近似的效果很大程度上取决于合适特征表示的选取。表示扩展技术可使线性近似器更有效地表示值函数,然而现有的表示扩展技术或者只适用于低维度问题,或者在学习过程中不能及时发现重要的特征依赖。因此,本文提出了一种新型在线特征扩展技术——贪婪特征替换(GFR)。
创新要点:通过发现替换区域来执行特征替换的方式,贪婪特征替换方法增量地扩展特征表示,可以处理高维度状态空间问题,解决了传统在线扩展技术的局限。通过识别替换区域,贪婪特征替换方法优化了特征组合的加入顺序,提高了特征表示扩展的性能。
研究方法:引入虚拟时序差的概念,模拟新特征组合加入特征表示后的时序差,通过分析实际时序差与虚拟时序差累积情况的差距,选择新的特征组合。如差距超过给定阈值,这个特征组合将被作为新特征。在每次特征替换中,两个原有的特征被一个新的特征组合取代。
重要结论:理论分析表明,使用贪婪特征替换方法的值函数近似可以渐进地得到最优近似结果。对两个有代表性的学习问题的实验结果显示,贪婪特征替换可以达到更快的学习效果,而且具备处理大规模状态空间的能力。贪婪特征替换方法可以应用于各种在线值函数近似。

关键词组:强化学习;值函数近似;特征依赖;在线扩展;特征替换


Share this article to: More

Go to Contents

References:

<Show All>

Open peer comments: Debate/Discuss/Question/Opinion

<1>

Please provide your name, email address and a comment





DOI:

10.1631/jzus.C1300246

CLC number:

TP181

Download Full Text:

Click Here

Downloaded:

2887

Download summary:

<Click Here> 

Downloaded:

2059

Clicked:

7027

Cited:

0

On-line Access:

2014-03-05

Received:

2013-09-06

Revision Accepted:

2014-01-14

Crosschecked:

2014-02-21

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952276; Fax: +86-571-87952331; E-mail: jzus@zju.edu.cn
Copyright © 2000~ Journal of Zhejiang University-SCIENCE