|
Journal of Zhejiang University SCIENCE C
ISSN 1869-1951(Print), 1869-196x(Online), Monthly
2014 Vol.15 No.3 P.223-231
Greedy feature replacement for online value function approximation
Abstract: Reinforcement learning (RL) in real-world problems requires function approximations that depend on selecting the appropriate feature representations. Representational expansion techniques can make linear approximators represent value functions more effectively; however, most of these techniques function well only for low dimensional problems. In this paper, we present the greedy feature replacement (GFR), a novel online expansion technique, for value-based RL algorithms that use binary features. Given a simple initial representation, the feature representation is expanded incrementally. New feature dependencies are added automatically to the current representation and conjunctive features are used to replace current features greedily. The virtual temporal difference (TD) error is recorded for each conjunctive feature to judge whether the replacement can improve the approximation. Correctness guarantees and computational complexity analysis are provided for GFR. Experimental results in two domains show that GFR achieves much faster learning and has the capability to handle large-scale problems.
Key words: Reinforcement learning, Function approximation, Feature dependency, Online expansion, Feature replacement
创新要点:通过发现替换区域来执行特征替换的方式,贪婪特征替换方法增量地扩展特征表示,可以处理高维度状态空间问题,解决了传统在线扩展技术的局限。通过识别替换区域,贪婪特征替换方法优化了特征组合的加入顺序,提高了特征表示扩展的性能。
研究方法:引入虚拟时序差的概念,模拟新特征组合加入特征表示后的时序差,通过分析实际时序差与虚拟时序差累积情况的差距,选择新的特征组合。如差距超过给定阈值,这个特征组合将被作为新特征。在每次特征替换中,两个原有的特征被一个新的特征组合取代。
重要结论:理论分析表明,使用贪婪特征替换方法的值函数近似可以渐进地得到最优近似结果。对两个有代表性的学习问题的实验结果显示,贪婪特征替换可以达到更快的学习效果,而且具备处理大规模状态空间的能力。贪婪特征替换方法可以应用于各种在线值函数近似。
关键词组:
References:
Open peer comments: Debate/Discuss/Question/Opinion
<1>
DOI:
10.1631/jzus.C1300246
CLC number:
TP181
Download Full Text:
Downloaded:
3196
Download summary:
<Click Here>Downloaded:
2330Clicked:
7885
Cited:
0
On-line Access:
2024-08-27
Received:
2023-10-17
Revision Accepted:
2024-05-08
Crosschecked:
2014-02-21