CLC number: TP391.4
On-line Access: 2025-04-03
Received: 2024-05-17
Revision Accepted: 2024-09-18
Crosschecked: 2025-04-07
Yuxi HAN, Dequan LI, Yang YANG. Significance extraction based on data augmentation for reinforcement learning[J]. Frontiers of Information Technology & Electronic Engineering, 2025, 26(3): 385-399.
@article{han2025significance,
  title="Significance extraction based on data augmentation for reinforcement learning",
  author="Yuxi HAN and Dequan LI and Yang YANG",
  journal="Frontiers of Information Technology \& Electronic Engineering",
  volume="26",
  number="3",
  pages="385-399",
  year="2025",
  publisher="Zhejiang University Press \& Springer",
  doi="10.1631/FITEE.2400406"
}
Abstract: Deep reinforcement learning has shown remarkable capabilities in visual tasks, but its generalization ability degrades when interference signals are present in the input images, making it difficult to apply trained agents to new environments. To enable agents to distinguish noise signals from important pixels in images, data augmentation techniques and auxiliary networks have proven to be effective solutions. We introduce a novel algorithm, saliency-extracted Q-value by augmentation (SEQA), which encourages the agent to explore unknown states more comprehensively and to focus its attention on important information. Specifically, SEQA masks out interfering features, extracts salient features, and then updates the mask decoder network with critic losses, encouraging the agent to focus on important features and make correct decisions. We evaluate our algorithm on the DeepMind Control generalization benchmark (DMControl-GB), and the experimental results show that it greatly improves training efficiency and stability. Moreover, our algorithm surpasses state-of-the-art reinforcement learning methods in sample efficiency and generalization on most DMControl-GB tasks.
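The abstract describes SEQA as coupling a mask decoder with the critic loss: a saliency mask suppresses distracting pixels, and the same critic (TD) loss that trains the Q-network also updates the mask decoder. The PyTorch snippet below is a rough illustration only, not the authors' implementation; the MaskDecoder and Critic architectures, the critic_update routine, and the noise-based stand-in for a strong augmentation are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MaskDecoder(nn.Module):
    """Predicts a soft [0, 1] saliency mask with the same spatial size as the input."""
    def __init__(self, in_channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, obs):
        return torch.sigmoid(self.net(obs))  # (B, 1, H, W)


class Critic(nn.Module):
    """Image-based Q-network: Q(masked observation, action)."""
    def __init__(self, action_dim, in_channels=3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)), nn.Flatten(),
        )
        self.q = nn.Sequential(
            nn.Linear(64 * 16 + action_dim, 256), nn.ReLU(), nn.Linear(256, 1),
        )

    def forward(self, obs, action):
        return self.q(torch.cat([self.encoder(obs), action], dim=-1))


def critic_update(obs, action, reward, next_obs, done, next_action,
                  mask_decoder, critic, target_critic, optimizer, gamma=0.99):
    """One TD update in which the critic loss also flows into the mask decoder,
    pushing the predicted mask toward pixels that matter for value estimation."""
    # Stand-in for a strong augmentation (e.g., a random overlay) on the observation.
    aug_obs = obs + 0.1 * torch.randn_like(obs)
    mask = mask_decoder(aug_obs)       # salient-region mask
    masked_obs = mask * aug_obs        # suppress distracting pixels

    with torch.no_grad():
        target_q = reward + gamma * (1.0 - done) * target_critic(next_obs, next_action)

    td_loss = F.mse_loss(critic(masked_obs, action), target_q)

    optimizer.zero_grad()
    td_loss.backward()                 # gradients reach the mask decoder as well
    optimizer.step()
    return td_loss.item()


# Illustrative usage with random tensors (shapes only, not real DMControl-GB data).
if __name__ == "__main__":
    B, C, H, W, A = 8, 3, 84, 84, 6
    mask_decoder, critic, target = MaskDecoder(C), Critic(A, C), Critic(A, C)
    target.load_state_dict(critic.state_dict())
    opt = torch.optim.Adam(
        list(critic.parameters()) + list(mask_decoder.parameters()), lr=1e-3)
    print(critic_update(torch.rand(B, C, H, W), torch.rand(B, A), torch.rand(B, 1),
                        torch.rand(B, C, H, W), torch.zeros(B, 1), torch.rand(B, A),
                        mask_decoder, critic, target, opt))
```

The point of the sketch is only the gradient path: because the mask decoder is optimized with the critic loss rather than a reconstruction objective, the learned mask is shaped by what helps value estimation, which is the mechanism the abstract attributes to SEQA.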