Journal of Zhejiang University

Frontiers of Information Technology & Electronic Engineering 2024 Vol.25 No.11 P.1446-1465

Domain adaptation in reinforcement learning: a comprehensive and systematic study

Author(s): Amirfarhad FARHADI, Mitra MIRZAREZAEE, Arash SHARIFI, Mohammad TESHNEHLAB
Affiliation(s): 1. Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran 1477893855, Iran
Corresponding email(s): a.sharifi@srbiau.ac.ir
Key Words: Reinforcement learning, Domain adaptation, Machine learning


Amirfarhad FARHADI, Mitra MIRZAREZAEE, Arash SHARIFI, Mohammad TESHNEHLAB. Domain adaptation in reinforcement learning: a comprehensive and systematic study[J]. Frontiers of Information Technology & Electronic Engineering, 2024, 25(11): 1446-1465.

@article{title="Domain adaptation in reinforcement learning: a comprehensive and systematic study",
author="Amirfarhad FARHADI and Mitra MIRZAREZAEE and Arash SHARIFI and Mohammad TESHNEHLAB",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="25",
number="11",
pages="1446-1465",
year="2024",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.2300668"
}

%0 Journal Article
%T Domain adaptation in reinforcement learning: a comprehensive and systematic study
%A Amirfarhad FARHADI
%A Mitra MIRZAREZAEE
%A Arash SHARIFI
%A Mohammad TESHNEHLAB
%J Frontiers of Information Technology & Electronic Engineering
%V 25
%N 11
%P 1446-1465
%@ 2095-9184
%D 2024
%I Zhejiang University Press & Springer
%R 10.1631/FITEE.2300668

TY - JOUR
T1 - Domain adaptation in reinforcement learning: a comprehensive and systematic study
A1 - Amirfarhad FARHADI
A1 - Mitra MIRZAREZAEE
A1 - Arash SHARIFI
A1 - Mohammad TESHNEHLAB
JO - Frontiers of Information Technology & Electronic Engineering
VL - 25
IS - 11
SP - 1446
EP - 1465
SN - 2095-9184
Y1 - 2024
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.2300668
ER -


Abstract: Reinforcement learning (RL) has shown significant potential for dealing with complex decision-making problems. However, its performance relies heavily on the availability of a large amount of high-quality data. In many real-world situations, the data distribution in the target domain may differ significantly from that in the source domain, leading to a marked drop in the performance of RL algorithms. Domain adaptation (DA) strategies have been proposed to address this issue by transferring knowledge from a source domain to a target domain. However, there have been no comprehensive and in-depth studies to evaluate these approaches. In this paper, we present a comprehensive and systematic study of DA in RL. We first introduce the basic concepts and formulations of DA in RL and then review the existing DA methods used in RL. Our main objective is to fill the existing literature gap regarding DA in RL. To achieve this, we conduct a rigorous evaluation of state-of-the-art DA approaches. We aim to provide comprehensive insights into DA in RL and contribute to advancing knowledge in this field. The existing DA approaches are divided into seven categories based on application domains. The approaches in each category are discussed based on the important data adaptation metrics, and then their key characteristics are described. Finally, challenging issues and future research trends are highlighted to assist researchers in developing innovative improvements.

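Review: domain adaptation in reinforcement learning

Amirfarhad FARHADI¹, Mitra MIRZAREZAEE¹, Arash SHARIFI¹, Mohammad TESHNEHLAB²
¹Department of Computer Engineering, Islamic Azad University, Tehran 1477893855, Iran
²Faculty of Control Engineering, K. N. Toosi University of Technology, Tehran 1999143344, Iran
Abstract: Reinforcement learning (RL) has shown great potential in handling complex decision-making problems. However, its performance depends largely on the availability of large amounts of high-quality data. In many practical situations, the data distribution of the target domain may differ greatly from that of the source domain, causing the performance of RL algorithms to drop significantly. Domain adaptation (DA) strategies address this problem by transferring knowledge from the source domain to the target domain. However, there is currently no comprehensive and in-depth study evaluating these methods. This paper presents a comprehensive and systematic study of DA in RL. It first introduces the basic concepts and formulations of DA in RL and then reviews the existing DA methods used in RL. The main objective is to fill the existing gap in the literature on DA in RL. To this end, the paper rigorously evaluates state-of-the-art DA methods, aiming to provide comprehensive insights into DA in RL and to contribute to the advancement of knowledge in this field. The existing DA methods are divided into seven categories according to application domain. The methods in each category are discussed based on important data adaptation metrics, and their key characteristics are described. Finally, challenging issues and future research trends are highlighted to help researchers innovate and improve.

Key words: Reinforcement learning; Domain adaptation; Machine learning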


