Publishing Service

Polishing & Checking

Frontiers of Information Technology & Electronic Engineering

ISSN 2095-9184 (print), ISSN 2095-9230 (online)

Coach-assisted multi-agent reinforcement learning framework for unexpected crashed agents

Abstract: Multi-agent reinforcement learning is difficult to apply in practice, partially because of the gap between simulated and real-world scenarios. One reason for the gap is that simulated systems always assume that agents can work normally all the time, while in practice, one or more agents may unexpectedly "crash" during the coordination process due to inevitable hardware or software failures. Such crashes destroy the cooperation among agents and lead to performance degradation. In this work, we present a formal conceptualization of a cooperative multi-agent reinforcement learning system with unexpected crashes. To enhance the robustness of the system to crashes, we propose a coach-assisted multi-agent reinforcement learning framework that introduces a virtual coach agent to adjust the crash rate during training. We have designed three coaching strategies (fixed crash rate, curriculum learning, and adaptive crash rate) and a re-sampling strategy for our coach agent. To our knowledge, this work is the first to study unexpected crashes in a multi-agent system. Extensive experiments on grid-world and StarCraft II micromanagement tasks demonstrate the efficacy of the adaptive strategy compared with the fixed crash rate strategy and curriculum learning strategy. The ablation study further illustrates the effectiveness of our re-sampling strategy.

Key words: Multi-agent system; Reinforcement learning; Unexpected crashed agents

Chinese Summary  <34> 针对意外崩溃智能体的教练辅助多智能体强化学习框架

赵鉴1,赵有朋1,王维埙2,阳明宇1,胡迅晗1,周文罡1,郝建业2,李厚强1
1中国科学技术大学信息科学技术学院,中国合肥市,230026
2天津大学智能与计算学部,中国天津市,300072
摘要:多智能体强化学习在实际场景中很难应用,一部分原因在于模拟环境和现实环境之间存在差距。造成这种差距的一个原因是,模拟系统总是假设智能体可以一直正常工作,而实际上,由于不可避免的硬件或软件故障,一个或多个智能体可能会在合作过程中意外“崩溃”。这样的崩溃会破坏智能体之间的合作,导致系统性能下降。本文中,我们给出了意外崩溃情况下合作多智能体强化学习系统的正式定义。为增强系统应对崩溃时的鲁棒性,提出教练辅助多智能体强化学习框架,其在训练过程中引入一个虚拟教练智能体,以调整系统的崩溃概率。为教练智能体设计了3种教练策略和重采样策略。据我们所知,这是研究多智能体系统中意外崩溃情况的首项工作。在网格环境和星际争霸微管理任务上的大量实验表明,相比固定崩溃概率和课程学习的教练策略,自适应策略更加有效。消融实验进一步展现了重采样策略的有效性。

关键词组:多智能体系统;强化学习;意外崩溃智能体


Share this article to: More

Go to Contents

References:

<Show All>

Open peer comments: Debate/Discuss/Question/Opinion

<1>

Please provide your name, email address and a comment





DOI:

10.1631/FITEE.2100594

CLC number:

TP18

Download Full Text:

Click Here

Downloaded:

9132

Download summary:

<Click Here> 

Downloaded:

525

Clicked:

2854

Cited:

0

On-line Access:

2024-08-27

Received:

2023-10-17

Revision Accepted:

2024-05-08

Crosschecked:

2022-04-08

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952276; Fax: +86-571-87952331; E-mail: jzus@zju.edu.cn
Copyright © 2000~ Journal of Zhejiang University-SCIENCE