
CLC number: TP18

On-line Access: 2025-11-17

Received: 2025-05-02

Revision Accepted: 2025-11-18

Crosschecked: 2025-09-15


 ORCID:

Junzhi YU

https://orcid.org/0000-0002-6347-572X

Xiali LI

https://orcid.org/0000-0001-7950-6204


Frontiers of Information Technology & Electronic Engineering  2025 Vol.26 No.10 P.1969-1983

http://doi.org/10.1631/FITEE.2500287


Jiu fusion artificial intelligence (JFA): a two-stage reinforcement learning model with hierarchical neural networks and human knowledge for Tibetan Jiu chess


Author(s):  Xiali LI, Xiaoyu FAN, Junzhi YU, Zhicheng DONG, Xianmu CAIRANG, Ping LAN

Affiliation(s):  Key Laboratory of Ethnic Language Intelligent Analysis and Security Governance of MOE, Minzu University of China, Beijing 100081, China; School of Information Engineering, Minzu University of China, Beijing 100081, China; State Key Laboratory of Turbulence and Complex Systems, Peking University, Beijing 100871, China; State Key Laboratory of General Artificial Intelligence, Peking University, Beijing 100871, China; School of Information Science and Technology, Tibet University, Lhasa 850000, China

Corresponding email(s):   lixiali@muc.edu.cn, junzhi.yu@ia.ac.cn

Key Words:  Games, Reinforcement learning, Tibetan Jiu chess, Separate two-stage model, Self-play, Hierarchical neural network, Parallel Monte Carlo tree search


Xiali LI, Xiaoyu FAN, Junzhi YU, Zhicheng DONG, Xianmu CAIRANG, Ping LAN. Jiu fusion artificial intelligence (JFA): a two-stage reinforcement learning model with hierarchical neural networks and human knowledge for Tibetan Jiu chess[J]. Frontiers of Information Technology & Electronic Engineering, 2025, 26(10): 1969-1983.

@article{li2025jfa,
title="Jiu fusion artificial intelligence (JFA): a two-stage reinforcement learning model with hierarchical neural networks and human knowledge for Tibetan Jiu chess",
author="Xiali LI, Xiaoyu FAN, Junzhi YU, Zhicheng DONG, Xianmu CAIRANG, Ping LAN",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="26",
number="10",
pages="1969-1983",
year="2025",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.2500287"
}

%0 Journal Article
%T Jiu fusion artificial intelligence (JFA): a two-stage reinforcement learning model with hierarchical neural networks and human knowledge for Tibetan Jiu chess
%A Xiali LI
%A Xiaoyu FAN
%A Junzhi YU
%A Zhicheng DONG
%A Xianmu CAIRANG
%A Ping LAN
%J Frontiers of Information Technology & Electronic Engineering
%V 26
%N 10
%P 1969-1983
%@ 2095-9184
%D 2025
%I Zhejiang University Press & Springer
%R 10.1631/FITEE.2500287

TY - JOUR
T1 - Jiu fusion artificial intelligence (JFA): a two-stage reinforcement learning model with hierarchical neural networks and human knowledge for Tibetan Jiu chess
A1 - Xiali LI
A1 - Xiaoyu FAN
A1 - Junzhi YU
A1 - Zhicheng DONG
A1 - Xianmu CAIRANG
A1 - Ping LAN
JO - Frontiers of Information Technology & Electronic Engineering
VL - 26
IS - 10
SP - 1969
EP - 1983
SN - 2095-9184
Y1 - 2025
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.2500287
ER -


Abstract: 
Tibetan Jiu chess, recognized as a national intangible cultural heritage, is a complex game comprising two distinct phases: the layout phase and the battle phase. Improving the performance of deep reinforcement learning (DRL) models for Tibetan Jiu chess is challenging, especially given the constraints of hardware resources. To address this, we propose a two-stage model called JFA, which incorporates hierarchical neural networks and knowledge-guided techniques. The model comprises two sub-models: a strategic layout model (SLM) for the layout phase and a hierarchical battle model (HBM) for the battle phase. Both sub-models use similar network structures and employ parallel Monte Carlo tree search (MCTS) methods for independent self-play training. HBM is structured as a hierarchical neural network, with the upper network selecting movement and jump-capturing actions and the lower network handling square-capturing actions. Human knowledge-based auxiliary agents are introduced to assist SLM and HBM, simulating the entire game and providing reward signals based on square-capturing or victory outcomes. Additionally, within HBM, we propose two human knowledge-based pruning methods that prune the parallel MCTS and the capture actions in the lower network, respectively. In experiments against a layout model using the AlphaZero method, SLM achieves a 74% win rate, with its decision-making time reduced to approximately 1/147 of that required by the AlphaZero model. SLM also won first place at the 2024 China National Computer Game Tournament. HBM achieves a 70% win rate when playing against other Tibetan Jiu chess models. When used together, SLM and HBM in JFA achieve an 81% win rate, comparable to the level of a human amateur 4-dan player. These results demonstrate that JFA effectively enhances artificial intelligence (AI) performance in Tibetan Jiu chess.
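
The following minimal Python sketch illustrates the two-stage dispatch described above: layout-phase positions are routed to SLM, battle-phase positions to HBM, whose upper network picks movement and jump-capturing actions while the lower network handles square-capturing actions. This is not the authors' code; all class, function, and field names are hypothetical, and the placeholder policies stand in for the trained networks and parallel MCTS.

# Hedged sketch of the two-stage JFA dispatch (hypothetical interfaces).
import random
from dataclasses import dataclass

@dataclass
class Position:
    phase: str                 # "layout" or "battle"
    legal_moves: list          # movement / jump-capture candidates
    square_captures: list      # square-capture candidates (battle phase only)

def slm_policy(pos):
    # Placeholder for the strategic layout model (SLM) plus its parallel MCTS.
    return random.choice(pos.legal_moves)

def hbm_upper_policy(pos):
    # Upper network of HBM: chooses movement and jump-capturing actions.
    return random.choice(pos.legal_moves)

def hbm_lower_policy(pos):
    # Lower network of HBM: chooses among square-capturing actions.
    return random.choice(pos.square_captures)

def jfa_select_action(pos):
    # Route the position to the sub-model that owns its game phase.
    if pos.phase == "layout":
        return slm_policy(pos)
    move = hbm_upper_policy(pos)          # movement or jump capture
    if pos.square_captures:               # a square was formed: query lower network
        return (move, hbm_lower_policy(pos))
    return move

if __name__ == "__main__":
    layout_pos = Position("layout", ["a1", "b2"], [])
    battle_pos = Position("battle", ["c3-c4"], ["capture_d5"])
    print(jfa_select_action(layout_pos))
    print(jfa_select_action(battle_pos))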

JFA: a two-stage reinforcement learning model fusing hierarchical neural networks and human knowledge for Tibetan Jiu chess

Xiali LI1,2, Xiaoyu FAN1,2, Junzhi YU3,4, Zhicheng DONG5, Xianmu CAIRANG5, Ping LAN5
1Key Laboratory of Ethnic Language Intelligent Analysis and Security Governance of MOE, Minzu University of China, Beijing 100081, China
2School of Information Engineering, Minzu University of China, Beijing 100081, China
3State Key Laboratory of Turbulence and Complex Systems, Peking University, Beijing 100871, China
4State Key Laboratory of General Artificial Intelligence, Peking University, Beijing 100871, China
5School of Information Science and Technology, Tibet University, Lhasa 850000, China
Abstract: Tibetan Jiu chess, a national intangible cultural heritage, is a complex board game consisting of a layout phase and a battle phase. Under limited hardware resources, improving the playing strength of deep reinforcement learning (DRL) models for Tibetan Jiu chess is difficult. To solve this problem, this paper proposes JFA, a two-stage model based on hierarchical neural networks and knowledge guidance. It contains two sub-models: a strategic layout model (SLM) for the layout phase and a hierarchical battle model (HBM) for the battle phase. The two sub-models adopt similar network structures and are trained independently through self-play with parallel Monte Carlo tree search (MCTS). HBM consists of hierarchical neural networks, with the upper network selecting movement and jump-capturing actions and the lower network selecting square-capturing actions. Auxiliary agents designed from human knowledge work with SLM and HBM to simulate the entire game and provide reward signals based on square formation or the actual game outcome. In addition, two human knowledge-based pruning methods are proposed within HBM, which prune the parallel MCTS and the capture actions in the lower network, respectively. In experiments against a layout model using the AlphaZero method, SLM achieved a 74% win rate, and its decision-making time was reduced to about 1/147 of that required by the AlphaZero model. SLM also won first place at the 2024 China National Computer Game Tournament. HBM achieved a 70% win rate against other Tibetan Jiu chess models. When SLM and HBM operate together within the JFA framework, the win rate rises to 81%, reaching the level of a human amateur 4-dan player. These results show that the JFA framework can effectively improve the playing strength of Tibetan Jiu chess artificial intelligence.
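
The abstract mentions two human knowledge-based pruning methods: one prunes candidates expanded by the parallel MCTS, the other restricts the capture actions considered by the lower network. The short Python sketch below shows that general pruning pattern under assumed interfaces; the predicates and scoring rules are illustrative placeholders, not the paper's actual Jiu chess heuristics.

# Hedged sketch of knowledge-guided pruning (hypothetical rules and names).

def prune_mcts_children(candidate_actions, is_promising):
    # Drop candidates that domain knowledge marks as not worth expanding,
    # but never prune the candidate set down to nothing.
    kept = [a for a in candidate_actions if is_promising(a)]
    return kept or candidate_actions

def prune_capture_actions(capture_actions, capture_value):
    # Keep only the highest-value square-capture actions for the lower network.
    if not capture_actions:
        return capture_actions
    best = max(capture_value(a) for a in capture_actions)
    return [a for a in capture_actions if capture_value(a) == best]

if __name__ == "__main__":
    # Toy usage with made-up actions and scoring rules.
    actions = ["edge_move", "center_move", "back_rank_move"]
    print(prune_mcts_children(actions, lambda a: a != "back_rank_move"))
    captures = [("d4", 2), ("e5", 3), ("f6", 3)]
    print(prune_capture_actions(captures, lambda a: a[1]))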

Key words: Games; reinforcement learning; Tibetan Jiu chess; separate two-stage model; self-play; hierarchical neural network; parallel Monte Carlo tree search



