
On-line Access: 2025-11-10

Received: 2025-05-03

Revision Accepted: 2025-09-15


Frontiers of Information Technology & Electronic Engineering (in press)

http://doi.org/10.1631/FITEE.2500287


JFA: two-stage reinforcement learning model with hierarchical neural network and human knowledge for Tibetan Jiu chess


Author(s):  Xiali LI1,2, Xiaoyu FAN1,2, Junzhi YU3,4, Zhicheng DONG5, Xianmu CAIRANG5, Ping LAN5

Affiliation(s):  1Key Laboratory of Ethnic Language Intelligent Analysis and Security Governance of MOE, Minzu University, Beijing 100081, China; 2School of Information Engineering, Minzu University, Beijing 100081, China; 3State Key Laboratory for Turbulence and Complex Systems, Peking University, Beijing 100871, China; 4State Key Laboratory of General Artificial Intelligence, Peking University, Beijing 100871, China; 5School of Information Science and Technology, Tibet University, Lhasa 850000, China

Corresponding email(s):   lixiali@muc.edu.cn, junzhi.yu@ia.ac.cn

Key Words:  Games, Reinforcement learning, Tibetan Jiu chess, Separate two-stage model, Self-play, Hierarchical neural network, Parallel Monte Carlo tree search




Abstract: 
Tibetan Jiu chess, recognized as a national intangible cultural heritage, is a complex game comprising two distinct phases: the layout phase and the battle phase. Improving the performance of deep reinforcement learning (DRL) models for Tibetan Jiu chess is challenging, especially under hardware resource constraints. To address this, we propose a two-stage model called JFA, which incorporates hierarchical neural networks and knowledge-guided techniques. The model comprises two sub-models: a strategic layout model (SLM) for the layout phase and a hierarchical battle model (HBM) for the battle phase. Both sub-models use similar network structures and employ parallel Monte Carlo tree search (MCTS) for independent self-play training. HBM is structured as a hierarchical neural network, with the upper network selecting movement and jump-capture actions and the lower network handling square-capture actions. Auxiliary agents based on human knowledge are introduced to assist SLM and HBM, simulating the entire game and providing reward signals based on square captures or victory outcomes. Additionally, within HBM, we propose two human-knowledge-based pruning methods that prune parallel MCTS and capture actions in the lower network. In experiments against a layout model using the AlphaZero method, SLM achieved a 74% win rate, with decision times reduced to approximately 1/47 of those required by the AlphaZero model. SLM also won first place at the 2024 China National Computer Game Tournament. HBM achieved a 70% win rate when playing against other Tibetan Jiu chess models. When SLM and HBM are used together in JFA, the combined model achieved an 81% win rate, comparable to the level of a human amateur 4-dan player. These results demonstrate that JFA effectively enhances artificial intelligence (AI) performance in Tibetan Jiu chess.
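Since the paper's implementation is not reproduced on this page, the following is a minimal Python sketch of the two-stage control flow the abstract describes: SLM handles the layout phase, HBM handles the battle phase, and within HBM an upper policy chooses movement and jump-capture actions while a lower policy resolves square-capture actions after knowledge-based pruning. All class names, the action encoding, and the random placeholder policies are illustrative assumptions, not the authors' actual code.

# Hypothetical sketch of JFA's two-stage dispatch (not the authors' code).
import random
from dataclasses import dataclass, field

@dataclass
class JiuState:
    phase: str = "layout"  # "layout" or "battle"
    legal_moves: list = field(default_factory=lambda: ["move_a", "jump_b"])
    legal_square_captures: list = field(default_factory=lambda: ["cap_1", "cap_2"])

class StrategicLayoutModel:
    """SLM: policy for the layout phase; in the paper it is trained by
    self-play with parallel MCTS, replaced here by a random placeholder."""
    def select_action(self, state: JiuState) -> str:
        return random.choice(state.legal_moves)

class HierarchicalBattleModel:
    """HBM: an upper network selects movement and jump-capture actions;
    a lower network handles square-capture actions (per the abstract)."""
    def select_action(self, state: JiuState) -> str:
        # Upper level: pick a move/jump, or delegate a square capture.
        action = random.choice(state.legal_moves + ["square_capture"])
        if action == "square_capture":
            # Human-knowledge pruning: mask weak captures before the
            # lower network scores the remaining candidates.
            candidates = self.prune_captures(state.legal_square_captures)
            if candidates:
                action = random.choice(candidates)  # placeholder for lower net
            else:
                action = random.choice(state.legal_moves)
        return action

    @staticmethod
    def prune_captures(captures: list) -> list:
        # Stand-in rule: keep only the first candidate, illustrating that
        # pruning shrinks the lower network's action space.
        return captures[:1]

class JFA:
    """Two-stage dispatch: SLM in the layout phase, HBM in the battle phase."""
    def __init__(self):
        self.slm = StrategicLayoutModel()
        self.hbm = HierarchicalBattleModel()

    def select_action(self, state: JiuState) -> str:
        model = self.slm if state.phase == "layout" else self.hbm
        return model.select_action(state)

if __name__ == "__main__":
    agent = JFA()
    print("layout move:", agent.select_action(JiuState(phase="layout")))
    print("battle move:", agent.select_action(JiuState(phase="battle")))

In the actual system, each placeholder policy would be a neural network guided by parallel MCTS, with the two sub-models trained independently through self-play and rewarded via the human-knowledge auxiliary agents described in the abstract.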
