Frontiers of Information Technology & Electronic Engineering
ISSN 2095-9184 (print), ISSN 2095-9230 (online)
2025 Vol.26 No.10 P.1969-1983
Jiu fusion artificial intelligence (JFA): a two-stage reinforcement learning model with hierarchical neural networks and human knowledge for Tibetan Jiu chess
Abstract: Tibetan Jiu chess, recognized as a national intangible cultural heritage, is a complex game comprising two distinct phases: the layout phase and the battle phase. Improving the performance of deep reinforcement learning (DRL) models for Tibetan Jiu chess is challenging, especially under hardware-resource constraints. To address this, we propose a two-stage model called JFA, which incorporates hierarchical neural networks and knowledge-guided techniques. The model comprises two sub-models: a strategic layout model (SLM) for the layout phase and a hierarchical battle model (HBM) for the battle phase. Both sub-models use similar network structures and employ parallel Monte Carlo tree search (MCTS) for independent self-play training. HBM is structured as a hierarchical neural network, with the upper network selecting movement and jump-capturing actions and the lower network handling square-capturing actions. Human knowledge-based auxiliary agents are introduced to assist SLM and HBM, simulating the entire game and providing reward signals based on square-capturing or victory outcomes. Additionally, within HBM, we propose two human knowledge-based pruning methods that prune the parallel MCTS and the capturing actions of the lower network. In experiments against a layout model trained with the AlphaZero method, SLM achieves a 74% win rate, and its decision-making time is reduced to approximately 1/147 of that required by the AlphaZero model. SLM also won first place at the 2024 China National Computer Game Tournament. HBM achieves a 70% win rate when playing against other Tibetan Jiu chess models. When SLM and HBM are used together in JFA, the combined model achieves an 81% win rate, comparable to the level of a human amateur 4-dan player. These results demonstrate that JFA effectively enhances artificial intelligence (AI) performance in Tibetan Jiu chess.
Key words: Games; Reinforcement learning; Tibetan Jiu chess; Separate two-stage model; Self-play; Hierarchical neural network; Parallel Monte Carlo tree search
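To make the two-stage design described in the abstract concrete, the following minimal Python sketch shows one way such a dispatch could be organized: JFA routes layout-phase positions to SLM and battle-phase positions to HBM, and HBM's upper network chooses between movement/jump-capturing actions while delegating square-capturing decisions to a lower network. All class and method names here are hypothetical illustrations, not the authors' implementation, and the placeholder policies stand in for the parallel-MCTS-guided networks described in the paper.

```python
# Hypothetical sketch of JFA's two-stage dispatch (illustrative only).
# Names such as GameState, SLM, HBM, and select_action are assumptions
# made for this example; they do not reflect the authors' actual code.
from dataclasses import dataclass
from typing import List


@dataclass
class GameState:
    phase: str              # "layout" or "battle"
    board: List[List[int]]  # simplified board encoding


class SLM:
    """Strategic layout model: chooses placements during the layout phase."""

    def select_action(self, state: GameState) -> str:
        # In the paper this decision would come from parallel MCTS
        # guided by a policy/value network; here it is a placeholder.
        return "place_stone"


class HBM:
    """Hierarchical battle model: upper network for movement and jump
    captures, lower network for square-capturing actions."""

    def select_action(self, state: GameState) -> str:
        upper_action = self._upper_policy(state)
        if upper_action == "square_capture":
            # The lower network decides which opponent piece to capture.
            return self._lower_policy(state)
        return upper_action

    def _upper_policy(self, state: GameState) -> str:
        return "move_or_jump"        # placeholder for the upper network

    def _lower_policy(self, state: GameState) -> str:
        return "capture_piece"       # placeholder for the lower network


class JFA:
    """Two-stage model: routes each position to the sub-model of its phase."""

    def __init__(self) -> None:
        self.slm, self.hbm = SLM(), HBM()

    def select_action(self, state: GameState) -> str:
        model = self.slm if state.phase == "layout" else self.hbm
        return model.select_action(state)


if __name__ == "__main__":
    agent = JFA()
    print(agent.select_action(GameState(phase="layout", board=[])))
    print(agent.select_action(GameState(phase="battle", board=[])))
```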
1 Key Laboratory of Ethnic Language Intelligent Analysis and Security Governance of the Ministry of Education, Minzu University of China, Beijing 100081, China
2 School of Information Engineering, Minzu University of China, Beijing 100081, China
3 State Key Laboratory for Turbulence and Complex Systems, Peking University, Beijing 100871, China
4 State Key Laboratory of General Artificial Intelligence, Peking University, Beijing 100871, China
5 School of Information Science and Technology, Tibet University, Lhasa 850000, China
DOI: 10.1631/FITEE.2500287
CLC number: TP18
On-line Access: 2025-11-17
Received: 2025-05-02
Revision Accepted: 2025-11-18
Crosschecked: 2025-09-15