
CLC number: TP18

On-line Access: 2025-11-17

Received: 2025-05-02

Revision Accepted: 2025-11-18

Crosschecked: 2025-09-15


 ORCID:

Junzhi YU

https://orcid.org/0000-0002-6347-572X

Xiali LI

https://orcid.org/0000-0001-7950-6204


Frontiers of Information Technology & Electronic Engineering  2025 Vol.26 No.10 P.1969-1983

http://doi.org/10.1631/FITEE.2500287


Jiu fusion artificial intelligence (JFA): a two-stage reinforcement learning model with hierarchical neural networks and human knowledge for Tibetan Jiu chess


Author(s):  Xiali LI, Xiaoyu FAN, Junzhi YU, Zhicheng DONG, Xianmu CAIRANG, Ping LAN

Affiliation(s):  Key Laboratory of Ethnic Language Intelligent Analysis and Security Governance of MOE, Minzu University of China, Beijing 100081, China; School of Information Engineering, Minzu University of China, Beijing 100081, China; State Key Laboratory of Turbulence and Complex Systems, Peking University, Beijing 100871, China; State Key Laboratory of General Artificial Intelligence, Peking University, Beijing 100871, China; School of Information Science and Technology, Tibet University, Lhasa 850000, China

Corresponding email(s):   lixiali@muc.edu.cn, junzhi.yu@ia.ac.cn

Key Words:  Games, Reinforcement learning, Tibetan Jiu chess, Separate two-stage model, Self-play, Hierarchical neural network, Parallel Monte Carlo tree search


Xiali LI, Xiaoyu FAN, Junzhi YU, Zhicheng DONG, Xianmu CAIRANG, Ping LAN. Jiu fusion artificial intelligence (JFA): a two-stage reinforcement learning model with hierarchical neural networks and human knowledge for Tibetan Jiu chess[J]. Frontiers of Information Technology & Electronic Engineering, 2025, 26(10): 1969-1983.

@article{li2025jfa,
title="Jiu fusion artificial intelligence (JFA): a two-stage reinforcement learning model with hierarchical neural networks and human knowledge for Tibetan Jiu chess",
author="Xiali LI, Xiaoyu FAN, Junzhi YU, Zhicheng DONG, Xianmu CAIRANG, Ping LAN",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="26",
number="10",
pages="1969-1983",
year="2025",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.2500287"
}

%0 Journal Article
%T Jiu fusion artificial intelligence (JFA): a two-stage reinforcement learning model with hierarchical neural networks and human knowledge for Tibetan Jiu chess
%A Xiali LI
%A Xiaoyu FAN
%A Junzhi YU
%A Zhicheng DONG
%A Xianmu CAIRANG
%A Ping LAN
%J Frontiers of Information Technology & Electronic Engineering
%V 26
%N 10
%P 1969-1983
%@ 2095-9184
%D 2025
%I Zhejiang University Press & Springer
%R 10.1631/FITEE.2500287

TY - JOUR
T1 - Jiu fusion artificial intelligence (JFA): a two-stage reinforcement learning model with hierarchical neural networks and human knowledge for Tibetan Jiu chess
A1 - Xiali LI
A1 - Xiaoyu FAN
A1 - Junzhi YU
A1 - Zhicheng DONG
A1 - Xianmu CAIRANG
A1 - Ping LAN
JO - Frontiers of Information Technology & Electronic Engineering
VL - 26
IS - 10
SP - 1969
EP - 1983
SN - 2095-9184
Y1 - 2025
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.2500287
ER -


Abstract: 
Tibetan Jiu chess, recognized as a national intangible cultural heritage, is a complex game comprising two distinct phases: the layout phase and the battle phase. Improving the performance of deep reinforcement learning (DRL) models for Tibetan Jiu chess is challenging, especially given the constraints of hardware resources. To address this, we propose a two-stage model called JFA, which incorporates hierarchical neural networks and knowledge-guided techniques. The model comprises two sub-models: a strategic layout model (SLM) for the layout phase and a hierarchical battle model (HBM) for the battle phase. Both sub-models use similar network structures and employ parallel Monte Carlo tree search (MCTS) methods for independent self-play training. HBM is structured as a hierarchical neural network, with the upper network selecting movement and jump-capturing actions and the lower network handling square-capturing actions. Human knowledge-based auxiliary agents are introduced to assist SLM and HBM, simulating the entire game and providing reward signals based on square-capturing or victory outcomes. Additionally, within HBM, we propose two human knowledge-based pruning methods that prune the parallel MCTS and the capture actions in the lower network, respectively. In experiments against a layout model using the AlphaZero method, SLM achieves a 74% win rate, with its decision-making time reduced to approximately 1/147 of that required by the AlphaZero model. SLM also won first place at the 2024 China National Computer Game Tournament. HBM achieves a 70% win rate when playing against other Tibetan Jiu chess models. When used together, SLM and HBM in JFA achieve an 81% win rate, comparable to the level of a human amateur 4-dan player. These results demonstrate that JFA effectively enhances artificial intelligence (AI) performance in Tibetan Jiu chess.
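
The following minimal Python sketch illustrates the two-stage dispatch described above: layout-phase positions are routed to SLM, battle-phase positions to HBM, whose upper network picks movement and jump-capturing actions while the lower network handles square-capturing actions. This is not the authors' code; all class, function, and field names are hypothetical, and the placeholder policies stand in for the trained networks and parallel MCTS.

# Hedged sketch of the two-stage JFA dispatch (hypothetical interfaces).
import random
from dataclasses import dataclass

@dataclass
class Position:
    phase: str                 # "layout" or "battle"
    legal_moves: list          # movement / jump-capture candidates
    square_captures: list      # square-capture candidates (battle phase only)

def slm_policy(pos):
    # Placeholder for the strategic layout model (SLM) plus its parallel MCTS.
    return random.choice(pos.legal_moves)

def hbm_upper_policy(pos):
    # Upper network of HBM: chooses movement and jump-capturing actions.
    return random.choice(pos.legal_moves)

def hbm_lower_policy(pos):
    # Lower network of HBM: chooses among square-capturing actions.
    return random.choice(pos.square_captures)

def jfa_select_action(pos):
    # Route the position to the sub-model that owns its game phase.
    if pos.phase == "layout":
        return slm_policy(pos)
    move = hbm_upper_policy(pos)          # movement or jump capture
    if pos.square_captures:               # a square was formed: query lower network
        return (move, hbm_lower_policy(pos))
    return move

if __name__ == "__main__":
    layout_pos = Position("layout", ["a1", "b2"], [])
    battle_pos = Position("battle", ["c3-c4"], ["capture_d5"])
    print(jfa_select_action(layout_pos))
    print(jfa_select_action(battle_pos))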

JFA: a two-stage reinforcement learning model fusing hierarchical neural networks and human knowledge for Tibetan Jiu chess

Xiali LI1,2, Xiaoyu FAN1,2, Junzhi YU3,4, Zhicheng DONG5, Xianmu CAIRANG5, Ping LAN5
1Key Laboratory of Ethnic Language Intelligent Analysis and Security Governance of MOE, Minzu University of China, Beijing 100081, China
2School of Information Engineering, Minzu University of China, Beijing 100081, China
3State Key Laboratory of Turbulence and Complex Systems, Peking University, Beijing 100871, China
4State Key Laboratory of General Artificial Intelligence, Peking University, Beijing 100871, China
5School of Information Science and Technology, Tibet University, Lhasa 850000, China
Abstract: Tibetan Jiu chess, a national intangible cultural heritage, is a complex board game consisting of a layout phase and a battle phase. Under limited hardware resources, improving the playing strength of deep reinforcement learning (DRL) models for Tibetan Jiu chess is difficult. To solve this problem, this paper proposes JFA, a two-stage model based on hierarchical neural networks and knowledge guidance. It contains two sub-models: a strategic layout model (SLM) for the layout phase and a hierarchical battle model (HBM) for the battle phase. The two sub-models adopt similar network structures and are trained independently through self-play with parallel Monte Carlo tree search (MCTS). HBM consists of hierarchical neural networks, with the upper network selecting movement and jump-capturing actions and the lower network selecting square-capturing actions. Auxiliary agents designed from human knowledge work with SLM and HBM to simulate the entire game and provide reward signals based on square formation or the actual game outcome. In addition, two human knowledge-based pruning methods are proposed within HBM, which prune the parallel MCTS and the capture actions in the lower network, respectively. In experiments against a layout model using the AlphaZero method, SLM achieved a 74% win rate, and its decision-making time was reduced to about 1/147 of that required by the AlphaZero model. SLM also won first place at the 2024 China National Computer Game Tournament. HBM achieved a 70% win rate against other Tibetan Jiu chess models. When SLM and HBM operate together within the JFA framework, the win rate rises to 81%, reaching the level of a human amateur 4-dan player. These results show that the JFA framework can effectively improve the playing strength of Tibetan Jiu chess artificial intelligence.
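
The abstract mentions two human knowledge-based pruning methods: one prunes candidates expanded by the parallel MCTS, the other restricts the capture actions considered by the lower network. The short Python sketch below shows that general pruning pattern under assumed interfaces; the predicates and scoring rules are illustrative placeholders, not the paper's actual Jiu chess heuristics.

# Hedged sketch of knowledge-guided pruning (hypothetical rules and names).

def prune_mcts_children(candidate_actions, is_promising):
    # Drop candidates that domain knowledge marks as not worth expanding,
    # but never prune the candidate set down to nothing.
    kept = [a for a in candidate_actions if is_promising(a)]
    return kept or candidate_actions

def prune_capture_actions(capture_actions, capture_value):
    # Keep only the highest-value square-capture actions for the lower network.
    if not capture_actions:
        return capture_actions
    best = max(capture_value(a) for a in capture_actions)
    return [a for a in capture_actions if capture_value(a) == best]

if __name__ == "__main__":
    # Toy usage with made-up actions and scoring rules.
    actions = ["edge_move", "center_move", "back_rank_move"]
    print(prune_mcts_children(actions, lambda a: a != "back_rank_move"))
    captures = [("d4", 2), ("e5", 3), ("f6", 3)]
    print(prune_capture_actions(captures, lambda a: a[1]))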

Key words: Games; reinforcement learning; Tibetan Jiu chess; separate two-stage model; self-play; hierarchical neural network; parallel Monte Carlo tree search



