
CLC number: TP18
On-line Access: 2025-11-17
Received: 2025-05-02
Revision Accepted: 2025-11-18
Crosschecked: 2025-09-15
Xiali LI, Xiaoyu FAN, Junzhi YU, Zhicheng DONG, Xianmu CAIRANG, Ping LAN. Jiu fusion artificial intelligence (JFA): a two-stage reinforcement learning model with hierarchical neural networks and human knowledge for Tibetan Jiu chess[J]. Frontiers of Information Technology & Electronic Engineering, 2025, 26(10): 1969-1983.
@article{li2025jfa,
  title="Jiu fusion artificial intelligence (JFA): a two-stage reinforcement learning model with hierarchical neural networks and human knowledge for Tibetan Jiu chess",
  author="Xiali LI and Xiaoyu FAN and Junzhi YU and Zhicheng DONG and Xianmu CAIRANG and Ping LAN",
  journal="Frontiers of Information Technology & Electronic Engineering",
  volume="26",
  number="10",
  pages="1969-1983",
  year="2025",
  publisher="Zhejiang University Press & Springer",
  doi="10.1631/FITEE.2500287"
}
%0 Journal Article
%T Jiu fusion artificial intelligence (JFA): a two-stage reinforcement learning model with hierarchical neural networks and human knowledge for Tibetan Jiu chess
%A Xiali LI
%A Xiaoyu FAN
%A Junzhi YU
%A Zhicheng DONG
%A Xianmu CAIRANG
%A Ping LAN
%J Frontiers of Information Technology & Electronic Engineering
%V 26
%N 10
%P 1969-1983
%@ 2095-9184
%D 2025
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.2500287
TY  - JOUR
T1  - Jiu fusion artificial intelligence (JFA): a two-stage reinforcement learning model with hierarchical neural networks and human knowledge for Tibetan Jiu chess
A1  - Xiali LI
A1  - Xiaoyu FAN
A1  - Junzhi YU
A1  - Zhicheng DONG
A1  - Xianmu CAIRANG
A1  - Ping LAN
JO  - Frontiers of Information Technology & Electronic Engineering
VL  - 26
IS  - 10
SP  - 1969
EP  - 1983
SN  - 2095-9184
Y1  - 2025
PB  - Zhejiang University Press & Springer
DO  - 10.1631/FITEE.2500287
ER  -
Abstract: Tibetan Jiu chess, recognized as a national intangible cultural heritage, is a complex game comprising two distinct phases: the layout phase and the battle phase. Improving the performance of deep reinforcement learning (DRL) models for Tibetan Jiu chess is challenging, especially under hardware resource constraints. To address this, we propose a two-stage model called JFA, which incorporates hierarchical neural networks and knowledge-guided techniques. The model comprises two sub-models: a strategic layout model (SLM) for the layout phase and a hierarchical battle model (HBM) for the battle phase. Both sub-models use similar network structures and employ parallel Monte Carlo tree search (MCTS) for independent self-play training. HBM is structured as a hierarchical neural network, with the upper network selecting movement and jump-capturing actions and the lower network handling square-capturing actions. Human-knowledge-based auxiliary agents are introduced to assist SLM and HBM, simulating the entire game and providing reward signals based on square-capturing or victory outcomes. Additionally, within HBM, we propose two human-knowledge-based pruning methods that prune the parallel MCTS and the capture actions of the lower network. In experiments against a layout model trained with the AlphaZero method, SLM achieves a 74% win rate while reducing decision-making time to approximately 1/147 of that required by the AlphaZero model. SLM also won first place at the 2024 China National Computer Game Tournament. HBM achieves a 70% win rate when playing against other Tibetan Jiu chess models. When SLM and HBM are used together in JFA, they achieve an 81% win rate, comparable to the level of a human amateur 4-dan player. These results demonstrate that JFA effectively enhances artificial intelligence (AI) performance in Tibetan Jiu chess.
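As a concrete illustration of the two-stage design described in the abstract, the Python sketch below shows how a JFA-style agent might dispatch positions to SLM during the layout phase and to HBM's upper and lower networks during the battle phase. This is a minimal sketch of the control flow under stated assumptions, not the authors' implementation: every name (GameState, LayoutPolicy, UpperBattleNet, LowerBattleNet, JFAAgent and their methods) is hypothetical, and random choices stand in for the trained policy/value networks and parallel MCTS used in the paper.

# Minimal sketch (not the authors' code) of JFA's two-stage dispatch and
# HBM's hierarchical action selection. All names are hypothetical.
import random
from dataclasses import dataclass, field
from typing import List


@dataclass
class GameState:
    phase: str                                                       # "layout" or "battle"
    legal_moves: List[str] = field(default_factory=list)            # movement / jump-capture actions
    legal_square_captures: List[str] = field(default_factory=list)  # square-capture actions


class LayoutPolicy:
    """Stand-in for SLM: chooses a placement action in the layout phase."""
    def select_action(self, state: GameState) -> str:
        # A trained SLM would run parallel MCTS guided by its policy/value network.
        return random.choice(state.legal_moves)


class UpperBattleNet:
    """Stand-in for HBM's upper network: chooses a movement or jump-capture action."""
    def select_action(self, state: GameState) -> str:
        return random.choice(state.legal_moves)


class LowerBattleNet:
    """Stand-in for HBM's lower network: resolves square-capture actions."""
    def select_capture(self, state: GameState) -> str:
        # Knowledge-based pruning would first discard clearly inferior captures.
        return random.choice(state.legal_square_captures)


class JFAAgent:
    """Two-stage controller: SLM in the layout phase, hierarchical HBM in the battle phase."""
    def __init__(self) -> None:
        self.slm = LayoutPolicy()
        self.upper = UpperBattleNet()
        self.lower = LowerBattleNet()

    def act(self, state: GameState) -> str:
        if state.phase == "layout":
            return self.slm.select_action(state)
        move = self.upper.select_action(state)                 # upper level: move / jump capture
        if state.legal_square_captures:                        # lower level: square capture, if any
            move += " then " + self.lower.select_capture(state)
        return move


if __name__ == "__main__":
    agent = JFAAgent()
    print(agent.act(GameState("layout", legal_moves=["place a1", "place b2"])))
    print(agent.act(GameState("battle", legal_moves=["move c3-c4"],
                              legal_square_captures=["capture d5"])))

In the paper's model, the two levels of HBM are trained through self-play and the knowledge-based pruning narrows both the parallel MCTS and the lower network's capture candidates; the stubs above only mark where those components would plug in.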