Frontiers of Information Technology & Electronic Engineering
ISSN 2095-9184 (print), ISSN 2095-9230 (online)
2025 Vol.26 No.10 P.1969-1983
Jiu fusion artificial intelligence (JFA): a two-stage reinforcement learning model with hierarchical neural networks and human knowledge for Tibetan Jiu chess
Abstract: Tibetan Jiu chess, recognized as a national intangible cultural heritage, is a complex game comprising two distinct phases: the layout phase and the battle phase. Improving the performance of deep reinforcement learning (DRL) models for Tibetan Jiu chess is challenging, especially under hardware-resource constraints. To address this, we propose a two-stage model called JFA, which incorporates hierarchical neural networks and knowledge-guided techniques. The model comprises two sub-models: a strategic layout model (SLM) for the layout phase and a hierarchical battle model (HBM) for the battle phase. Both sub-models use similar network structures and employ parallel Monte Carlo tree search (MCTS) for independent self-play training. HBM is structured as a hierarchical neural network, with the upper network selecting movement and jump-capturing actions and the lower network handling square-capturing actions. Human knowledge-based auxiliary agents are introduced to assist SLM and HBM, simulating the entire game and providing reward signals based on square-capturing or victory outcomes. Additionally, within HBM, we propose two human knowledge-based pruning methods that prune the parallel MCTS and the capturing actions of the lower network. In experiments against a layout model trained with the AlphaZero method, SLM achieves a 74% win rate, and its decision-making time is reduced to approximately 1/147 of that required by the AlphaZero model. SLM also won first place at the 2024 China National Computer Game Tournament. HBM achieves a 70% win rate when playing against other Tibetan Jiu chess models. When SLM and HBM are used together in JFA, the combined model achieves an 81% win rate, comparable to the level of a human amateur 4-dan player. These results demonstrate that JFA effectively enhances artificial intelligence (AI) performance in Tibetan Jiu chess.
Key words: Games; Reinforcement learning; Tibetan Jiu chess; Separate two-stage model; Self-play; Hierarchical neural network; Parallel Monte Carlo tree search
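To make the two-stage design described in the abstract concrete, the following minimal Python sketch shows one way such a dispatch could be organized: JFA routes layout-phase positions to SLM and battle-phase positions to HBM, and HBM's upper network chooses between movement/jump-capturing actions while delegating square-capturing decisions to a lower network. All class and method names here are hypothetical illustrations, not the authors' implementation, and the placeholder policies stand in for the parallel-MCTS-guided networks described in the paper.

```python
# Hypothetical sketch of JFA's two-stage dispatch (illustrative only).
# Names such as GameState, SLM, HBM, and select_action are assumptions
# made for this example; they do not reflect the authors' actual code.
from dataclasses import dataclass
from typing import List


@dataclass
class GameState:
    phase: str              # "layout" or "battle"
    board: List[List[int]]  # simplified board encoding


class SLM:
    """Strategic layout model: chooses placements during the layout phase."""

    def select_action(self, state: GameState) -> str:
        # In the paper this decision would come from parallel MCTS
        # guided by a policy/value network; here it is a placeholder.
        return "place_stone"


class HBM:
    """Hierarchical battle model: upper network for movement and jump
    captures, lower network for square-capturing actions."""

    def select_action(self, state: GameState) -> str:
        upper_action = self._upper_policy(state)
        if upper_action == "square_capture":
            # The lower network decides which opponent piece to capture.
            return self._lower_policy(state)
        return upper_action

    def _upper_policy(self, state: GameState) -> str:
        return "move_or_jump"        # placeholder for the upper network

    def _lower_policy(self, state: GameState) -> str:
        return "capture_piece"       # placeholder for the lower network


class JFA:
    """Two-stage model: routes each position to the sub-model of its phase."""

    def __init__(self) -> None:
        self.slm, self.hbm = SLM(), HBM()

    def select_action(self, state: GameState) -> str:
        model = self.slm if state.phase == "layout" else self.hbm
        return model.select_action(state)


if __name__ == "__main__":
    agent = JFA()
    print(agent.select_action(GameState(phase="layout", board=[])))
    print(agent.select_action(GameState(phase="battle", board=[])))
```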
1 Key Laboratory of Ethnic Language Intelligent Analysis and Security Governance of the Ministry of Education, Minzu University of China, Beijing 100081, China
2 School of Information Engineering, Minzu University of China, Beijing 100081, China
3 State Key Laboratory for Turbulence and Complex Systems, Peking University, Beijing 100871, China
4 State Key Laboratory of General Artificial Intelligence, Peking University, Beijing 100871, China
5 School of Information Science and Technology, Tibet University, Lhasa 850000, China
DOI: 10.1631/FITEE.2500287
CLC number: TP18
On-line Access: 2025-11-17
Received: 2025-05-02
Revision Accepted: 2025-11-18
Crosschecked: 2025-09-15