Frontiers of Information Technology & Electronic Engineering
ISSN 2095-9184 (print), ISSN 2095-9230 (online)
2025 Vol.26 No.2 P.278-292
Adaptive layer splitting for wireless large language model inference in edge computing: a model-based reinforcement learning approach
Abstract: Optimizing the deployment of large language models (LLMs) in edge computing environments is critical for enhancing privacy and computational efficiency. Toward efficient wireless LLM inference in edge computing, this study comprehensively analyzes the impact of different splitting points in mainstream open-source LLMs. It then introduces a framework, inspired by model-based reinforcement learning (MBRL), that determines the optimal splitting point across the edge server and user equipment (UE). By incorporating a reward surrogate model, the approach significantly reduces the computational cost of frequent performance evaluations. Extensive simulations demonstrate that the method effectively balances inference performance and computational load under varying network conditions, providing a robust solution for LLM deployment in decentralized settings.
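The core idea (evaluate candidate split layers through a cheap learned surrogate of the reward rather than by running the full edge/UE pipeline each time) can be illustrated with a minimal sketch. Everything below is a hypothetical toy, not the paper's implementation: the layer count, the hand-written reward balancing quality against device and transmission cost, the tabular surrogate, and the ε-greedy selection are all illustrative assumptions.

```python
import random

NUM_LAYERS = 32  # assumed layer count of an open-source LLM

def true_reward(split: int, bandwidth: float) -> float:
    """Stand-in for an expensive end-to-end evaluation: later splits keep
    more layers on the UE (better privacy/quality proxy here) but raise
    device load; poor bandwidth penalizes transmitting activations."""
    quality = split / NUM_LAYERS
    device_cost = (split / NUM_LAYERS) ** 2
    comm_cost = (1.0 - bandwidth) * 0.5
    return quality - 0.6 * device_cost - comm_cost

class RewardSurrogate:
    """Tabular surrogate: running mean of observed rewards per split point."""
    def __init__(self):
        self.sums = [0.0] * (NUM_LAYERS + 1)
        self.counts = [0] * (NUM_LAYERS + 1)

    def update(self, split: int, reward: float) -> None:
        self.sums[split] += reward
        self.counts[split] += 1

    def predict(self, split: int) -> float:
        # Cheap estimate replacing a full inference run
        if self.counts[split] == 0:
            return 0.0
        return self.sums[split] / self.counts[split]

def choose_split(surrogate: RewardSurrogate, epsilon: float) -> int:
    if random.random() < epsilon:                 # explore a random split
        return random.randint(1, NUM_LAYERS)
    return max(range(1, NUM_LAYERS + 1),          # exploit surrogate's best
               key=surrogate.predict)

random.seed(0)
surrogate = RewardSurrogate()
for _ in range(2000):
    bw = random.uniform(0.2, 1.0)                 # varying network condition
    s = choose_split(surrogate, epsilon=0.2)
    surrogate.update(s, true_reward(s, bw))

best = max(range(1, NUM_LAYERS + 1), key=surrogate.predict)
print("best split layer:", best)
```

Because the surrogate only stores running averages, each candidate split is scored in constant time; the expensive `true_reward` call is invoked once per interaction step rather than once per candidate per step.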
Key words: Large language models (LLMs); Edge computing; Model-based reinforcement learning (MBRL); Split inference; Transformer
1 College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China
2 Zhejiang Lab, Hangzhou 310012, China
DOI: 10.1631/FITEE.2400468
CLC number: TP391
On-line Access: 2025-03-07
Received: 2024-06-01
Revision Accepted: 2024-09-13
Crosschecked: 2025-03-07