CLC number: TP391
On-line Access: 2025-03-07
Received: 2024-06-01
Revision Accepted: 2024-09-13
Crosschecked: 2025-03-07
ORCID: https://orcid.org/0009-0008-9570-2000; https://orcid.org/0000-0003-4297-5060
Yuxuan CHEN, Rongpeng LI, Xiaoxue YU, Zhifeng ZHAO, Honggang ZHANG. Adaptive layer splitting for wireless large language model inference in edge computing: a model-based reinforcement learning approach[J]. Frontiers of Information Technology & Electronic Engineering, 2025, 26(2): 278-292.
@article{chen2025adaptive,
title="Adaptive layer splitting for wireless large language model inference in edge computing: a model-based reinforcement learning approach",
author="Yuxuan CHEN and Rongpeng LI and Xiaoxue YU and Zhifeng ZHAO and Honggang ZHANG",
journal="Frontiers of Information Technology \& Electronic Engineering",
volume="26",
number="2",
pages="278-292",
year="2025",
publisher="Zhejiang University Press \& Springer",
doi="10.1631/FITEE.2400468"
}
Abstract: Optimizing the deployment of large language models (LLMs) in edge computing environments is critical for enhancing privacy and computational efficiency. Toward efficient wireless LLM inference at the edge, this study comprehensively analyzes the impact of different splitting points in mainstream open-source LLMs. It then introduces a framework inspired by model-based reinforcement learning (MBRL) to determine the optimal splitting point across the edge server and user equipment (UE). By incorporating a reward surrogate model, the approach significantly reduces the computational cost of frequent performance evaluations. Extensive simulations demonstrate that the method effectively balances inference performance and computational load under varying network conditions, providing a robust solution for LLM deployment in decentralized settings.
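The core idea described in the abstract — selecting a layer splitting point with the help of a cheap reward surrogate instead of evaluating the full system at every step — can be illustrated with a minimal sketch. Everything below is an illustrative assumption, not the paper's implementation: the layer count, the shape of the reward function, the tabular running-mean surrogate, and the epsilon-greedy selection rule are all hypothetical stand-ins for the MBRL machinery the paper actually uses.

```python
import random

# Hypothetical setup: choose where to split an LLM's decoder stack
# between the user equipment (UE) and the edge server.
NUM_LAYERS = 32  # illustrative, e.g. a 7B-class decoder stack


def true_reward(split_point, channel_gain):
    """Stand-in for an expensive real evaluation that trades off
    on-device load against transmission cost under the current channel."""
    ue_load = split_point / NUM_LAYERS  # fraction of layers run on the UE
    tx_cost = (1.0 - channel_gain) * abs(split_point - NUM_LAYERS // 2) / NUM_LAYERS
    return 1.0 - 0.5 * ue_load - tx_cost


class RewardSurrogate:
    """Tabular surrogate: running mean of observed rewards per split point."""

    def __init__(self):
        self.mean = [0.0] * (NUM_LAYERS + 1)
        self.count = [0] * (NUM_LAYERS + 1)

    def predict(self, split_point):
        return self.mean[split_point]

    def update(self, split_point, reward):
        self.count[split_point] += 1
        n = self.count[split_point]
        self.mean[split_point] += (reward - self.mean[split_point]) / n


def select_split(surrogate, epsilon=0.1):
    if random.random() < epsilon:  # explore a random split point
        return random.randint(0, NUM_LAYERS)
    # Exploit: query the cheap surrogate instead of re-evaluating the system.
    return max(range(NUM_LAYERS + 1), key=surrogate.predict)


random.seed(0)
surrogate = RewardSurrogate()
for step in range(500):
    gain = random.uniform(0.2, 1.0)  # varying wireless channel condition
    s = select_split(surrogate)
    # Pay for a "true" evaluation only occasionally; the surrogate
    # absorbs the cost of all other decisions.
    if step % 5 == 0:
        surrogate.update(s, true_reward(s, gain))

best = max(range(NUM_LAYERS + 1), key=surrogate.predict)
print("chosen split point:", best)
```

The design point this sketch captures is the one the abstract emphasizes: the expensive evaluation (`true_reward` here) is invoked only on a fraction of decisions, while routine split-point selection runs against the learned surrogate.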