Frontiers of Information Technology & Electronic Engineering

ISSN 2095-9184 (print), ISSN 2095-9230 (online)

Adaptive layer splitting for wireless large language model inference in edge computing: a model-based reinforcement learning approach

Abstract: Optimizing the deployment of large language models (LLMs) in edge computing environments is critical for enhancing privacy and computational efficiency. Toward efficient wireless LLM inference in edge computing, this study comprehensively analyzes how the choice of splitting point affects mainstream open-source LLMs. Building on that analysis, it introduces a framework inspired by model-based reinforcement learning (MBRL) to determine the optimal splitting point between the edge and the user equipment (UE). By incorporating a reward surrogate model, the approach significantly reduces the computational cost of frequent performance evaluations. Extensive simulations demonstrate that the method effectively balances inference performance and computational load under varying network conditions, providing a robust solution for LLM deployment in decentralized settings.

Key words: Large language models (LLMs); Edge computing; Model-based reinforcement learning (MBRL); Split inference; Transformer
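The abstract's central idea, replacing most costly performance evaluations with a cheap reward surrogate while searching for a splitting point, can be illustrated with a toy sketch. This is not the paper's algorithm: it is a bandit-style simplification, and every name here (`NUM_LAYERS`, `expensive_reward`, `TabularSurrogate`, `choose_split`, `run`), the reward shape, and all constants are hypothetical assumptions made purely for illustration.

```python
import random

NUM_LAYERS = 32  # hypothetical model depth; a split point is a layer index


def expensive_reward(split: int, noise: float) -> float:
    """Stand-in for a costly evaluation of one split under one channel state.

    Toy assumption: channel noise hurts early splits more, while late
    splits place more compute on the user equipment (UE).
    """
    quality_loss = noise * (NUM_LAYERS - split) / NUM_LAYERS
    ue_load = split / NUM_LAYERS
    return -(quality_loss + 0.5 * ue_load)


class TabularSurrogate:
    """Running-mean reward estimate per (split, noise bucket)."""

    def __init__(self, buckets: int = 5):
        self.buckets = buckets
        self.mean = {}
        self.count = {}

    def _key(self, split, noise):
        bucket = min(int(noise * self.buckets), self.buckets - 1)
        return split, bucket

    def update(self, split, noise, reward):
        key = self._key(split, noise)
        n = self.count.get(key, 0) + 1
        old = self.mean.get(key, 0.0)
        self.count[key] = n
        self.mean[key] = old + (reward - old) / n  # incremental mean

    def predict(self, split, noise):
        """Return the estimated reward, or None if the cell is unseen."""
        return self.mean.get(self._key(split, noise))


def choose_split(surrogate, noise, rng, epsilon=0.1):
    """Epsilon-greedy choice of split point based on surrogate estimates."""
    if rng.random() < epsilon:
        return rng.randrange(1, NUM_LAYERS)
    known = [(surrogate.predict(s, noise), s) for s in range(1, NUM_LAYERS)]
    known = [(r, s) for r, s in known if r is not None]
    if not known:
        return rng.randrange(1, NUM_LAYERS)
    return max(known)[1]


def run(steps=2000, refresh_prob=0.05, seed=0):
    """Search for good splits, calling the costly evaluation only when the
    surrogate has no estimate or an occasional refresh is triggered."""
    rng = random.Random(seed)
    surrogate = TabularSurrogate()
    real_evals = 0
    for _ in range(steps):
        noise = rng.random()  # simulated channel condition in [0, 1)
        split = choose_split(surrogate, noise, rng)
        if surrogate.predict(split, noise) is None or rng.random() < refresh_prob:
            surrogate.update(split, noise, expensive_reward(split, noise))
            real_evals += 1
    return surrogate, real_evals


if __name__ == "__main__":
    surrogate, real_evals = run()
    rng = random.Random(1)
    print("costly evaluations used:", real_evals, "of 2000 steps")
    print("split for a clean channel:", choose_split(surrogate, 0.1, rng, epsilon=0.0))
    print("split for a noisy channel:", choose_split(surrogate, 0.9, rng, epsilon=0.0))
```

Under these toy assumptions the chosen split moves deeper into the model as the channel gets noisier, and only a fraction of the steps trigger the costly evaluation. A real system would replace the lookup table with a learned model and the synthetic reward with measured inference quality and load.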

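Chinese Summary: Adaptive layer splitting for wireless large language model inference in edge computing: a model-based reinforcement learning approach

Yuxuan CHEN1, Rongpeng LI1, Xiaoxue YU1, Zhifeng ZHAO2, Honggang ZHANG1
1College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China
2Zhejiang Lab, Hangzhou 310012, China
Abstract: Optimizing the deployment of large language models (LLMs) in edge computing environments is critical for improving privacy protection and computational efficiency. To achieve efficient wireless LLM inference, this paper comprehensively analyzes the impact of different splitting points in mainstream open-source LLMs. It introduces a model-based reinforcement learning (MBRL) framework to determine the optimal splitting point between the edge and the user equipment (UE). By introducing a reward surrogate model, the method significantly reduces the computational cost of frequent performance evaluations. Extensive simulation results show that the method effectively balances inference performance and computational load under different network conditions, providing a robust solution for LLM deployment in decentralized environments.

Key words: Large language models; Edge computing; Model-based reinforcement learning; Split inference; Transformer model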


DOI: 10.1631/FITEE.2400468

CLC number: TP391


On-line Access: 2025-03-07

Received: 2024-06-01

Revision Accepted: 2024-09-13

Crosschecked: 2025-03-07

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952276; Fax: +86-571-87952331; E-mail: jzus@zju.edu.cn
Copyright © 2000~ Journal of Zhejiang University-SCIENCE