ENGINEERING Information Technology & Electronic Engineering  2026 Vol.27 No.4 P.1-18

http://doi.org/10.1631/ENG.ITEE.2025.0104


RetryTrigger: intelligent inference duplication for enhancing LLM resilience to hardware transient faults


Author(s):  Jiajia JIAO, Yixu YU

Affiliation(s):  1. College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China

Corresponding email(s):   jiaojiajia@shmtu.edu.cn

Key Words:  Large language models, System resilience, Intelligent fault detection, Inference duplication, Transient faults



Jiajia JIAO, Yixu YU. RetryTrigger: intelligent inference duplication for enhancing LLM resilience to hardware transient faults[J]. Journal of Zhejiang University Science C, 2026, 27(4): 1-18.

@article{title="RetryTrigger: intelligent inference duplication for enhancing LLM resilience to hardware transient faults",
author="Jiajia JIAO, Yixu YU",
journal="Journal of Zhejiang University Science C",
volume="27",
number="4",
pages="1-18",
year="2026",
publisher="Zhejiang University Press & Springer",
doi="10.1631/ENG.ITEE.2025.0104"
}

%0 Journal Article
%T RetryTrigger: intelligent inference duplication for enhancing LLM resilience to hardware transient faults
%A Jiajia JIAO
%A Yixu YU
%J Frontiers of Information Technology & Electronic Engineering
%V 27
%N 4
%P 1-18
%@ 1869-1951
%D 2026
%I Zhejiang University Press & Springer
%DOI 10.1631/ENG.ITEE.2025.0104

TY  - JOUR
T1  - RetryTrigger: intelligent inference duplication for enhancing LLM resilience to hardware transient faults
A1  - Jiajia JIAO
A1  - Yixu YU
JO  - Frontiers of Information Technology & Electronic Engineering
VL  - 27
IS  - 4
SP  - 1
EP  - 18
SN  - 1869-1951
Y1  - 2026
PB  - Zhejiang University Press & Springer
DO  - 10.1631/ENG.ITEE.2025.0104
ER  -


Abstract: 
Large language models (LLMs) have exhibited outstanding performance across a wide range of natural language processing (NLP) tasks. However, the rising prevalence of hardware transient faults has made silent data corruptions (SDCs) in LLMs increasingly problematic, severely degrading output quality and user experience. State-of-the-art protection schemes primarily rely on hardware-assisted algorithm-based fault tolerance (ABFT) or boundary-setting-driven online fault tolerance (FT2) for selected layers, yet these solutions suffer from strict hardware dependencies, substantial overhead, or incomplete coverage. To address these limitations, we propose RetryTrigger, a novel hardware-free, fault-aware inference methodology capable of handling all potential faults. During LLM inference, RetryTrigger dynamically collects runtime output features (e.g., maximum probability, top-k probability gaps, output entropy, logits statistics, and inference latency), which are used to train a LightGBM meta-model. This meta-model predicts whether duplicate inference should be performed, thereby mitigating faults while preserving efficiency and requiring no additional hardware. Extensive experiments on seven representative LLMs (T5-Small, RoBERTa, BioMedBERT, Qwen2.5-Coder-0.5B/7B, MiniMind, and Opt) demonstrate that RetryTrigger reduces SDC rates by up to 95.33% (92.97% on average) with a performance overhead as low as 2.4012% (4.1167% on average), offering a better balance between reliability and efficiency than state-of-the-art solutions.
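The feature-collection and retry decision described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`output_features`, `infer_with_retry`, `threshold_rule`), the exact feature set, the value of k, and the 0.5 threshold are all assumptions made for illustration. In particular, a simple threshold rule stands in for the trained LightGBM meta-model, so the example needs only the Python standard library.

```python
import math
import time
from typing import Callable, Dict, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one output position's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def output_features(logits: Sequence[float], latency_s: float, k: int = 3) -> Dict[str, float]:
    """Compute runtime output features of the kind listed in the abstract."""
    probs = sorted(softmax(logits), reverse=True)
    top = probs[:k]
    mean = sum(logits) / len(logits)
    var = sum((x - mean) ** 2 for x in logits) / len(logits)
    feats = {
        "max_prob": top[0],
        "entropy": -sum(p * math.log(p) for p in probs if p > 0.0),
        "logit_mean": mean,
        "logit_std": math.sqrt(var),
        "logit_max": max(logits),
        "latency_s": latency_s,
    }
    # Gaps between consecutive top-k probabilities (top-1 minus top-2, and so on).
    for i in range(len(top) - 1):
        feats[f"gap_{i + 1}_{i + 2}"] = top[i] - top[i + 1]
    return feats


def threshold_rule(feats: Dict[str, float]) -> bool:
    """Hypothetical stand-in for the meta-model: retry on low-confidence output."""
    return feats["max_prob"] < 0.5


def infer_with_retry(
    run_model: Callable[[], Sequence[float]],
    should_retry: Callable[[Dict[str, float]], bool],
) -> Tuple[Sequence[float], bool]:
    """Run inference once and duplicate it only when the predictor asks for it."""
    start = time.perf_counter()
    logits = run_model()
    feats = output_features(logits, time.perf_counter() - start)
    if should_retry(feats):
        # A transient fault is unlikely to hit the repeated run in the same way.
        return run_model(), True
    return logits, False


# Usage: the first call returns a flat (suspicious) output, the retry a confident one.
outputs = iter([[1.0, 1.1, 0.9, 1.0], [8.0, 1.0, 0.5, 0.2]])
final_logits, retried = infer_with_retry(lambda: next(outputs), threshold_rule)
```

In a setup closer to the one the abstract describes, `should_retry` would wrap a classifier trained on such feature vectors collected with and without injected faults. Keeping the predictor behind a plain callable lets the inference wrapper stay unchanged when the stand-in rule is swapped for a trained model.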





CLC number: TP391

On-line Access: 2026-04-24

Received: 2025-10-28

Revision Accepted: 2026-04-24

Crosschecked: 2026-03-04


ORCID:  Jiajia JIAO, 0000-0003-3680-787X; Yixu YU, 0009-0003-3257-1393

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952783; E-mail: cjzhang@zju.edu.cn
Copyright © 2000 - 2026 Journal of Zhejiang University-SCIENCE