
CLC number: TP391
On-line Access: 2026-04-24
Received: 2025-10-28
Revision Accepted: 2026-04-24
Crosschecked: 2026-03-04
Jiajia JIAO, Yixu YU. RetryTrigger: intelligent inference duplication for enhancing LLM resilience to hardware transient faults[J]. Frontiers of Information Technology & Electronic Engineering, 2026, 27(4): 1-18.
@article{title="RetryTrigger: intelligent inference duplication for enhancing LLM resilience to hardware transient faults",
author="Jiajia JIAO, Yixu YU",
journal="Journal of Zhejiang University Science C",
volume="27",
number="4",
pages="1-18",
year="2026",
publisher="Zhejiang University Press & Springer",
doi="10.1631/ENG.ITEE.2025.0104"
}
%0 Journal Article
%T RetryTrigger: intelligent inference duplication for enhancing LLM resilience to hardware transient faults
%A Jiajia JIAO
%A Yixu YU
%J Frontiers of Information Technology & Electronic Engineering
%V 27
%N 4
%P 1-18
%@ 2095-9184
%D 2026
%I Zhejiang University Press & Springer
%R 10.1631/ENG.ITEE.2025.0104
TY - JOUR
T1 - RetryTrigger: intelligent inference duplication for enhancing LLM resilience to hardware transient faults
A1 - Jiajia JIAO
A1 - Yixu YU
JO - Frontiers of Information Technology & Electronic Engineering
VL - 27
IS - 4
SP - 1
EP - 18
SN - 2095-9184
Y1 - 2026
PB - Zhejiang University Press & Springer
DO - 10.1631/ENG.ITEE.2025.0104
ER -
Abstract: Large language models (LLMs) have exhibited outstanding performance across a wide range of natural language processing (NLP) tasks. However, the rising prevalence of hardware transient faults has made silent data corruptions (SDCs) in LLMs increasingly problematic, severely degrading output quality and user experience. State-of-the-art protection schemes rely primarily on hardware-assisted algorithm-based fault tolerance (ABFT) or boundary-setting-driven online fault tolerance (FT2) for selected layers, yet these solutions suffer from strict hardware dependencies, substantial overhead, or incomplete coverage. To address these limitations, we propose RetryTrigger, a hardware-free fault-aware inference methodology capable of handling all potential faults. During LLM inference, RetryTrigger dynamically collects runtime output features (e.g., maximum probability, top-k probability gaps, output entropy, logits statistics, and inference latency) and uses them to train a LightGBM meta-model. This meta-model accurately predicts whether duplicate inference should be performed, thereby mitigating faults effectively while preserving efficiency and requiring no additional hardware. Extensive experiments on seven representative LLMs (T5-Small, RoBERTa, BioMedBERT, Qwen2.5-Coder-0.5B/7B, MiniMind, and OPT) demonstrate that RetryTrigger reduces SDC rates by up to 95.33% (92.97% on average) while incurring a performance overhead as low as 2.4012% (4.1167% on average), offering a better balance between reliability and efficiency than state-of-the-art solutions.
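The abstract's pipeline can be pictured concretely: extract confidence-style features from a single inference pass, score them with a LightGBM meta-model, and re-run the pass only when a fault is suspected. The Python sketch below is illustrative only, not the authors' implementation: the function names (extract_features, infer_with_retry), the exact feature composition, the synthetic training labels, and the 0.5 retry threshold are all assumptions; only the listed feature types and the use of a LightGBM meta-model come from the abstract.

# Minimal sketch of a RetryTrigger-style retry decision (assumptions noted above).
import time
import numpy as np
import lightgbm as lgb

def softmax(logits):
    z = logits - logits.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def extract_features(logits, latency_s, k=5):
    """Features named in the abstract: max probability, top-k gaps,
    output entropy, logits statistics, and inference latency."""
    p = softmax(np.asarray(logits, dtype=np.float64))
    top = np.sort(p)[::-1][:k]
    entropy = float(-(p * np.log(p + 1e-12)).sum())
    return np.array([
        top[0],                  # maximum probability
        top[0] - top[1],         # top-1 vs. top-2 gap
        top[0] - top[-1],        # top-1 vs. top-k gap
        entropy,                 # output entropy
        float(np.mean(logits)),  # logits mean
        float(np.std(logits)),   # logits std
        latency_s,               # inference latency
    ])

# Stand-in training set: in the paper the labels would come from fault
# injection (1 = pass was fault-corrupted, 0 = clean); random data here
# only keeps the sketch self-contained and runnable.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 7))
y = (X[:, 3] > 0.5).astype(int)

meta = lgb.LGBMClassifier(n_estimators=200, learning_rate=0.05)
meta.fit(X, y)

def infer_with_retry(run_inference, prompt, threshold=0.5):
    """Run inference once; duplicate it only if the meta-model flags a fault."""
    t0 = time.perf_counter()
    output, logits = run_inference(prompt)
    feats = extract_features(logits, time.perf_counter() - t0)
    p_fault = meta.predict_proba(feats.reshape(1, -1))[0, 1]
    if p_fault > threshold:            # suspected SDC: re-run the pass
        output, _ = run_inference(prompt)
    return output

# Hypothetical model stub so the sketch runs end to end; a real deployment
# would return the generated text plus the final-step logits of an LLM.
def run_inference_demo(prompt):
    return "demo output", rng.normal(size=500)

print(infer_with_retry(run_inference_demo, "2+2="))

The design point this sketch captures is that retries are selective: the meta-model's overhead is a single tree-ensemble prediction per pass, so duplicate inference (the expensive part) is paid only on the small fraction of passes whose output statistics look fault-like, which is how the abstract's low average overhead is possible.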
[1]Battaglini-Fischer S, Srinivasan N, Szarvas BL, et al., 2025. FAILS: a framework for automated collection and analysis of LLM service incidents. 16th ACM/SPEC Int Conf on Performance Engineering, p.187-194.
[2]Baumann RC, 2005. Radiation-induced soft errors in advanced semiconductor technologies. IEEE Trans Dev Mater Reliab, 5(3):305-316.
[3]Cavagnero N, Dos Santos F, Ciccone M, et al., 2022. Transient-fault-aware design and training to enhance DNNs reliability with zero-overhead. IEEE 28th Int Symp on On-Line Testing and Robust System Design, p.1-7.
[4]Chen TQ, Guestrin C, 2016. XGBoost: a scalable tree boosting system. Proc 22nd ACM SIGKDD Int Conf on Knowledge Discovery and Data Mining, p.785-794.
[5]Chen ZT, Li GP, Pattabiraman K, 2021. A low-cost fault corrector for deep neural networks through range restriction. 51st Annual IEEE/IFIP Int Conf on Dependable Systems and Networks, p.1-13.
[6]Dai HL, Wu SX, Huang JJ, et al., 2025. FT-Transformer: resilient and reliable Transformer with end-to-end fault tolerant attention.
[7]Ghavami B, Sadati M, Fang ZM, et al., 2022. FitAct: error resilient deep neural networks via fine-grained post-trainable activation functions. Design, Automation & Test in Europe Conf & Exhibition, p.1239-1244.
[8]Hamming RW, 1950. Error detecting and error correcting codes. Bell Syst Tech J, 29(2):147-160.
[9]Hoang LH, Hanif MA, Shafique M, 2019. FT-ClipAct: resilience analysis of deep neural networks and improving their fault tolerance using clipped activation.
[10]Hosmer DW Jr, Lemeshow S, Sturdivant RX, 2013. Applied Logistic Regression. John Wiley & Sons, New York, USA.
[11]Jiang JY, Wang F, Shen JS, et al., 2024. A survey on large language models for code generation.
[12]Ke GL, Meng Q, Finley T, et al., 2017. LightGBM: a highly efficient gradient boosting decision tree. Proc 31st Int Conf on Neural Information Processing Systems, p.3149-3157.
[13]Li Y, Yang SL, Liu CC, et al., 2025. Resilio: an elastic fault-tolerant training system for large language models. J Comput Res Dev, 62(6):1380-1395 (in Chinese).
[14]Liang YH, Li XY, Ren J, et al., 2025. ATTNChecker: highly-optimized fault tolerant attention for large language model training. Proc 30th ACM SIGPLAN Annual Symp on Principles and Practice of Parallel Programming, p.252-266.
[15]Liu HX, Singh V, Filipiuk M, et al., 2025. ALBERTA: algorithm-based error resilience in transformer architectures. IEEE Open J Comput Soc, 6:85-96.
[16]Mousavi S, Ahmadilivani MH, Raik J, et al., 2024. ProAct: progressive training for hybrid clipped activation function to enhance resilience of DNNs.
[17]Radford A, Wu J, Child R, et al., 2019. Language models are unsupervised multitask learners. OpenAI Blog, 1(8):9.
[18]Roquet L, dos Santos FF, Rech P, et al., 2024. Cross-layer reliability evaluation and efficient hardening of large vision Transformers models. Proc 61st ACM/IEEE Design Automation Conf, Article 291.
[19]Sha QS, Paulitsch M, Pattabiraman K, et al., 2024. Global Clipper: enhancing safety and reliability of transformer-based object detection models.
[20]Sun Y, Zhu Z, Mulpuru C, et al., 2025. FT2: first-token-inspired online fault tolerance on critical layers for generative large language models. Proc 34th Int Symp on High-Performance Parallel and Distributed Computing, Article 7.
[21]Tan JWJ, Ping LQ, Wang QX, et al., 2023a. Saca-AVF: a quantitative approach to analyze the architectural vulnerability factors of CNN accelerators. IEEE Trans Comput, 72(11):3042-3056.
[22]Tan JWJ, Wang QX, Yan KG, et al., 2023b. Saca-FI: a microarchitecture-level fault injection framework for reliability analysis of systolic array based CNN accelerator. Future Gener Comput Syst, 147:251-264.
[23]Tan JWJ, Wang JS, Yan KG, et al., 2025a. Evaluating GPU’s instruction-level error characteristics under low supply voltages. IEEE Trans Comput, 74(2):555-568.
[24]Tan JWJ, Li XR, Zhong A, et al., 2025b. GEREM: fast and precise error resilience assessment for GPU microarchitectures. IEEE Trans Parall Distrib Syst, 36(5):1011-1024.
[25]Titopoulos V, Alexandridis K, Dimitrakopoulos G, 2025. Custom algorithm-based fault tolerance for attention layers in transformers.
[26]Venkatesha S, Parthasarathi R, 2024. Survey on redundancy based-fault tolerance methods for processors and hardware accelerators—trends in quantum computing, heterogeneous systems and reliability. ACM Comput Surv, 56(11):275.
[27]Wan BR, Han MJ, Sheng YY, et al., 2025. ByteCheckpoint: a unified checkpointing system for large foundation model development. 22nd USENIX Symp on Networked Systems Design and Implementation, p.559-578.
[28]Xie T, Zhao JW, Wan ZS, et al., 2025. ReaLM: reliable and efficient large language model inference with statistical algorithm-based fault tolerance.
[29]Xue XH, Liu C, Min F, et al., 2025. ApproxABFT: approximate algorithm-based fault tolerance for neural network processing.
[30]Zhang WX, Deng Y, Liu B, et al., 2023. Sentiment analysis in the era of large language models: a reality check.
[31]Zhou S, Xu ZD, Zhang M, et al., 2025. Large language models for disease diagnosis: a scoping review. https://arxiv.org/abs/2409.00097