Jiajia JIAO, Ran WEN, Hong YANG. An end-to-end automatic methodology to accelerate the accuracy evaluation of DNN under hardware transient faults[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.2400547
@article{title="An end-to-end automatic methodology to accelerate the accuracy evaluation of DNN under hardware transient faults", author="Jiajia JIAO, Ran WEN, Hong YANG", journal="Frontiers of Information Technology & Electronic Engineering", year="in press", publisher="Zhejiang University Press & Springer", doi="https://doi.org/10.1631/FITEE.2400547" }
%0 Journal Article %T An end-to-end automatic methodology to accelerate the accuracy evaluation of DNN under hardware transient faults %A Jiajia JIAO %A Ran WEN %A Hong YANG %J Frontiers of Information Technology & Electronic Engineering %P %@ 2095-9184 %D in press %I Zhejiang University Press & Springer doi="https://doi.org/10.1631/FITEE.2400547"
TY - JOUR T1 - An end-to-end automatic methodology to accelerate the accuracy evaluation of DNN under hardware transient faults A1 - Jiajia JIAO A1 - Ran WEN A1 - Hong YANG J0 - Frontiers of Information Technology & Electronic Engineering SP - EP - %@ 2095-9184 Y1 - in press PB - Zhejiang University Press & Springer ER - doi="https://doi.org/10.1631/FITEE.2400547"
Abstract: Hardware transient faults have been shown to significantly affect deep neural networks (DNNs), increasing their safety-critical misclassification (SCM) probabilities in autonomous vehicles, healthcare, and space applications by up to 4x. However, evaluating this impact with accurate fault injection is time-consuming, requiring several hours or even a couple of days on a complete simulation platform. To accelerate the evaluation of hardware transient faults on DNNs, we design a unified, end-to-end automatic methodology, A-Mean. It uses the silent data corruption (SDC) rates of basic operations (such as convolution, addition, multiplication, ReLU, and max pooling) together with a two-level mean mechanism to rapidly compute the overall SDC rate, from which it estimates the general classification metric, accuracy, and the application-specific metric, SCM. More importantly, a max policy is used to determine the SDC boundary of non-sequential structures in DNNs. A worst-case scheme is then used to calculate the enlarged SCM and halved accuracy under transient faults by merging the static SDC results with the original data from a one-time dynamic fault-free execution. All of these steps are implemented automatically, so the resulting easy-to-use tool can be employed for prompt evaluation of transient faults on diverse DNNs. In addition, a novel metric, fault sensitivity, is defined to jointly characterize the higher SCM and lower accuracy induced by transient faults. Comparative results against a state-of-the-art fault injection method on five DNN models and four datasets show that A-Mean achieves up to 922.80x speedup, with just 4.20% SCM loss and 0.77% accuracy loss on average.
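The abstract names the building blocks of A-Mean (per-operation SDC rates, a two-level mean, and a max policy for non-sequential structures) but does not give their formulas. The following Python sketch is one possible reading, not the authors' implementation. Everything in it is an assumption made for illustration: the per-operation SDC values, the nested-list model encoding, the function names, and in particular the choice of "mean over operations in a layer, then mean over layers" as the two levels and of `clean_acc * (1 - sdc)` as the worst-case accuracy estimate.

```python
from statistics import mean

# Hypothetical per-operation SDC rates (illustrative values, not from the paper).
OP_SDC = {"conv": 0.04, "add": 0.02, "mul": 0.03, "relu": 0.01, "maxpool": 0.01}


def layer_sdc(ops):
    """Assumed level 1: mean SDC rate over the basic operations of one layer."""
    return mean(OP_SDC[op] for op in ops)


def block_sdc(block):
    """SDC rate of one block.

    A block is either a list of operation names (a plain layer) or a dict
    {"branches": [...]} describing a non-sequential structure, where each
    branch is itself a list of blocks.
    """
    if isinstance(block, dict):
        # Assumed max policy: bound the structure by its worst branch.
        return max(model_sdc(branch) for branch in block["branches"])
    return layer_sdc(block)


def model_sdc(blocks):
    """Assumed level 2: mean SDC rate over all blocks of the model."""
    return mean(block_sdc(b) for b in blocks)


def worst_case_accuracy(clean_acc, sdc):
    """Assumed worst case: every SDC turns a correct prediction into a wrong one."""
    return clean_acc * (1.0 - sdc)


# Toy model: one layer, a two-branch structure, then a pooling layer.
example_model = [
    ["conv", "relu"],
    {"branches": [[["conv", "add"]], [["maxpool"]]]},
    ["maxpool"],
]
example_sdc = model_sdc(example_model)
example_acc = worst_case_accuracy(0.90, example_sdc)
```

Because the estimate only combines static per-operation rates with one fault-free accuracy figure, it needs no repeated fault-injection runs, which is the source of the speedup the abstract reports. The SCM estimate is left out here because the abstract does not say how SDC and fault-free results are merged for that metric.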