ENGINEERING Information Technology & Electronic Engineering  2026 Vol.27 No.5 P.1-14

http://doi.org/10.1631/ENG.ITEE.2025.0156


HyRAS: a hybrid redundancy- and serialization-based fault-tolerant architecture for through-silicon vias


Author(s):  Chenglong SUN, Yanqing ZHOU, Qi WANG, Yan ZHANG

Affiliation(s):  1. School of Computer and Information Engineering, Fuyang Normal University, Fuyang 236037, China

Corresponding email(s):   wq_hfut@163.com

Key Words:  Three-dimensional network-on-chip (3D NoC), Through-silicon vias (TSVs), Redundancy, Fault-tolerant


Chenglong SUN, Yanqing ZHOU, Qi WANG, Yan ZHANG. HyRAS: a hybrid redundancy- and serialization-based fault-tolerant architecture for through-silicon vias[J]. Journal of Zhejiang University Science C, 2026, 27(5): 1-14.

@article{title="HyRAS: a hybrid redundancy- and serialization-based fault-tolerant architecture for through-silicon vias",
author="Chenglong SUN, Yanqing ZHOU, Qi WANG, Yan ZHANG",
journal="Journal of Zhejiang University Science C",
volume="27",
number="5",
pages="1-14",
year="2026",
publisher="Zhejiang University Press & Springer",
doi="10.1631/ENG.ITEE.2025.0156"
}

%0 Journal Article
%T HyRAS: a hybrid redundancy- and serialization-based fault-tolerant architecture for through-silicon vias
%A Chenglong SUN
%A Yanqing ZHOU
%A Qi WANG
%A Yan ZHANG
%J Frontiers of Information Technology & Electronic Engineering
%V 27
%N 5
%P 1-14
%@ 1869-1951
%D 2026
%I Zhejiang University Press & Springer
%DOI 10.1631/ENG.ITEE.2025.0156

TY - JOUR
T1 - HyRAS: a hybrid redundancy- and serialization-based fault-tolerant architecture for through-silicon vias
A1 - Chenglong SUN
A1 - Yanqing ZHOU
A1 - Qi WANG
A1 - Yan ZHANG
JO - Frontiers of Information Technology & Electronic Engineering
VL - 27
IS - 5
SP - 1
EP - 14
SN - 1869-1951
Y1 - 2026
PB - Zhejiang University Press & Springer
DO - 10.1631/ENG.ITEE.2025.0156
ER -


Abstract: 
Three-dimensional networks-on-chip (3D NoCs) are increasingly used to improve scalability in multicore systems. Through-silicon vias (TSVs) are a critical technology for vertical interconnects between NoC layers. However, TSV-based interlayer connections are highly prone to faults caused by manufacturing defects, aging, or other sources, which compromises system reliability. Robust fault-tolerant mechanisms are therefore crucial, particularly in chiplet-based 3D NoCs, for maintaining operational integrity in the presence of TSV faults. We introduce HyRAS, a hybrid redundancy- and serialization-based fault-tolerant architecture designed to keep communication reliable despite permanent vertical link failures. Our approach is built on two synergistic mechanisms. First, a lightweight spatial redundancy-based scheme leverages shared TSV resources to mitigate the impact of isolated faults. Second, for more severe fault scenarios, an adaptive serialization-based strategy maintains connectivity by efficiently using the remaining functional links. The architecture is evaluated through functional simulations using both synthetic traffic patterns and realistic application workloads. Compared with contemporary fault-tolerant methods, HyRAS achieves up to 28.2% higher throughput under realistic workloads with significant defect clusters. These gains come with modest overhead: a 14.53% increase in area and an 8.87% increase in power consumption relative to the standard redundancy-based router.
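The two-tier idea described in the abstract (use shared spare TSVs for isolated faults, fall back to serialization when too few TSVs survive) can be illustrated with a minimal sketch. This is not the HyRAS design itself: the function name `plan_vertical_link`, the one-bit-per-TSV-per-cycle model, the index layout (data lanes first, spares last), and the mode labels are all assumptions made for illustration.

```python
from math import ceil
from typing import Iterable


def plan_vertical_link(data_tsvs: int, spare_tsvs: int, faulty: Iterable[int]) -> dict:
    """Pick a recovery mode for one vertical link (hypothetical model).

    Assumptions, not taken from the paper: TSVs are indexed
    0..data_tsvs+spare_tsvs-1, the first data_tsvs are regular lanes,
    the rest are spares, and each TSV carries one bit per cycle, so a
    flit is data_tsvs bits wide.
    """
    faulty = set(faulty)
    total = data_tsvs + spare_tsvs
    working = [i for i in range(total) if i not in faulty]

    if not working:
        # Every TSV is faulty: the link cannot carry traffic at all.
        return {"mode": "disconnected", "lanes": [], "cycles_per_flit": None}

    if not any(i < data_tsvs for i in faulty):
        # All regular lanes are healthy: no recovery action is needed.
        return {"mode": "direct", "lanes": list(range(data_tsvs)), "cycles_per_flit": 1}

    if len(working) >= data_tsvs:
        # Spares cover the faulty lanes: keep full width in one cycle.
        return {"mode": "redundancy", "lanes": working[:data_tsvs], "cycles_per_flit": 1}

    # Too few healthy TSVs: send the flit in several narrower beats.
    cycles = ceil(data_tsvs / len(working))
    return {"mode": "serialization", "lanes": working, "cycles_per_flit": cycles}
```

Under this model, a link with 8 data TSVs and 2 spares keeps full bandwidth with one or two faults, and with five faults on the data lanes it still delivers each flit, at the cost of two cycles per flit instead of one.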

HyRAS: a hybrid redundancy and serialization fault-tolerant architecture for through-silicon vias

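Chenglong SUN1,2, Yanqing ZHOU1,2, Qi WANG3, Yan ZHANG1,2
1School of Computer and Information Engineering, Fuyang Normal University, Fuyang 236037, China
2Anhui Engineering Research Center for Intelligent Computing and Information Technology Innovation Applications, Fuyang Normal University, Fuyang 236037, China
3School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China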
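Abstract: Three-dimensional networks-on-chip (3D NoCs) are increasingly widely used in multicore systems, where their core value lies in significantly improving system scalability. Through-silicon vias (TSVs) are the key technology for vertical interconnection between NoC layers. However, TSV-based interlayer interconnects are highly prone to faults caused by manufacturing defects, device aging, and other factors, which seriously affects system reliability. To address these problems, particularly in chiplet-based 3D NoCs, highly robust fault-tolerant mechanisms are urgently needed to keep the system running stably in the presence of TSV faults. This paper proposes a novel fault-tolerant architecture named HyRAS, based on a hybrid redundancy and serialization method, which aims to sustain NoC communication reliability when vertical links fail permanently. The architecture combines two cooperating fault-tolerant mechanisms. First, a lightweight spatial redundancy strategy reuses shared TSV resources to mitigate the performance loss caused by isolated single-point faults. Second, for large-scale severe fault scenarios, an adaptive serial transmission mechanism efficiently schedules the remaining healthy links to maintain global network connectivity. Full functional simulations with synthetic traffic models and real application workloads are used to validate the architecture's performance comprehensively. The results show that, compared with existing mainstream fault-tolerant schemes, HyRAS improves network throughput by up to 28.2% under real workloads with large defect clusters; the hardware overhead is controllable, with chip area increasing by only 14.53% and power consumption by 8.87% relative to the conventional redundant router architecture.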

Key words: Three-dimensional network-on-chip; Through-silicon via; Redundancy; Fault tolerance

CLC number: TN47

On-line Access: 2026-05-27

Received: 2025-11-26

Revision Accepted: 2026-04-15

Crosschecked: 2026-05-27

ORCID: Chenglong SUN, 0000-0002-7492-0542

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952783; E-mail: cjzhang@zju.edu.cn
Copyright © 2000 - 2026 Journal of Zhejiang University-SCIENCE