CLC number: TP333
On-line Access: 2022-07-21
Received: 2021-05-26
Revision Accepted: 2022-07-21
Crosschecked: 2021-10-07
Sutapa SARKAR, Biplab Kumar SIKDAR, Mousumi SAHA. Cellular automata based multi-bit stuck-at fault diagnosis for resistive memory[J]. Frontiers of Information Technology & Electronic Engineering, 2022, 23(7): 1110-1126.
@article{title="Cellular automata based multi-bit stuck-at fault diagnosis for resistive memory",
author="Sutapa SARKAR, Biplab Kumar SIKDAR, Mousumi SAHA",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="23",
number="7",
pages="1110-1126",
year="2022",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.2100255"
}
%0 Journal Article
%T Cellular automata based multi-bit stuck-at fault diagnosis for resistive memory
%A Sutapa SARKAR
%A Biplab Kumar SIKDAR
%A Mousumi SAHA
%J Frontiers of Information Technology & Electronic Engineering
%V 23
%N 7
%P 1110-1126
%@ 2095-9184
%D 2022
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.2100255
TY - JOUR
T1 - Cellular automata based multi-bit stuck-at fault diagnosis for resistive memory
A1 - Sutapa SARKAR
A1 - Biplab Kumar SIKDAR
A1 - Mousumi SAHA
JO - Frontiers of Information Technology & Electronic Engineering
VL - 23
IS - 7
SP - 1110
EP - 1126
SN - 2095-9184
Y1 - 2022
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.2100255
ER -
Abstract: This paper presents a group-based dynamic stuck-at fault diagnosis scheme for resistive random-access memory (ReRAM). Conventional memories (static random-access memory, dynamic random-access memory, and NAND and NOR flash) are limited in scalability, power consumption, and package density, among other factors. Next-generation memories such as ReRAM offer high package density, non-volatility, scalability, and low power consumption, but cell reliability remains a problem. Permanent stuck-at faults, caused by heavy use under write- or memory-intensive workloads, make memory operation unreliable, and as they accumulate they also shorten chip lifetime prematurely. A cellular automaton (CA) based dynamic stuck-at fault-tolerant design is therefore proposed to address unreliable cell operation and variable cell lifetime. A scalable, block-level fault diagnosis and recovery scheme is introduced to keep data readable despite multi-bit stuck-at faults. The scheme is novel in that it aims to remove all restrictions on the number and nature of stuck-at faults under general fault conditions. It is based on Wolfram's null boundary and periodic boundary CA theory. Several special classes of CA are introduced to achieve 100% fault tolerance: single-length-cycle single-attractor cellular automata (SACAs), single-length-cycle two-attractor cellular automata (TACAs), and single-length-cycle multiple-attractor cellular automata (MACAs). The target micro-architectural unit is designed with optimal space overhead.
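The CA classes named in the abstract can be illustrated with a small sketch. It is not the authors' design: it assumes a one-dimensional, binary, 3-neighborhood CA in which each cell has its own Wolfram rule number, and it finds attractors by brute-force enumeration, which is only practical for a handful of cells. The names `ca_step` and `classify` and the label strings are hypothetical and chosen for illustration.

```python
from itertools import product


def ca_step(state, rules, periodic=False):
    """One synchronous update of a binary 3-neighborhood CA.

    state    -- tuple of 0/1 cell values
    rules    -- one Wolfram rule number (0-255) per cell (assumed encoding)
    periodic -- True: end cells wrap around (periodic boundary)
                False: missing neighbors read as 0 (null boundary)
    """
    n = len(state)
    nxt = []
    for i in range(n):
        if periodic:
            left, right = state[(i - 1) % n], state[(i + 1) % n]
        else:
            left = state[i - 1] if i > 0 else 0
            right = state[i + 1] if i < n - 1 else 0
        # The neighborhood (left, self, right) selects one bit of the rule.
        idx = (left << 2) | (state[i] << 1) | right
        nxt.append((rules[i] >> idx) & 1)
    return tuple(nxt)


def classify(rules, periodic=False):
    """Label a small CA by its number of single-length-cycle attractors.

    Returns (label, sorted list of attractor states). If any state ends up
    in a cycle longer than one, the CA is outside the three classes.
    """
    n = len(rules)
    attractors = set()
    for s in product((0, 1), repeat=n):
        # 2**n steps are enough to leave any transient and enter a cycle.
        for _ in range(2 ** n):
            s = ca_step(s, rules, periodic)
        if ca_step(s, rules, periodic) != s:
            return ("not single-length-cycle", [])
        attractors.add(s)
    label = {1: "SACA", 2: "TACA"}.get(len(attractors), "MACA")
    return (label, sorted(attractors))


if __name__ == "__main__":
    print(classify([0, 0, 0, 0]))      # every state collapses to all zeros
    print(classify([204, 0, 0]))       # first cell holds its value
    print(classify([51, 51, 51]))      # complement rule: cycles of length two
```

Rule 0 sends every neighborhood to 0 and rule 204 copies the cell's own value, so mixing them per cell gives a simple way to control how many fixed points exist. Which rules the paper actually uses for fault diagnosis is not stated in the abstract, so the rule vectors here are placeholders.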