CLC number: TP316
On-line Access: 2018-04-09
Received: 2016-08-16
Revision Accepted: 2016-11-07
Crosschecked: 2018-02-15
Wen-zhe Zhang, Kai Lu, Xiao-ping Wang. Versionized process based on non-volatile random-access memory for fine-grained fault tolerance[J]. Frontiers of Information Technology & Electronic Engineering, 2018, 19(2): 192-205.
@article{Zhang2018,
title="Versionized process based on non-volatile random-access memory for fine-grained fault tolerance",
author="Wen-zhe Zhang and Kai Lu and Xiao-ping Wang",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="19",
number="2",
pages="192-205",
year="2018",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.1601477"
}
%0 Journal Article
%T Versionized process based on non-volatile random-access memory for fine-grained fault tolerance
%A Wen-zhe Zhang
%A Kai Lu
%A Xiao-ping Wang
%J Frontiers of Information Technology & Electronic Engineering
%V 19
%N 2
%P 192-205
%@ 2095-9184
%D 2018
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.1601477
TY - JOUR
T1 - Versionized process based on non-volatile random-access memory for fine-grained fault tolerance
A1 - Wen-zhe Zhang
A1 - Kai Lu
A1 - Xiao-ping Wang
JO - Frontiers of Information Technology & Electronic Engineering
VL - 19
IS - 2
SP - 192
EP - 205
SN - 2095-9184
Y1 - 2018
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.1601477
ER -
Abstract: Non-volatile random-access memory (NVRAM) technology is maturing rapidly and its byte-persistence feature allows the design of new and efficient fault tolerance mechanisms. In this paper we propose the versionized process (VerP), a new process model based on NVRAM that is natively non-volatile and fault tolerant. We introduce an intermediate software layer that allows us to run a process directly on NVRAM and to put all the process states into NVRAM, and then propose a mechanism to versionize all the process data. Each piece of the process data is given a special version number, which increases with the modification of that piece of data. The version number can effectively help us trace the modification of any data and recover it to a consistent state after a system crash. Compared with traditional checkpoint methods, our work can achieve fine-grained fault tolerance at very little cost.
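The versioning idea in the abstract can be illustrated with a minimal Python sketch. It is not the VerP implementation: the class `VersionedStore`, its methods, and the use of an ordinary in-memory dictionary in place of NVRAM are all assumptions made for illustration. Each key carries its own version number that increases on every modification, a commit records the versions that form a consistent state, and recovery discards every version written after that point.

```python
class VersionedStore:
    """Hypothetical stand-in for versioned process data (not the paper's code)."""

    def __init__(self):
        # key -> list of values; the value at index i has version number i + 1.
        # In a real system this would live in NVRAM; here it is a plain dict.
        self.history = {}
        # key -> version number recorded at the last consistent point.
        self.committed = {}

    def write(self, key, value):
        """Store a new value and return its (incremented) version number."""
        versions = self.history.setdefault(key, [])
        versions.append(value)
        return len(versions)

    def version(self, key):
        """Current version number of a key (0 if it was never written)."""
        return len(self.history.get(key, []))

    def read(self, key):
        """Return the most recent value of a key."""
        return self.history[key][-1]

    def commit(self):
        """Mark the current versions of all keys as a consistent state."""
        self.committed = {k: len(v) for k, v in self.history.items()}

    def recover(self):
        """Simulate post-crash recovery: drop versions newer than the commit."""
        for key in list(self.history):
            keep = self.committed.get(key, 0)
            if keep == 0:
                del self.history[key]        # key did not exist at commit time
            else:
                del self.history[key][keep:]  # roll back to committed version


# Usage: two writes are committed, two later writes are lost in a "crash".
store = VersionedStore()
store.write("x", 1)
store.write("y", 10)
store.commit()
store.write("x", 2)   # uncommitted modification of existing data
store.write("z", 5)   # uncommitted creation of new data
version_before_recovery = store.version("x")
store.recover()
```

Keeping a per-key version makes it possible to tell, for each piece of data independently, whether it was modified after the last consistent point, which is what allows recovery at a finer granularity than a whole-process checkpoint.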