Publishing Service

Polishing & Checking

Frontiers of Information Technology & Electronic Engineering

ISSN 2095-9184 (print), ISSN 2095-9230 (online)

SA-RSR: a read-optimal data recovery strategy for XOR-coded distributed storage systems

Abstract: To ensure the reliability and availability of data, redundancy strategies are always required for distributed storage systems. Erasure coding, one of the representative redundancy strategies, has the advantage of low storage overhead, which facilitates its employment in distributed storage systems. Among the various erasure coding schemes, XOR-based erasure codes are becoming popular due to their high computing speed. When a single-node failure occurs in such coding schemes, a process called data recovery takes place to retrieve the failed node's lost data from surviving nodes. However, data transmission during the data recovery process usually requires a considerable amount of time. Current research has focused mainly on reducing the amount of data needed for data recovery to reduce the time required for data transmission, but it has encountered problems such as significant complexity and local optima. In this paper, we propose a random search recovery algorithm, named SA-RSR, to speed up single-node failure recovery of XOR-based erasure codes. SA-RSR uses a simulated annealing technique to search for an optimal recovery solution that reads and transmits a minimum amount of data. In addition, this search process can be done in polynomial time. We evaluate SA-RSR with a variety of XOR-based erasure codes in simulations and in a real storage system, Ceph. Experimental results in Ceph show that SA-RSR reduces the amount of data required for recovery by up to 30.0% and improves the performance of data recovery by up to 20.36% compared to the conventional recovery method.

Key words: Distributed storage system; Data reliability and availability; XOR-based erasure codes; Single-node failure; Data recovery

Chinese Summary  <24> SA-RSR:一ç§é€‚用于异或类纠删ç åˆ†å¸ƒå¼å­˜å‚¨ç³»ç»Ÿçš„æ•°æ®è¯»å–最优æ¢å¤æ–¹æ³•

张兴军1,æ¢å®é™1,刘云飞1,张长江1,æŽæ´‹2
1西安交通大学计算机科学与技术学院,中国西安市,710049
2北京电å­å·¥ç¨‹æ€»ä½“研究所,中国北京市,100854
摘è¦ï¼šå†—余策略ç»å¸¸è¢«ç”¨äºŽåˆ†å¸ƒå¼å­˜å‚¨ç³»ç»Ÿï¼Œä»¥ä¿è¯æ•°æ®çš„å¯é æ€§ä¸Žå¯ç”¨æ€§ã€‚纠删ç æ˜¯ä¸€ç§ä»£è¡¨æ€§çš„冗余策略,具有低存储开销优势,这ç§ä¼˜åŠ¿ä¿ƒè¿›äº†å®ƒåœ¨åˆ†å¸ƒå¼å­˜å‚¨ç³»ç»Ÿä¸­çš„应用。在å„ç§çº åˆ ç æœºåˆ¶ä¸­ï¼Œå¼‚或类纠删ç å‡­å€Ÿé«˜è®¡ç®—效率å˜å¾—越æ¥è¶Šæµè¡Œã€‚采用异或类纠删ç æœºåˆ¶çš„存储系统,如果å‘生å•èŠ‚点故障,便会进行数æ®æ¢å¤ï¼Œè¯¥è¿‡ç¨‹éœ€è¦ä»Žå¹¸å­˜èŠ‚点中下载数æ®ï¼Œç„¶åŽæ¢å¤æ•…障节点中的数æ®ã€‚然而,数æ®æ¢å¤è¿‡ç¨‹ä¸­çš„æ•°æ®ä¼ è¾“通常需è¦ç›¸å½“长时间。目å‰ç ”究主è¦é›†ä¸­åœ¨é€šè¿‡å‡å°‘æ•°æ®æ¢å¤è¿‡ç¨‹æ‰€éœ€æ•°æ®é‡ï¼Œå‡å°‘æ•°æ®ä¼ è¾“所需时间,但存在å¤æ‚度高和局部最优解等问题。本文æ出一ç§éšæœºæœç´¢æ¢å¤ç®—法,SA-RSR,该算法能加速异或类纠删ç å•èŠ‚点故障æ¢å¤ã€‚SA-RSR利用模拟退ç«æŠ€æœ¯å¯»æ‰¾è¯»å–和传输最少数æ®é‡çš„最优æ¢å¤æœºåˆ¶ï¼Œä¸”该æœç´¢è¿‡ç¨‹å¯åœ¨å¤šé¡¹å¼æ—¶é—´å†…完æˆã€‚最åŽï¼Œä¸ºéªŒè¯è¯¥æ–¹æ³•çš„有效性,使用多ç§å¼‚或类纠删ç è¿›è¡Œä»¿çœŸéªŒè¯ï¼Œå¹¶åœ¨çœŸå®žå­˜å‚¨ç³»ç»ŸCeph中验è¯ã€‚实验结果表明,与传统æ¢å¤æ–¹æ³•ç›¸æ¯”,SA-RSRå‡å°‘了30%çš„æ•°æ®è¯»å–与传输é‡ï¼Œæ高了20.36%çš„æ•°æ®æ¢å¤æ€§èƒ½ã€‚

关键è¯ç»„:分布å¼å­˜å‚¨ç³»ç»Ÿï¼›æ•°æ®å¯é æ€§ä¸Žå¯ç”¨æ€§ï¼›å¼‚或类纠删ç ï¼›å•èŠ‚点失效;数æ®æ¢å¤


Share this article to: More

Go to Contents

References:

<Show All>

Open peer comments: Debate/Discuss/Question/Opinion

<1>

Please provide your name, email address and a comment





DOI:

10.1631/FITEE.2100242

CLC number:

TP391.4

Download Full Text:

Click Here

Downloaded:

5988

Download summary:

<Click Here> 

Downloaded:

329

Clicked:

5223

Cited:

0

On-line Access:

2022-06-17

Received:

2021-05-16

Revision Accepted:

2022-07-05

Crosschecked:

2021-08-16

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952276; Fax: +86-571-87952331; E-mail: jzus@zju.edu.cn
Copyright © 2000~ Journal of Zhejiang University-SCIENCE