Frontiers of Information Technology & Electronic Engineering
ISSN 2095-9184 (print), ISSN 2095-9230 (online)
2022, Vol. 23, No. 6, pp. 858-875
SA-RSR: a read-optimal data recovery strategy for XOR-coded distributed storage systems
Abstract: To ensure data reliability and availability, distributed storage systems require redundancy strategies. Erasure coding, a representative redundancy strategy, has the advantage of low storage overhead, which facilitates its adoption in distributed storage systems. Among the various erasure coding schemes, XOR-based erasure codes are becoming popular because of their high computing speed. When a single-node failure occurs in such a scheme, a data recovery process reads data from the surviving nodes to rebuild the data lost on the failed node. However, transmitting data during recovery usually takes a considerable amount of time. Current research has focused mainly on reducing the amount of data needed for recovery, and thus the transmission time, but existing approaches suffer from high computational complexity or become trapped in local optima. In this paper, we propose a random search recovery algorithm, named SA-RSR, to speed up single-node failure recovery for XOR-based erasure codes. SA-RSR uses simulated annealing to search for an optimal recovery solution that reads and transmits the minimum amount of data, and this search can be completed in polynomial time. We evaluate SA-RSR with a variety of XOR-based erasure codes in simulations and in a real storage system, Ceph. Experimental results in Ceph show that, compared with the conventional recovery method, SA-RSR reduces the amount of data required for recovery by up to 30.0% and improves data recovery performance by up to 20.36%.
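The abstract describes SA-RSR only at a high level. The following is a minimal sketch of the general idea (simulated annealing over recovery choices) on a toy model. It is not the authors' algorithm. Everything here is an illustrative assumption: the model in which each lost symbol is rebuilt from exactly one parity equation taken from a fixed candidate list, the RDP-style layout with p = 5, the cost (number of distinct surviving symbols read), and the cooling parameters `t0`, `alpha`, `iters`. The function names `rdp_single_failure_options`, `read_cost`, `anneal`, and `brute_force` are hypothetical.

```python
import math
import random
from itertools import product


def rdp_single_failure_options(p, failed):
    """Candidate recovery sets for each lost symbol of data disk `failed`
    in an RDP-style layout with prime p (toy model, an assumption).

    Symbols are (row, disk) pairs. Rows 0..p-2; disks 0..p-2 hold data,
    disk p-1 holds row parity, disk p holds diagonal parity.
    Each candidate is the set of surviving symbols one parity equation reads.
    """
    options = []
    for i in range(p - 1):
        # Row equation: the other symbols of row i (data + row parity).
        cands = [{(i, c) for c in range(p) if c != failed}]
        d = (i + failed) % p  # diagonal that lost symbol (i, failed) lies on
        if d != p - 1:  # diagonal p-1 has no stored parity symbol
            diag = {((d - c) % p, c) for c in range(p)
                    if c != failed and (d - c) % p != p - 1}
            diag.add((d, p))  # stored diagonal parity symbol
            cands.append(diag)
        options.append(cands)
    return options


def read_cost(options, choice):
    """Distinct surviving symbols read when lost symbol j uses options[j][choice[j]]."""
    needed = set()
    for cands, k in zip(options, choice):
        needed |= cands[k]  # symbols shared by several equations are read once
    return len(needed)


def anneal(options, t0=2.0, alpha=0.995, iters=3000, seed=1):
    """Simulated annealing over per-symbol equation choices (illustrative parameters)."""
    rng = random.Random(seed)
    choice = [0] * len(options)  # start from conventional row-only recovery
    cost = read_cost(options, choice)
    best, best_cost = list(choice), cost
    movable = [j for j, cands in enumerate(options) if len(cands) > 1]
    if not movable:
        return best, best_cost
    t = t0
    for _ in range(iters):
        j = rng.choice(movable)
        # Propose a different candidate equation for lost symbol j.
        k = rng.randrange(len(options[j]) - 1)
        if k >= choice[j]:
            k += 1
        trial = choice[:j] + [k] + choice[j + 1:]
        delta = read_cost(options, trial) - cost
        # Accept non-worsening moves; accept worse ones with prob exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            choice, cost = trial, cost + delta
            if cost < best_cost:
                best, best_cost = list(choice), cost
        t *= alpha  # geometric cooling
    return best, best_cost


def brute_force(options):
    """Exact minimum by enumeration; only feasible for tiny instances."""
    return min(read_cost(options, c)
               for c in product(*(range(len(o)) for o in options)))


opts = rdp_single_failure_options(p=5, failed=0)
conventional = read_cost(opts, [0] * len(opts))
best, best_cost = anneal(opts)
print(conventional, best_cost, brute_force(opts))  # 16 12 12
```

In this toy instance, conventional row-only recovery of a failed data disk reads 16 symbols. Recovering two of the four lost symbols through diagonal equations lets overlapping symbols be read only once, which brings the total to 12. The brute-force check works only because this search space has just 2^4 states. A random search such as simulated annealing is meant for larger codes, where enumerating every combination of recovery equations is impractical.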
Key words: Distributed storage system; Data reliability and availability; XOR-based erasure codes; Single-node failure; Data recovery
1 School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China
2 Beijing Institute of Electronic System Engineering, Beijing 100854, China
DOI: 10.1631/FITEE.2100242
CLC number: TP391.4
Online access: 2022-06-17
Received: 2021-05-16
Revision accepted: 2022-07-05
Crosschecked: 2021-08-16