On-line Access: 2021-09-18
Received: 2021-05-16
Revision Accepted: 2021-08-16
Xingjun ZHANG, Ningjing LIANG, Yunfei LIU, Changjiang ZHANG. SA-RSR: a read-optimal data recovery strategy for XOR-coded distributed storage systems[J]. Frontiers of Information Technology & Electronic Engineering. DOI: 10.1631/FITEE.2100242.
@article{FITEE.2100242,
title="SA-RSR: a read-optimal data recovery strategy for XOR-coded distributed storage systems",
author="Xingjun ZHANG and Ningjing LIANG and Yunfei LIU and Changjiang ZHANG",
journal="Frontiers of Information Technology \& Electronic Engineering",
publisher="Zhejiang University Press \& Springer",
doi="10.1631/FITEE.2100242"
}
%0 Journal Article
%T SA-RSR: a read-optimal data recovery strategy for XOR-coded distributed storage systems
%A Xingjun ZHANG
%A Ningjing LIANG
%A Yunfei LIU
%A Changjiang ZHANG
%J Frontiers of Information Technology & Electronic Engineering
%@ 2095-9184
%I Zhejiang University Press & Springer
%R 10.1631/FITEE.2100242
TY  - JOUR
T1  - SA-RSR: a read-optimal data recovery strategy for XOR-coded distributed storage systems
A1  - Xingjun ZHANG
A1  - Ningjing LIANG
A1  - Yunfei LIU
A1  - Changjiang ZHANG
JF  - Frontiers of Information Technology & Electronic Engineering
SN  - 2095-9184
PB  - Zhejiang University Press & Springer
DO  - 10.1631/FITEE.2100242
ER  - 
Abstract: To ensure the reliability and availability of data, distributed storage systems require redundancy
strategies. Erasure coding, one of the representative redundancy strategies, offers low storage overhead, which
facilitates its deployment in distributed storage systems. Among the various erasure coding schemes, XOR-based
erasure codes are becoming popular because of their fast computation. When a single node fails in such a coding
scheme, a data recovery process retrieves the failed node's lost data from the surviving nodes. However, the data
transmission during recovery usually takes a considerable amount of time. Current research has mainly focused on
reducing the amount of data needed for recovery, and hence the data transmission time, but has encountered problems
such as high computational complexity and getting trapped in local optima. In this paper, we propose a random search
recovery algorithm, named SA-RSR, to speed up single-node failure recovery of XOR-based erasure codes. SA-RSR uses
simulated annealing to search for an optimal recovery solution that reads and transmits the minimum amount of data,
and this search can be completed in polynomial time. We evaluate SA-RSR with a variety of XOR-based erasure codes
both in simulations and in a real storage system, Ceph. Experiments in Ceph show that, compared with the conventional
recovery method, SA-RSR reduces the amount of data required for recovery by 20–30% and improves data recovery
performance by 13–20%.
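The abstract describes the recovery search only at a high level. The sketch below illustrates the general idea under stated assumptions and is not the authors' SA-RSR algorithm. It assumes a hypothetical toy XOR code with row parities `P` and diagonal parities `Q`, loosely in the spirit of RDP/EVENODD-style codes but not an exact construction of either. It also assumes that a lost symbol can be rebuilt from any parity equation containing it and no other lost symbol, and that the cost of a "recovery solution" (one equation per lost symbol) is the number of distinct surviving symbols read. The search itself is plain simulated annealing with a geometric cooling schedule. All names (`build_toy_code`, `anneal`, `t0`, `cooling`, `steps`) are illustrative choices, not taken from the paper.

```python
import math
import random
from functools import reduce
from itertools import product
from operator import xor


def build_toy_code(rows, cols):
    """Parity equations of a hypothetical row + diagonal XOR code.

    Each equation is a frozenset of symbols whose XOR is zero. Data symbols are
    ("d", r, c), row parities ("P", r), diagonal parities ("Q", j).
    """
    eqs = [frozenset({("d", r, c) for c in range(cols)} | {("P", r)})
           for r in range(rows)]
    eqs += [frozenset({("d", r, c) for r in range(rows) for c in range(cols)
                       if (r + c) % cols == j} | {("Q", j)})
            for j in range(cols)]
    return eqs


def encode(rows, cols, rng):
    """Random one-byte data symbols plus the parities their equations imply."""
    vals = {("d", r, c): rng.randrange(256)
            for r in range(rows) for c in range(cols)}
    for r in range(rows):
        vals[("P", r)] = reduce(xor, (vals[("d", r, c)] for c in range(cols)), 0)
    for j in range(cols):
        vals[("Q", j)] = reduce(xor, (vals[("d", r, c)] for r in range(rows)
                                      for c in range(cols) if (r + c) % cols == j), 0)
    return vals


def options(eqs, lost):
    """For each lost symbol, the equations containing it and no other lost symbol."""
    return {s: [i for i, eq in enumerate(eqs) if s in eq and len(eq & lost) == 1]
            for s in lost}


def read_cost(choice, eqs, lost):
    """Number of distinct surviving symbols read under a choice of equations."""
    reads = set()
    for i in choice.values():
        reads |= eqs[i] - lost
    return len(reads)


def exhaustive(eqs, lost):
    """Brute-force minimum read cost; exponential, only for checking tiny cases."""
    opts = options(eqs, lost)
    syms = sorted(lost)
    return min(read_cost(dict(zip(syms, combo)), eqs, lost)
               for combo in product(*(opts[s] for s in syms)))


def anneal(eqs, lost, t0=2.0, cooling=0.995, steps=2000, seed=0):
    """Simulated annealing over per-symbol equation choices (a generic sketch)."""
    rng = random.Random(seed)
    opts = options(eqs, lost)
    syms = sorted(lost)
    # Index 0 is the row equation in build_toy_code: the conventional start.
    current = {s: opts[s][0] for s in syms}
    cost = read_cost(current, eqs, lost)
    best, best_cost, t = dict(current), cost, t0
    for _ in range(steps):
        s = rng.choice(syms)
        alternatives = [i for i in opts[s] if i != current[s]]
        if alternatives:
            neighbor = dict(current)
            neighbor[s] = rng.choice(alternatives)
            new_cost = read_cost(neighbor, eqs, lost)
            delta = new_cost - cost
            # Always accept improvements; accept worse moves with prob exp(-delta/t).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, cost = neighbor, new_cost
                if cost < best_cost:
                    best, best_cost = dict(current), cost
        t = max(t * cooling, 1e-3)  # geometric cooling with a small floor
    return best, best_cost


def recover(choice, eqs, vals, lost):
    """Rebuild each lost symbol by XOR-ing the other members of its chosen equation."""
    return {s: reduce(xor, (vals[x] for x in eqs[i] - {s}), 0)
            for s, i in choice.items()}


if __name__ == "__main__":
    rows, cols, failed = 4, 5, 0  # toy stripe; data column 0 is lost
    eqs = build_toy_code(rows, cols)
    lost = frozenset(("d", r, failed) for r in range(rows))
    opts = options(eqs, lost)
    baseline = {s: opts[s][0] for s in lost}  # row parity only
    best, best_cost = anneal(eqs, lost)
    vals = encode(rows, cols, random.Random(1))
    surviving = {k: v for k, v in vals.items() if k not in lost}
    rebuilt = recover(best, eqs, surviving, lost)
    assert all(rebuilt[s] == vals[s] for s in lost)
    print("baseline reads:", read_cost(baseline, eqs, lost))
    print("annealed reads:", best_cost, "exhaustive:", exhaustive(eqs, lost))
```

On this toy layout, recovering the failed column from row parity alone reads 20 symbols, while a mix of row and diagonal equations reads 14, because the two kinds of equations share surviving symbols. Each annealing step costs time proportional to the number of lost symbols times the equation size, so a fixed step budget stays polynomial in the stripe size, whereas `exhaustive` enumerates every combination and grows exponentially. It is included only to check small cases.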