Frontiers of Information Technology & Electronic Engineering

ISSN 2095-9184 (print), ISSN 2095-9230 (online)

DDUC: an erasure-coded system with decoupled data updating and coding

Abstract: In distributed storage systems, replication and erasure code (EC) are common methods for data redundancy. Compared with replication, EC offers better storage efficiency but incurs higher update overhead. Moreover, the consistency and reliability problems caused by concurrent updates pose new challenges to the application of EC. Many studies have focused on optimizing EC, for example through algorithm optimization and novel data update methods, but they lack solutions to these consistency and reliability problems. In this paper, we introduce a storage system that decouples data updating from EC encoding, namely decoupled data updating and coding (DDUC), and propose a data placement policy that combines replicas and parity blocks. For an (N, M) EC system, the data are placed as N groups of M+1 replicas, and the redundant data blocks of the same stripe are placed on the parity nodes, so that the parity nodes can perform EC encoding locally and autonomously. Based on this policy, we implement a two-phase data update method: in phase 1, data are updated in replica mode; in phase 2, the parity nodes perform EC encoding independently. This solves the problem of data reliability degradation caused by concurrent updates while maintaining high concurrency performance. DDUC also exploits two hardware features of persistent memory (PMem), byte addressability and eight-byte atomic writes, to implement a lightweight logging mechanism that improves performance while ensuring data consistency. Experimental results show that the concurrent access performance of DDUC is 1.70–3.73 times that of the state-of-the-art storage system Ceph, and its latency is only 3.4%–5.9% of Ceph's.
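The placement policy and two-phase update described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Stripe`, `ParityNode`, `encode_parity`, `gf_mul`, and `gf_inv` are hypothetical, and the choice of a Cauchy Reed-Solomon code over GF(2^8) is an assumption, since the abstract does not specify the code construction. The sketch assumes that each of the M parity nodes keeps a replica of every data block in its stripe, which together with the data node itself gives the M+1 copies per block.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8), reducing by x^8+x^4+x^3+x^2+1 (0x11d)."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^8) is XOR
        a <<= 1
        if a & 0x100:
            a ^= 0x11D           # reduce back into 8 bits
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Inverse of a nonzero byte: a^254, since the multiplicative group has order 255."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def encode_parity(blocks: list[bytes], j: int, m: int) -> bytes:
    """Parity block j of a Cauchy Reed-Solomon (N, M) stripe (assumed code choice).

    The coefficient for data block i is 1 / (x_j + y_i) with x_j = j and
    y_i = m + i, which are distinct field elements as long as N + M <= 256.
    """
    out = bytearray(len(blocks[0]))
    for i, block in enumerate(blocks):
        coeff = gf_inv(j ^ (m + i))
        for k, byte in enumerate(block):
            out[k] ^= gf_mul(coeff, byte)
    return bytes(out)


@dataclass
class ParityNode:
    """Hypothetical parity node holding a replica of every data block of its stripe."""
    j: int                                   # which parity block this node owns
    m: int                                   # number of parity blocks M
    replicas: dict[int, bytes] = field(default_factory=dict)
    parity: bytes = b""

    def encode(self, n: int) -> None:
        # Phase 2: all N inputs are local, so nothing is fetched from other nodes.
        blocks = [self.replicas[i] for i in range(n)]
        self.parity = encode_parity(blocks, self.j, self.m)


class Stripe:
    """One (N, M) stripe laid out as N groups of M+1 replicas (hypothetical model)."""

    def __init__(self, n: int, m: int, blocks: list[bytes]) -> None:
        self.n, self.m = n, m
        self.data_nodes = list(blocks)       # data node i holds block i
        self.parity_nodes = [ParityNode(j, m) for j in range(m)]
        for i, block in enumerate(blocks):
            for node in self.parity_nodes:
                node.replicas[i] = block     # M extra copies of each data block
        self.flush()

    def update(self, i: int, new_block: bytes) -> None:
        # Phase 1: replica-mode write to the data node and all M parity nodes,
        # so the block already has M+1 copies before any parity is recomputed.
        self.data_nodes[i] = new_block
        for node in self.parity_nodes:
            node.replicas[i] = new_block

    def flush(self) -> None:
        # Phase 2: each parity node re-encodes independently from its local replicas.
        for node in self.parity_nodes:
            node.encode(self.n)


stripe = Stripe(3, 2, [b"abcd", b"efgh", b"ijkl"])
stripe.update(1, b"WXYZ")   # phase 1: parity is stale, but the block has 3 copies
stripe.flush()              # phase 2: parity nodes catch up locally
```

Because every parity node already holds all N data blocks of its stripe, `flush` needs no cross-node reads, which is what lets phase 2 run decoupled from client updates; between the two phases the updated block is still protected by M+1 full copies, matching the M-failure tolerance of the (N, M) code. The sketch keeps the replicas after encoding and omits the PMem-based logging; the abstract does not say whether or when DDUC reclaims the replicas.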

Key words: Concurrent update; High reliability; Erasure code; Consistency; Distributed storage system

Chinese Summary: DDUC: an erasure-coded system with decoupled data updating and coding

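Yaofeng TU1,2, Rong XIAO2, Yinjun HAN1,2, Zhenghua CHEN2, Hao JIN2, Xuecheng QI2, Xinyuan SUN2
1 State Key Laboratory of Mobile Network and Mobile Multimedia Technology, Shenzhen 518000, China
2 ZTE Corporation, Nanjing 210000, China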
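Abstract: In distributed storage systems, common data redundancy methods include replication and erasure code (EC). Compared with replication, EC offers better storage efficiency but incurs higher update overhead. In addition, the consistency and reliability problems caused by concurrent updates pose new challenges to EC applications. Many studies have been devoted to optimizing EC, including algorithm optimization and new data update methods, but the consistency and reliability problems of concurrent updates have not yet been well solved. This paper introduces a storage system that decouples data updating from EC encoding, named DDUC, and proposes a placement policy that combines replicas and parity blocks. For an (N, M) EC system, data are laid out as N groups of M+1 replicas, and the redundant data blocks of the same stripe are all placed on the parity nodes, so that the parity nodes can perform EC encoding locally and autonomously. Based on this policy, a two-phase data update method is implemented: in the first phase, data are updated in replica mode; in the second phase, the parity nodes complete EC encoding independently. This ensures high concurrency performance while solving the problem of reduced data reliability caused by concurrent updates. In addition, the byte addressability and 8-byte atomic write features of PMem hardware are used to implement a lightweight logging mechanism that improves performance while guaranteeing data consistency. Experimental results show that, compared with the mainstream storage system Ceph, the proposed storage system raises concurrent access performance to 1.70–3.73 times that of Ceph, with latency only 3.4%–5.9% of Ceph's.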

Keywords: Concurrent update; High reliability; Erasure code; Consistency; Distributed storage system



DOI: 10.1631/FITEE.2200466
CLC number: TP333

Online access: 2023-05-31
Received: 2022-10-15
Revision accepted: 2023-05-31
Crosschecked: 2023-02-12

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China