Journal of Zhejiang University SCIENCE C
ISSN 1869-1951 (Print), 1869-196X (Online), Monthly
2010 Vol.11 No.5 P.315-327
Scalable high performance de-duplication backup via hash join
Abstract: Apart from high space efficiency, other demanding requirements for enterprise de-duplication backup are high performance, high scalability, and availability in large-scale distributed environments. The main challenge is reducing the significant disk input/output (I/O) overhead that results from constantly accessing the disk to identify duplicate chunks. Existing inline de-duplication approaches mainly rely on duplicate locality to avoid the disk bottleneck, and therefore suffer throughput degradation under workloads with poor duplicate locality. This paper presents Chunkfarm, a post-processing de-duplication backup system designed to improve capacity, throughput, and scalability for de-duplication. Chunkfarm performs de-duplication backup using the hash join algorithm, which turns the notoriously random and small disk I/Os of fingerprint lookups and updates into large sequential disk I/Os, thereby achieving high write throughput that is insensitive to workload locality. More importantly, by decentralizing fingerprint lookup and update, Chunkfarm allows a cluster of servers to perform de-duplication backup in parallel; it is therefore well suited to distributed implementation and applicable to large-scale distributed storage systems.
Key words: Backup system, De-duplication, Post-processing, Fingerprint lookup, Scalability
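The abstract's hash-join idea can be illustrated with a minimal sketch; this is not Chunkfarm's actual implementation, and the partition count, index file layout, SHA-1 fingerprints, and helper names are assumptions made for illustration. The sketch follows the classic partitioned (Grace) hash join: fingerprints of newly written chunks are bucketed by a hash prefix, and each bucket is then joined against the matching partition of an on-disk fingerprint index with one sequential read and one sequential append, instead of one random index probe per fingerprint.

```python
import hashlib
import os
from collections import defaultdict

NUM_PARTITIONS = 16  # assumption: small partition count for illustration


def fingerprint(chunk: bytes) -> str:
    """Content fingerprint of a chunk (SHA-1, commonly used in de-duplication)."""
    return hashlib.sha1(chunk).hexdigest()


def partition_of(fp: str) -> int:
    """Route a fingerprint to a partition by its leading hex digits."""
    return int(fp[:2], 16) % NUM_PARTITIONS


def hash_join_dedup(new_chunks, index_dir):
    """Partitioned-hash-join-style duplicate detection.

    Build phase: bucket the fingerprints of newly written chunks in memory.
    Probe phase: for each bucket, sequentially read the matching on-disk
    index partition, probe it in memory, and append the new fingerprints
    sequentially. Returns (duplicate_fps, unique_fps).
    """
    # Build phase: group new fingerprints by partition.
    buckets = defaultdict(set)
    for chunk in new_chunks:
        buckets[partition_of(fingerprint(chunk))].add(fingerprint(chunk))

    duplicates, uniques = set(), set()
    os.makedirs(index_dir, exist_ok=True)
    # Probe phase: one sequential scan and one sequential append per partition.
    for part, fps in buckets.items():
        index_file = os.path.join(index_dir, f"index_{part:02d}.txt")
        known = set()
        if os.path.exists(index_file):
            with open(index_file) as f:
                known = {line.strip() for line in f}
        duplicates |= fps & known
        new_fps = fps - known
        uniques |= new_fps
        with open(index_file, "a") as f:
            f.writelines(fp + "\n" for fp in sorted(new_fps))
    return duplicates, uniques


# Example with hypothetical data: fingerprints already in the index come back
# as duplicates, unseen ones are appended to their partitions.
# dups, news = hash_join_dedup([b"alpha", b"beta"], "/tmp/fp_index")
```

Because each partition is read and written as a whole, the disk sees only large sequential transfers regardless of how duplicate fingerprints are distributed across the backup stream, which is the property the abstract attributes to the hash-join approach.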
DOI: 10.1631/jzus.C0910445
CLC number: TP309.3
Crosschecked: 2009-12-31