Frontiers of Information Technology & Electronic Engineering
ISSN 2095-9184 (print), ISSN 2095-9230 (online)
2016 Vol.17 No.1 P.15-31
Dr. Hadoop: an infinite scalable metadata management for Hadoop—How the baby elephant becomes immortal
Abstract: In this exabyte-scale era, data increases at an exponential rate, which in turn generates a massive amount of metadata in the file system. Hadoop is the most widely used framework for dealing with big data. With this enormous growth of metadata, however, the efficiency of Hadoop has been questioned repeatedly by researchers. It is therefore essential to create an efficient and scalable metadata management scheme for Hadoop. Hash-based mapping and subtree partitioning are suitable for distributed metadata management schemes. Subtree partitioning does not distribute the workload uniformly among the metadata servers, and metadata needs to be migrated to keep the load roughly balanced. Hash-based mapping distributes the load uniformly among the NameNodes, the metadata servers of Hadoop, but suffers from a constraint on the locality of metadata. In this paper, we present a circular metadata management mechanism named dynamic circular metadata splitting (DCMS). DCMS preserves metadata locality using consistent hashing and locality-preserving hashing, keeps replicated metadata for excellent reliability, and dynamically distributes metadata among the NameNodes to keep the load balanced. The NameNode is the centralized heart of Hadoop: it keeps the directory tree of all files, so its failure constitutes a single point of failure (SPOF). DCMS removes Hadoop's SPOF and provides efficient and scalable metadata management. The new framework is named 'Dr. Hadoop' after the names of the authors.
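As a rough illustration of the placement scheme the abstract describes, the sketch below combines the two hashing ideas: a consistent-hash ring of NameNodes (so adding or removing a server only moves neighboring keys) and a locality-preserving key derived from a file's parent directory (so entries of one directory stay on one server). This is a minimal sketch under our own assumptions, not code from the paper; the names (`DcmsRing`, `lookup`) and the virtual-node count are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical sketch of DCMS-style placement: consistent hashing over
 *  NameNodes, with a locality-preserving key so that entries under the
 *  same directory hash to the same server. Not the paper's actual code. */
public class DcmsRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private static final int VIRTUAL_NODES = 16; // assumed, for smoother balance

    /** Place a NameNode on the ring at several virtual positions. */
    public void addNameNode(String nodeId) {
        for (int v = 0; v < VIRTUAL_NODES; v++) {
            ring.put(hash(nodeId + "#" + v), nodeId);
        }
    }

    /** Remove a NameNode; only keys on its arcs move to successor nodes. */
    public void removeNameNode(String nodeId) {
        for (int v = 0; v < VIRTUAL_NODES; v++) {
            ring.remove(hash(nodeId + "#" + v));
        }
    }

    /** Locality-preserving key: hash the parent directory, so all entries
     *  in one directory map to the same NameNode. Assumes at least one
     *  NameNode has been added to the ring. */
    public String lookup(String filePath) {
        int slash = filePath.lastIndexOf('/');
        String parent = slash > 0 ? filePath.substring(0, slash) : "/";
        // Owner is the first ring position at or after the key, wrapping around.
        SortedMap<Long, String> tail = ring.tailMap(hash(parent));
        return tail.isEmpty() ? ring.firstEntry().getValue()
                              : tail.get(tail.firstKey());
    }

    /** Map a string to a 64-bit ring position using the first 8 MD5 bytes. */
    private static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xffL);
            return h;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this keying, /logs/2015/a.log and /logs/2015/b.log both hash the prefix /logs/2015 and therefore resolve to the same NameNode, whereas hashing full paths would scatter a directory's entries across servers and destroy locality.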
Key words: Hadoop, NameNode, Metadata, Locality-preserving hashing, Consistent hashing
Highlights: Hash-based mapping and subtree partitioning are suitable for distributed metadata management schemes. Hash-based mapping distributes the load uniformly among the NameNodes (the servers that store metadata in Hadoop) but is constrained by the spatial locality of metadata; subtree partitioning preserves locality but does not distribute the workload uniformly among the servers, so metadata must be migrated to keep the load balanced. This paper proposes a circular metadata management mechanism named DCMS (dynamic circular metadata splitting) (Fig. 3) and builds on it an improved Hadoop framework, Dr. Hadoop ('Dr.' comes from the initials of the authors' names, Dipayan DEV and Ripon PATGIRI). The NameNode is the core of Hadoop; it holds the directory tree of all files, and its failure causes a single point of failure (SPOF). DCMS removes the SPOF from Hadoop and thus provides an efficient and scalable metadata management mechanism.
Method: Metadata locality is preserved with locality-preserving hashing (LpH); load balance among the servers is maintained with consistent hashing; and high reliability is achieved by keeping replicated copies of the metadata.
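The replication step could be realized, for example, by storing each entry on the NameNode that owns its key plus the next distinct NameNodes clockwise on the ring. The helper below sketches that successor-list placement under our own assumptions; the paper's actual replica-placement rule may differ, and `ReplicaPlacement` and `replicas` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical replica placement on a consistent-hash ring: the key's
 *  owner plus the next distinct NameNodes clockwise hold the copies. */
public class ReplicaPlacement {
    /** ring maps hash positions to NameNode ids (as in the sketch above);
     *  assumes the ring is non-empty. */
    public static List<String> replicas(TreeMap<Long, String> ring,
                                        long keyHash, int copies) {
        Set<String> out = new LinkedHashSet<>(); // owner first, no duplicates
        Long pos = ring.ceilingKey(keyHash);     // owner: first position at/after key
        if (pos == null) pos = ring.firstKey();  // wrap around the ring
        int visited = 0;                         // bound the walk over virtual nodes
        while (out.size() < copies && visited < ring.size()) {
            out.add(ring.get(pos));
            visited++;
            pos = ring.higherKey(pos);           // clockwise to the next position
            if (pos == null) pos = ring.firstKey();
        }
        return new ArrayList<>(out);
    }
}
```

Placing replicas on ring successors means that when a NameNode fails, the server that inherits its arc of the ring already holds copies of the affected metadata.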
Conclusion: Theoretical analysis shows that the Dr. Hadoop architecture is reliably available 99.99% of the time. Measured in terms of throughput, fault tolerance, and NameNode load, DCMS outperforms traditional approaches on large-scale file systems.
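The 99.99% figure is the paper's own theoretical result. For intuition only, it matches what standard replication arithmetic yields under assumed numbers, namely independent failures, a per-NameNode availability of 99%, and two copies of every entry:

```latex
% Illustrative arithmetic only -- not the paper's derivation.
% Assume independent NameNode failures, per-node availability p = 0.99,
% and r = 2 copies of every metadata entry (primary + one replica):
A = 1 - (1-p)^{r} = 1 - (0.01)^{2} = 0.9999 = 99.99\%
```

Under these assumptions the system is unavailable only when both copies of an entry are down at once.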
DOI: 10.1631/FITEE.1500015
CLC number: TP311
Downloaded (full text): 9043
Downloaded (summary): 2059
Clicked: 9441
Cited: 1
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2015-12-25