Full Text:   <2296>

Summary:  <1594>

CLC number: TP311

On-line Access: 2016-01-05

Received: 2015-01-12

Revision Accepted: 2015-06-11

Crosschecked: 2015-12-25

Cited: 1

Clicked: 6649

Citations:  Bibtex RefMan EndNote GB/T7714


Dipayan Dev


-   Go to

Article info.
Open peer comments

Frontiers of Information Technology & Electronic Engineering  2016 Vol.17 No.1 P.15-31


Dr. Hadoop: an infinite scalable metadata management for Hadoop—How the baby elephant becomes immortal

Author(s):  Dipayan Dev, Ripon Patgiri

Affiliation(s):  Department of Computer Science and Engineering, NIT Silchar, India

Corresponding email(s):   dev.dipayan16@gmail.com, ripon@cse.nits.ac.in

Key Words:  Hadoop, NameNode, Metadata, Locality-preserving hashing, Consistent hashing

Dipayan Dev, Ripon Patgiri. Dr. Hadoop: an infinite scalable metadata management for Hadoop—How the baby elephant becomes immortal[J]. Frontiers of Information Technology & Electronic Engineering, 2016, 17(1): 15-31.

@article{title="Dr. Hadoop: an infinite scalable metadata management for Hadoop—How the baby elephant becomes immortal",
author="Dipayan Dev, Ripon Patgiri",
journal="Frontiers of Information Technology & Electronic Engineering",
publisher="Zhejiang University Press & Springer",

%0 Journal Article
%T Dr. Hadoop: an infinite scalable metadata management for Hadoop—How the baby elephant becomes immortal
%A Dipayan Dev
%A Ripon Patgiri
%J Frontiers of Information Technology & Electronic Engineering
%V 17
%N 1
%P 15-31
%@ 2095-9184
%D 2016
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.1500015

T1 - Dr. Hadoop: an infinite scalable metadata management for Hadoop—How the baby elephant becomes immortal
A1 - Dipayan Dev
A1 - Ripon Patgiri
J0 - Frontiers of Information Technology & Electronic Engineering
VL - 17
IS - 1
SP - 15
EP - 31
%@ 2095-9184
Y1 - 2016
PB - Zhejiang University Press & Springer
ER -
DOI - 10.1631/FITEE.1500015

In this Exa byte scale era, data increases at an exponential rate. This is in turn generating a massive amount of metadata in the file system. hadoop is the most widely used framework to deal with big data. Due to this growth of huge amount of metadata, however, the efficiency of hadoop is questioned numerous times by many researchers. Therefore, it is essential to create an efficient and scalable metadata management for hadoop. Hash-based mapping and subtree partitioning are suitable in distributed metadata management schemes. Subtree partitioning does not uniformly distribute workload among the metadata servers, and metadata needs to be migrated to keep the load roughly balanced. Hash-based mapping suffers from a constraint on the locality of metadata, though it uniformly distributes the load among nameNodes, which are the metadata servers of hadoop. In this paper, we present a circular metadata management mechanism named dynamic circular metadata splitting (DCMS). DCMS preserves metadata locality using consistent hashing and locality-preserving hashing, keeps replicated metadata for excellent reliability, and dynamically distributes metadata among the nameNodes to keep load balancing. nameNode is a centralized heart of the hadoop. Keeping the directory tree of all files, failure of which causes the single point of failure (SPOF). DCMS removes hadoop’s SPOF and provides an efficient and scalable metadata management. The new framework is named ‘Dr. hadoop’ after the name of the authors.

This paper aims to address the bottlenecks (limited scalability, single point of failure) of the centralized metadata management in the HDFS of Hadoop framework by proposing distributed peer-to-peer metadata cluster management. The metadata is distributed across multiple metadata servers (namenodes) through the consistent hashing technique to achieve load balancing, and the locality-preserving hashing to achieve data-locality. The proposed work is called Dr. Hadoop, which is compared with the vanilla Hadoop using two data sets from Yahoo and Microsoft, and Dr. Hadoop shows considerable performance improvement. This work is definitely meaningful, as the centralized metadata management of HDFS has been an inherent scalability and reliability problem. This work tries to address the problems, and will be beneficial to the data processing community in the Internet domains in the era of big data.

Dr. Hadoop: Hadoop的一种无限可扩展元数据管理机制—小象如何不老?

目的:在这个“兆兆兆字节”(Exa byte)时代,数据量随时间指数率增长。剧增的数据在文件系统中制造了大量的元数据(metadata)。虽然Hadoop是处理大数据时最广泛采用的软件架构,其效率仍被研究者们广泛质疑。有必要为Hadoop创建一个有效且可扩展的元数据管理机制。
创新点:基于哈希的映射和子树分区适用于分布式元数据管理方案。基于哈希的映射在NameNode(Hadoop中存储元数据的服务器)间均衡地分配负载,但受到元数据空间局部性的限制;子树分区不需为保持负载均衡而迁移元数据,但也不能在服务器间均衡任务负载。本文提出一种称为DCMS(dynamic circular metadata splitting,动态环形元数据分割)的环形元数据管理机制(图3),并依此构建了Hadoop的改进框架—Dr. Hadoop(“Dr.”来自于本文作者名字首字母Dipayan DEV,Ripon PATGIRI)。NameNode是Hadoop的核心,其对所有文件路径树的保存失败将导致单点故障(single point of failure,SPoF)。DCMS能够移除Hadoop中的单点故障,从而提供一种有效且可扩展的元数据管理机制。
方法:通过使用局部保持哈希(locality-preserving hashing,LpH)保持元数据的空间局部性,通过使用一致性哈希(consistent hashing)保持服务器间的负载均衡,通过保留复制后的元数据实现高可靠性。
结论:理论分析表明,Dr. Hadoop架构在99.99%的时间能够可靠使用。通过衡量数据吞吐率、容错性和NameNode负载等性能,DCMS在大规模文件系统上较传统方法更具效力。


Darkslateblue:Affiliate; Royal Blue:Author; Turquoise:Article


[1]Aguilera, M.K., Chen, W., Toueg, S., 1997. Heartbeat: a timeoutfree failure detector for quiescent reliable communication. Proc. 11th Int. Workshop on Distributed Algorithms, p.126-140.

[2]Apache Software Foundation, 2012. Hot Standby for NameNode. Available from http://issues.apache.org/jira/browse/HDFS-976.

[3]Beaver, D., Kumar, S., Li, H.C., et al., 2010. Finding a needle in haystack: Facebookąŕs photo storage. OSDI, p.47-60.

[4]Biplob, D., Sengupta, S., Li, J., 2010. FlashStore: high throughput persistent key-value store. Proc. VLDB Endowment, p.1414-1425.

[5]Bisciglia, C., 2009. Hadoop HA Configuration. Available from http://www.cloudera.com/blog/2009/07/22/hadoop-haconfiguration/.

[6]Braam, R. Z. PJ, 2007. Lustre: a Scalable, High Performance File System. Cluster File Systems, Inc.

[7]Brandt, S.A., Miller, E.L, Long, D.D.E., et al., 2003. Efficient metadata management in large distributed storage systems. IEEE Symp. on Mass Storage Systems, p.290-298.

[8]Cao, Y., Chen, C., Guo, F., et al., 2011. Es2: a cloud data storage system for supporting both OLTP and OLAP. Proc. IEEE ICDE, p.291-302.

[9]Corbett, P.F., Feitelson, D.G., 1996. The Vesta parallel file system. ACM Trans. Comput. Syst., 14(3):225-264.

[10]DeCandia, G., Hastorun, D., Jampani, M., et al., 2007. Dynamo: Amazon’s highly available key-value store. ACM SIGOPS Oper. Syst. Rev., 41(6):205-220.

[11]Dev, D., Patgiri, R., 2014. Performance evaluation of HDFS in big data management. Int. Conf. on High Performance Computing and Applications, p.1-7.

[12]Dev, D., Patgiri, R., 2015. HAR+: archive and metadata distribution! Why not both? ICCCI, in press.

[13]Escriva, R., Wong, B., Sirer, E.G., 2012. HyperDex: a distributed, searchable key-value store. ACM SIGCOMM Comput. Commun. Rev., 42(4):25-36.

[14]Fred, H., McNab, R., 1998. SimJava: a discrete event simulation library for Java. Simul. Ser., 30:51-56.

[15]Ghemawat, S., Gobioff, H., Leung, S.T., 2003. The Google file system. Proc. 19th ACM Symp. on Operating Systems Principles, p.29-43.

[16]Haddad, I.F., 2000. Pvfs: a parallel virtual file system for Linux clusters. Linux J., p.5.

[17]Wiki, 2012. NameNode Failover, on Wiki Apache Hadoop. Available from http://wiki.apache.org/hadoop/NameNodeFailover.

[18]HDFS, 2010. Hadoop AvatarNode High Availability. Available from http://hadoopblog.blogspot.com/2010/02/hadoop-namenode-high-availability.html.

[19]Karger, D., Lehman, E., Leighton, F., et al., 1997. Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the World Wide Web. Proc. 29th Annual ACM Symp. on Theory of Computing, p.654-663.

[20]Kavalanekar, S., Worthington, B.L., Zhang, Q., et al., 2008. Characterization of storage workload traces from production Windows Servers. Proc. IEEE IISWC, p.119-128.

[21]Lewin, D., 1998. Consistent hashing and random trees: algorithms for caching in distributed networks. Master Thesis, Department of EECS, MIT.

[22]Lim, H., Fan, B., Andersen, D.G., et al., 2011. SILT: a memory-efficient, high-performance key-value store. Proc. 23rd ACM Symp. on Operating Systems Principles.

[23]McKusick, M.K., Quinlan, S., 2009. GFS: evolution on fast-forward. ACM Queue, 7(7):10-20.

[24]Miller, E.L., Katz, R.H., 1997. Rama: an easy-to-use, high-performance parallel file system. Parall. Comput., 23(4-5):419-446.

[25]Miller, E.L., Greenan, K., Leung, A., et al., 2008. Reliable and efficient metadata storage and indexing using nvram. Available from dcslab.hanyang.ac.kr/nvramos08/EthanMiller.pdf.

[26]Nagle, D., Serenyi, D., Matthews, A., 2004. The Panasas activescale storage cluster-delivering scalable high bandwidth storage. Proc. ACM/IEEE SC, p.1-10.

[27]Okorafor, E., Patrick, M.K., 2012. Availability of Jobtracker machine in hadoop/mapreduce zookeeper coordinated clusters. Adv. Comput., 3(3):19-30.

[28]Ousterhout, J.K., Costa, H.D., Harrison, D., et al., 1985. A trace-driven analysis of the Unix 4.2 BSD file system. SOSP, p.15-24.

[29]Raicu, I., Foster, I.T., Beckman, P., 2011. Making a case for distributed file systems at exascale. Proc. 3rd Int. Workshop on Large-Scale System and Application Performance, p.11-18.

[30]Rodeh, O., Teperman, A., 2003. ZFS—a scalable distributed file system using object disks. IEEE Symp. on Mass Storage Systems, p.207-218.

[31]Satyanarayanan, M., Kistler, J.J., Kumar, P., et al., 1990. Coda: a highly available file system for a distributed workstation environment. IEEE Trans. Comput., 39(4):447-459.

[32]Shvachko, K., Kuang, H.R., Radia, S., et al., 2010. The Hadoop Distributed File System. IEEE 26th Symp. on Mass Storage Systems and Technologies, p.1-10.

[33]Torodanhan, 2009. Best Practice: DB2 High Availability Disaster Recovery. Available from http://www.ibm.com/developerworks/wikis/display/data/Best+Practice+-+DB2+High+Availability+Disaster+Recovery.

[34]U.S. Department of Commerce/NIST, 1995. FIPS 180-1. Secure Hash Standard. National Technical Information Service, Springfield, VA.

[35]Wang, F., Qiu, J., Yang, J., et al., 2009. Hadoop high availability through metadata replication. Proc. 1st Int. Workshop on Cloud Data Management, p.37-44.

[36]Weil, S.A., Pollack, K.T., Brandt, S.A., et al., 2004. Dynamic metadata management for petabyte-scale file systems. SC, p.4.

[37]Weil, S.A., Brandt, S.A., Miller, E.L., et al., 2006. CEPH: a scalable, high-performance distributed file system. OSDI, p.307-320.

[38]White, T., 2009. Hadoop: the Definitive Guide. O’Reilly Media, Inc.

[39]White, B.S., Walker, M., Humphrey, M., et al., 2001. Legionfs: a secure and scalable file system supporting cross-domain highperformance applications. Proc. ACM/IEEE Conf. on Supercomputing, p.59.

[40]Yadava, H., 2007. The Berkeley DB Book. Apress.

[41]Zhu, Y., Jiang, H., Wang, J., et al., 2008. Hba: Distributed metadata management for large cluster-based storage systems. IEEE Trans. Parall. Distrib. Syst., 19(6):750-763.

Open peer comments: Debate/Discuss/Question/Opinion


Please provide your name, email address and a comment

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952783; E-mail: cjzhang@zju.edu.cn
Copyright © 2000 - 2022 Journal of Zhejiang University-SCIENCE