Frontiers of Information Technology & Electronic Engineering

ISSN 2095-9184 (print), ISSN 2095-9230 (online)

Big data storage technologies: a survey

Abstract: Industry is pushing hard to develop more feasible and viable tools for storing data whose volume, velocity, and diversity are growing rapidly, termed ‘big data’. The structural shift of the storage mechanism from traditional data management systems to NoSQL technology is driven by the need to meet big data storage requirements. However, the available big data storage technologies cannot efficiently provide consistent, scalable, and available solutions for continuously growing heterogeneous data. Storage is the preliminary step of big data analytics for real-world applications such as scientific experiments, healthcare, social networks, and e-business. So far, Amazon, Google, and Apache are among the de facto industry standards for big data storage solutions, yet the literature does not report an in-depth survey of the storage technologies available for big data that investigates the performance and magnitude gains of these technologies. The primary objective of this paper is to conduct a comprehensive investigation of state-of-the-art storage technologies available for big data. A well-defined taxonomy of big data storage technologies is presented to help data analysts and researchers understand and select the storage mechanism that best fits their needs. To evaluate the performance of different storage architectures, we compare and analyze the existing approaches using Brewer’s CAP theorem. The significance and applications of storage technologies and their support for other categories are discussed. Several future research challenges are highlighted with the aim of expediting the deployment of a reliable and scalable storage system.

Key words: Big data; Big data storage; NoSQL databases; Distributed databases; CAP theorem; Scalability; Consistency-partition resilience; Availability-partition resilience

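Chinese Summary (translated): Big data storage technologies: a survey

Summary: Industry urgently needs more viable storage tools for big data, whose volume is growing rapidly and whose forms are increasingly diverse. To meet big data storage requirements, storage mechanisms have undergone a structural shift from traditional data management systems to NoSQL technology. However, the big data storage technologies currently available cannot provide consistent, scalable, and available solutions for continuously growing heterogeneous data. In real-world applications such as scientific experiments, healthcare, social networks, and e-commerce, storage is the first step of big data analytics. To date, Amazon, Google, Apache, and others have set the industry standard for big data storage solutions, but no in-depth survey or literature report yet covers the performance and capacity gains of big data storage technologies. This paper aims to survey comprehensively the state-of-the-art storage technologies currently available for big data, and provides a clear taxonomy of these technologies to help data analysts and researchers understand and choose the storage mechanism that better fits their needs. We compare and analyze existing storage approaches using Brewer’s CAP theorem, evaluate the performance of different storage architectures, and discuss the significance and applications of storage technologies and their support for other categories of data. To speed the deployment of reliable and scalable storage systems, the paper also highlights several challenges facing future research.

Key words: Big data; Big data storage; NoSQL databases; Distributed databases; CAP theorem; Scalability; Consistency-partition resilience; Availability-partition resilience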


DOI: 10.1631/FITEE.1500441

CLC number: TP311.13

On-line Access: 2024-08-27

Received: 2023-10-17

Revision Accepted: 2024-05-08

Crosschecked: 2017-08-08

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952276; Fax: +86-571-87952331; E-mail: jzus@zju.edu.cn
Copyright © 2000~ Journal of Zhejiang University-SCIENCE