Frontiers of Information Technology & Electronic Engineering

ISSN 2095-9184 (print), ISSN 2095-9230 (online)

A disk failure prediction model for multiple issues

Abstract: Disk failure prediction methods have been useful in handling a single issue, e.g., heterogeneous disks, model aging, or minority samples. However, because these issues often exist simultaneously, prediction models that can handle only one of them produce biased predictions in practice. Existing disk failure prediction methods simply fuse various models and lack discussion of training data preparation and learning patterns when facing multiple issues, although the solutions to different issues often conflict with each other. Therefore, we first explore training data preparation for multiple issues via a data partitioning pattern, i.e., our proposed multi-property data partitioning (MDP). Then, we treat learning with the partitioned data for multiple issues as learning multiple tasks, and introduce the model-agnostic meta-learning (MAML) framework to achieve this learning. Based on these improvements, we propose a novel disk failure prediction model named MDP-MAML. MDP addresses the challenges of uneven partitioning and difficulty in partitioning by time, and MAML addresses the challenge of learning with multiple domains and few samples for multiple issues. In addition, MDP-MAML can assimilate emerging issues for learning and prediction. On datasets reported by two real-world data centers, compared with state-of-the-art methods, MDP-MAML improves the area under the curve (AUC) from 0.85 to 0.89 and the false detection rate (FDR) from 0.85 to 0.91, while reducing the false alarm rate (FAR) from 4.88% to 2.85%.
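The abstract does not describe how MDP-MAML applies MAML, so the following is only a minimal sketch of the general MAML inner/outer loop, not the authors' implementation. It assumes a toy setting in which each "task" stands in for one data partition and the model is a single scalar weight fitted by least squares; this choice makes the second-order term of the meta-gradient exact and easy to read. All function names, task definitions, and learning rates are hypothetical.

```python
import random

def mse(w, xs, ys):
    """Mean squared error of the scalar model y_hat = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def mse_grad(w, xs, ys):
    """Derivative of the mean squared error with respect to w."""
    return 2.0 * sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def make_task(slope, rng, n=10):
    """Toy task (assumption): noiseless y = slope * x, split into support/query."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(2 * n)]
    ys = [slope * x for x in xs]
    return (xs[:n], ys[:n]), (xs[n:], ys[n:])

def adapt(w, support, inner_lr):
    """Inner loop: one gradient step on the task's support set."""
    sx, sy = support
    return w - inner_lr * mse_grad(w, sx, sy)

def meta_loss(w, tasks, inner_lr):
    """Average query loss after adapting the shared weight to each task."""
    return sum(mse(adapt(w, s, inner_lr), *q) for s, q in tasks) / len(tasks)

def maml_step(w, tasks, inner_lr, outer_lr):
    """Outer loop: update the shared initialization using post-adaptation loss."""
    meta_grad = 0.0
    for support, query in tasks:
        w_adapted = adapt(w, support, inner_lr)
        sx, _ = support
        # d(w_adapted)/dw = 1 - inner_lr * Hessian; for this model the
        # Hessian of the support loss is 2 * mean(x^2).
        hessian = 2.0 * sum(x * x for x in sx) / len(sx)
        meta_grad += mse_grad(w_adapted, *query) * (1.0 - inner_lr * hessian)
    return w - outer_lr * meta_grad / len(tasks)

rng = random.Random(0)
tasks = [make_task(slope, rng) for slope in (1.0, 2.0, 3.0)]
inner_lr, outer_lr = 0.1, 0.1

w_init = 0.0
loss_before = meta_loss(w_init, tasks, inner_lr)
w_meta = w_init
for _ in range(200):
    w_meta = maml_step(w_meta, tasks, inner_lr, outer_lr)
loss_after = meta_loss(w_meta, tasks, inner_lr)
```

In this sketch the meta-trained weight settles between the slopes of the individual tasks, so a single inner step adapts it to any one of them more cheaply than starting from zero. A new task can be handled by calling `adapt` on its support set, which loosely mirrors the abstract's point about assimilating emerging issues.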

Key words: Storage system reliability; Disk failure prediction; Self-monitoring analysis and reporting technology (SMART); Machine learning

Chinese Summary: A disk failure prediction model for multiple issues

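Yunchuan Guan1, Yu Liu2, Ke Zhou1, Qiang Li3, Tuanjie Wang3, Hui Li3
1Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan 430074, China
2School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
3Inspur Electronic Information Industry Co., Ltd., Beijing 250000, China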
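Abstract: Disk failure prediction methods offer mature solutions to single issues, such as disk heterogeneity, model aging, and small sample sizes. However, because these issues often exist simultaneously, a model that can handle only one of them yields biased predictions in practice. Solutions to different issues often conflict with each other, yet existing disk failure prediction methods usually just fuse various models and lack discussion of training data preparation and learning patterns when facing multiple issues. To this end, a multi-property data partitioning (MDP) method is proposed to explore training data preparation for multiple issues. The model-agnostic meta-learning (MAML) algorithm is introduced to perform multi-task learning on the partitioned data subsets. Based on these improvements, a disk failure prediction model named MDP-MAML is proposed. MDP addresses the challenges of uneven data partitioning and partitioning by time, while MAML addresses few-sample learning for multiple issues. In addition, MDP-MAML can adapt to newly emerging issues for learning and prediction. On datasets from two real-world data centers, compared with state-of-the-art methods, MDP-MAML improves the area under the curve (AUC) from 0.85 to 0.89 and the false detection rate (FDR) from 0.85 to 0.91, and reduces the false alarm rate (FAR) from 4.88% to 2.85%.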

Key words: Storage system reliability; Disk failure prediction; Self-monitoring, analysis and reporting technology (SMART); Machine learning


DOI: 10.1631/FITEE.2200488

CLC number: TN333


On-line Access: 2023-07-24
Received: 2022-10-19
Revision Accepted: 2023-07-24
Crosschecked: 2023-06-13

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952276; Fax: +86-571-87952331; E-mail: jzus@zju.edu.cn
Copyright © 2000~ Journal of Zhejiang University-SCIENCE