Frontiers of Information Technology & Electronic Engineering

ISSN 2095-9184 (print), ISSN 2095-9230 (online)

A visual analysis approach for data imputation via multi-party tabular data correlation strategies

Abstract: Data imputation is an essential pre-processing task for data governance that aims to fill in incomplete data. However, conventional data imputation methods draw on isolated tabular data, so they only partly alleviate data incompleteness and fail to achieve the best balance between accuracy and efficiency. In this paper, we present a novel visual analysis approach for data imputation. We develop a multi-party tabular data association strategy that uses intelligent algorithms to identify similar columns and establish column correlations across multiple tables. We then perform the initial imputation of incomplete data using correlated data entries from other tables. Additionally, we develop a visual analysis system to refine data imputation candidates. Our interactive system combines the multi-party data imputation approach with expert knowledge, allowing for a better understanding of the relational structure of the data. This significantly improves the accuracy and efficiency of data imputation, and in turn the quality of data governance and the intrinsic value of data assets. Experimental validation and user surveys demonstrate that this method supports users in verifying and judging the associated columns and similar rows using their domain knowledge.

Key words: Data governance; Data incompleteness; Data imputation; Data visualization; Interactive visual analysis
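
The abstract does not specify the "intelligent algorithms" used to associate columns, so the following is only a minimal sketch of the general idea under stated assumptions: columns from two tables are paired by the Jaccard similarity of their value sets (an assumption, not the authors' method), and each missing cell is given a candidate value from a row in the other table that agrees on the remaining paired columns. The names `jaccard`, `match_columns`, `impute`, the 0.5 threshold, and the sample tables are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# A table is a mapping from column name to column values; None marks a missing cell.
Table = Dict[str, List[Optional[str]]]


def jaccard(a: List[Optional[str]], b: List[Optional[str]]) -> float:
    """Jaccard similarity between the sets of non-missing values of two columns."""
    sa = {v for v in a if v is not None}
    sb = {v for v in b if v is not None}
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def match_columns(left: Table, right: Table, threshold: float = 0.5) -> Dict[str, str]:
    """Map each left column to its most similar right column, if similar enough."""
    mapping: Dict[str, str] = {}
    for lname, lvals in left.items():
        best_name, best_score = None, threshold
        for rname, rvals in right.items():
            score = jaccard(lvals, rvals)
            if score >= best_score:
                best_name, best_score = rname, score
        if best_name is not None:
            mapping[lname] = best_name
    return mapping


def impute(left: Table, right: Table, mapping: Dict[str, str]) -> List[Tuple[int, str, str]]:
    """Propose fills for missing left cells as (row index, column, value) triples.

    Nothing is written in place, so a reviewer can accept or reject each candidate.
    """
    n_left = len(next(iter(left.values())))
    n_right = len(next(iter(right.values())))
    candidates: List[Tuple[int, str, str]] = []
    for i in range(n_left):
        for col, rcol in mapping.items():
            if left[col][i] is not None:
                continue
            # Join key: the other paired columns whose value is known in this row.
            keys = [(c, rc) for c, rc in mapping.items()
                    if c != col and left[c][i] is not None]
            if not keys:
                continue
            for j in range(n_right):
                value = right[rcol][j]
                if value is None:
                    continue
                if all(left[c][i] == right[rc][j] for c, rc in keys):
                    candidates.append((i, col, value))
                    break  # keep the first agreeing row only
    return candidates


# Hypothetical sample data: the second row of `left` has no city.
left = {"id": ["a1", "a2", "a3"], "city": ["Hangzhou", None, "Ningbo"]}
right = {"cust_id": ["a1", "a2", "a3"], "town": ["Hangzhou", "Wenzhou", "Ningbo"]}

mapping = match_columns(left, right)
candidates = impute(left, right, mapping)
print(mapping)     # {'id': 'cust_id', 'city': 'town'}
print(candidates)  # [(1, 'city', 'Wenzhou')]
```

Returning candidates rather than overwriting cells mirrors the abstract's separation between initial imputation and expert refinement; how the actual system ranks or presents candidates is not described here.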

Chinese Summary: A visual analysis approach for data imputation based on multi-party tabular data association strategies

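朱海洋 (1,2), 韩东明 (1), 潘嘉铖 (1), 魏雅婷 (3), 封颖超杰 (1), 翁罗轩 (1), 毛科添 (1), 邢远凯 (2), 闾建树 (2), 万邱成 (2), 陈为 (1)
1. State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058, China
2. 物产中大数字科技有限公司, Hangzhou 310020, China
3. 物产中大金属集团有限公司, Hangzhou 310005, China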
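Abstract: Data imputation is an important pre-processing task in data governance whose purpose is to fill in incomplete data. However, traditional data imputation methods rely on a single data table, so they alleviate data incompleteness only to a limited extent and fail to reach the best balance between the accuracy of the imputed values and efficiency. This paper proposes a novel visual analysis approach for data imputation. We design a multi-party tabular data association strategy that uses intelligent algorithms to identify similar columns and establish correlations between columns across multiple tables, and then use similar data entries from other tables to perform the initial imputation of missing data. We also develop a visual analysis system to refine the candidate values for data imputation. The interactive system combines the multi-party data imputation method with expert knowledge, which helps users better understand the relational structure of the data, significantly improves the accuracy and efficiency of data imputation, and raises the quality of data governance and the intrinsic value of data assets. Experimental validation and user surveys show that the method supports users in verifying and judging related columns and similar rows using their domain knowledge.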

Key words: Data governance; Data incompleteness; Data imputation; Data visualization; Interactive visual analysis


DOI: 10.1631/FITEE.2300480
CLC number: TP391.4

On-line Access: 2024-03-25
Received: 2023-07-17
Revision Accepted: 2024-03-25
Crosschecked: 2023-10-29

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952276; Fax: +86-571-87952331; E-mail: jzus@zju.edu.cn
Copyright © 2000~ Journal of Zhejiang University-SCIENCE