Journal of Zhejiang University

Frontiers of Information Technology & Electronic Engineering 2023 Vol.24 No.7 P.1007-1027

http://doi.org/10.1631/FITEE.2200409

Explainable data transformation recommendation for automatic visualization

Author(s): Ziliang WU, Wei CHEN, Yuxin MA, Tong XU, Fan YAN, Lei LV, Zhonghao QIAN, Jiazhi XIA
Affiliation(s): 1. State Key Lab of CAD & CG, Zhejiang University, Hangzhou 310058, China
Corresponding email(s): wzlzju@zju.edu.cn, chenvis@zju.edu.cn
Key Words: Data transformation, Data transformation recommendation, Automatic visualization, Explainability

Ziliang WU, Wei CHEN, Yuxin MA, Tong XU, Fan YAN, Lei LV, Zhonghao QIAN, Jiazhi XIA. Explainable data transformation recommendation for automatic visualization[J]. Frontiers of Information Technology & Electronic Engineering, 2023, 24(7): 1007-1027.

@article{Wu2023,
title="Explainable data transformation recommendation for automatic visualization",
author="Ziliang WU and Wei CHEN and Yuxin MA and Tong XU and Fan YAN and Lei LV and Zhonghao QIAN and Jiazhi XIA",
journal="Frontiers of Information Technology \& Electronic Engineering",
volume="24",
number="7",
pages="1007--1027",
year="2023",
publisher="Zhejiang University Press \& Springer",
doi="10.1631/FITEE.2200409"
}

%0 Journal Article
%T Explainable data transformation recommendation for automatic visualization
%A Ziliang WU
%A Wei CHEN
%A Yuxin MA
%A Tong XU
%A Fan YAN
%A Lei LV
%A Zhonghao QIAN
%A Jiazhi XIA
%J Frontiers of Information Technology & Electronic Engineering
%V 24
%N 7
%P 1007-1027
%@ 2095-9184
%D 2023
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.2200409

TY  - JOUR
T1  - Explainable data transformation recommendation for automatic visualization
A1  - Ziliang WU
A1  - Wei CHEN
A1  - Yuxin MA
A1  - Tong XU
A1  - Fan YAN
A1  - Lei LV
A1  - Zhonghao QIAN
A1  - Jiazhi XIA
JO  - Frontiers of Information Technology & Electronic Engineering
VL  - 24
IS  - 7
SP  - 1007
EP  - 1027
SN  - 2095-9184
Y1  - 2023
PB  - Zhejiang University Press & Springer
DO  - 10.1631/FITEE.2200409
ER  - 

Abstract: Automatic visualization generates meaningful visualizations to support data analysis and pattern finding for novice or casual users who are not familiar with visualization design. Current automatic visualization approaches mainly adopt aggregation and filtering to extract patterns from the original data. However, these limited data transformations fail to capture complex patterns such as clusters and correlations. Although recent advances in feature engineering make more kinds of automatic data transformations possible, the auto-generated transformations lack explainability: it is unclear how the resulting patterns are connected with the original features. To tackle these challenges, we propose a novel explainable recommendation approach for an extended range of data transformations in automatic visualization. We summarize the space of feasible data transformations through a literature review and derive measures of the explainability of transformation operations from a pilot study. A recommendation algorithm is designed to compute optimal transformations that reveal specified types of patterns while maintaining explainability. We demonstrate the effectiveness of our approach through two cases and a user study.
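The abstract does not spell out how the recommendation algorithm trades pattern strength against explainability. As one possible reading, and only as a sketch, the selection step can be pictured as keeping the candidates that are not dominated on either objective (a Pareto front). The `Candidate` type, the two score fields, the transformation names, and all numbers below are hypothetical and chosen purely for illustration; they are not taken from the paper.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """A hypothetical candidate transformation with two illustrative scores."""
    name: str
    pattern_score: float   # assumed: higher means the target pattern is clearer
    explainability: float  # assumed: higher means easier to relate to original features


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on both scores and strictly better on one."""
    return (
        a.pattern_score >= b.pattern_score
        and a.explainability >= b.explainability
        and (a.pattern_score > b.pattern_score or a.explainability > b.explainability)
    )


def pareto_front(candidates: list[Candidate]) -> list[Candidate]:
    """Keep every candidate that no other candidate dominates (input order preserved)."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Made-up scores, for illustration only.
candidates = [
    Candidate("aggregate(mean)", 0.40, 0.95),
    Candidate("filter(top-10)", 0.35, 0.90),
    Candidate("PCA(2)", 0.80, 0.50),
    Candidate("UMAP(2)", 0.85, 0.20),
    Candidate("PCA(2) + k-means", 0.75, 0.45),
]
front = pareto_front(candidates)
print([c.name for c in front])
```

With these made-up scores, the simple aggregation survives because nothing is more explainable, while the two projection-based candidates survive because each reveals the pattern more strongly than anything that is easier to explain. A real system would still need a rule for ranking within the front, which this sketch leaves open.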

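Explainable data transformation recommendation for automatic visualization

Ziliang WU¹, Wei CHEN¹, Yuxin MA², Tong XU¹, Fan YAN¹, Lei LV¹, Zhonghao QIAN¹, Jiazhi XIA³
¹State Key Lab of CAD & CG, Zhejiang University, Hangzhou 310058, China
²Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China
³School of Computer Science, Central South University, Changsha 410083, China
Abstract: Automatic visualization techniques generate meaningful visualizations for users who are unfamiliar with visualization design, supporting their data analysis and pattern discovery needs. Mainstream automatic visualization methods currently use aggregation and filtering to extract pattern information from the original data. However, these limited data transformations cannot capture complex patterns such as clusters and correlations. Although recent advances in feature engineering make a broader range of automatic data transformations possible, their results lack explainability, so the transformed patterns cannot be linked back to the features of the original data. To address these challenges, we propose a novel explainable recommendation method covering a broad range of data transformation types in automatic visualization. We summarize the space of feasible data transformations by reviewing the literature, and we summarize measures of transformation explainability through a pilot study. Our recommendation algorithm computes optimal data transformations that reveal pattern information in the data while maintaining explainability. Use cases in real-world scenarios and a user study validate the effectiveness of our method.

Key words: Data transformation; Data transformation recommendation; Automatic visualization; Explainability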
