CLC number: TP301
On-line Access: 2018-08-06
Received: 2016-09-21
Revision Accepted: 2017-01-14
Crosschecked: 2018-06-15
Divya Pandove, Shivani Goel, Rinkle Rani. An intuitive general rank-based correlation coefficient[J]. Frontiers of Information Technology & Electronic Engineering, 2018, 19(6): 699-711.
@article{Pandove2018,
title="An intuitive general rank-based correlation coefficient",
author="Divya Pandove and Shivani Goel and Rinkle Rani",
journal="Frontiers of Information Technology \& Electronic Engineering",
volume="19",
number="6",
pages="699-711",
year="2018",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.1601549"
}
%0 Journal Article
%T An intuitive general rank-based correlation coefficient
%A Divya Pandove
%A Shivani Goel
%A Rinkle Rani
%J Frontiers of Information Technology & Electronic Engineering
%V 19
%N 6
%P 699-711
%@ 2095-9184
%D 2018
%I Zhejiang University Press & Springer
%R 10.1631/FITEE.1601549
TY  - JOUR
T1  - An intuitive general rank-based correlation coefficient
A1  - Divya Pandove
A1  - Shivani Goel
A1  - Rinkle Rani
JO  - Frontiers of Information Technology & Electronic Engineering
VL  - 19
IS  - 6
SP  - 699
EP  - 711
SN  - 2095-9184
Y1  - 2018
PB  - Zhejiang University Press & Springer
DO  - 10.1631/FITEE.1601549
ER  -
Abstract: Correlation analysis is an effective mechanism for studying patterns in data and making predictions. Many interesting discoveries have been made by formulating correlations in seemingly unrelated data. We propose an algorithm that quantifies the theory of correlations and yields an intuitive, more accurate correlation coefficient. This predictive metric, called the general rank-based correlation coefficient, calculates correlations between paired values. It fulfills the five basic criteria of a predictive metric: independence from sample size, a value between −1 and 1, measurement of the degree of monotonicity, insensitivity to outliers, and intuitive demonstration. Furthermore, the metric has been validated through experiments on a real-time dataset and random-number simulations. Mathematical derivations of the proposed equations are also provided. We compare the proposed metric with Spearman’s rank correlation coefficient; the comparison results show that it fares better than Spearman’s coefficient on all the predictive metric criteria.
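The abstract benchmarks the proposed coefficient against Spearman’s rank correlation coefficient, but it does not state the formula of the general rank-based correlation coefficient itself. The sketch below therefore implements only the baseline, Spearman’s ρ, computed as the Pearson correlation of average ranks so that ties are handled. It is not the authors’ metric. It illustrates three of the criteria listed above: the value stays within [−1, 1], it measures monotonicity rather than linearity, and an extreme value has no effect as long as the ordering of the data is unchanged. The function names (`average_ranks`, `pearson`, `spearman_rho`) are illustrative assumptions.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last index of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    y = [2, 4, 6, 8, 1000]  # monotonic in x, but the last value is an outlier
    print(spearman_rho(x, y))  # 1.0: the ordering is unchanged, so the outlier has no effect
    print(pearson(x, y))       # below 1: the raw-value correlation is pulled down by the outlier
```

Working on ranks is what makes ρ bounded and robust to outliers that do not reorder the data. Such a baseline is the kind of reference point against which a new rank-based coefficient can be compared on the same paired values.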