CLC number: TP301
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2021-02-14
Citations: GB/T 7714, BibTeX, EndNote, RefMan (RIS)
Minggang DONG, Ming LIU, Chao JING. One-against-all-based Hellinger distance decision tree for multiclass imbalanced learning[J]. Frontiers of Information Technology & Electronic Engineering, 2022, 23(2): 278-290.
@article{Dong2022OAHD,
title="One-against-all-based Hellinger distance decision tree for multiclass imbalanced learning",
author="Minggang DONG and Ming LIU and Chao JING",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="23",
number="2",
pages="278-290",
year="2022",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.2000417"
}
%0 Journal Article
%T One-against-all-based Hellinger distance decision tree for multiclass imbalanced learning
%A Minggang DONG
%A Ming LIU
%A Chao JING
%J Frontiers of Information Technology & Electronic Engineering
%V 23
%N 2
%P 278-290
%@ 2095-9184
%D 2022
%I Zhejiang University Press & Springer
%R 10.1631/FITEE.2000417
TY  - JOUR
T1  - One-against-all-based Hellinger distance decision tree for multiclass imbalanced learning
A1  - Minggang DONG
A1  - Ming LIU
A1  - Chao JING
JF  - Frontiers of Information Technology & Electronic Engineering
VL  - 23
IS  - 2
SP  - 278
EP  - 290
SN  - 2095-9184
Y1  - 2022
PB  - Zhejiang University Press & Springer
DO  - 10.1631/FITEE.2000417
ER  -
Abstract: Traditional machine learning methods are sensitive to skewed class distributions and do not account for the characteristics of multiclass imbalance problems, so skewed multiclass data poses a major challenge to machine learning algorithms. To tackle this issue, we propose a new splitting criterion for decision trees based on the one-against-all-based Hellinger distance (OAHD). OAHD has two crucial elements. First, the one-against-all scheme is integrated into the computation of the Hellinger distance, thereby extending the Hellinger distance decision tree to the multiclass imbalance problem. Second, a modified Gini index is designed that takes into account the class distribution and the number of distinct classes in the multiclass imbalance problem. We also give theoretical proofs of the properties of OAHD, including skew insensitivity and the ability to seek a purer node in the decision tree. Finally, we collect 20 public real-world imbalanced data sets from the Knowledge Extraction based on Evolutionary Learning (KEEL) repository and the University of California, Irvine (UCI) repository. Experimental results show that OAHD significantly outperforms five other well-known decision trees in terms of Precision, F-measure, and multiclass area under the receiver operating characteristic curve (MAUC), and the Friedman and Nemenyi tests confirm that this advantage is statistically significant.
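The binary criterion that OAHD extends is the splitting measure of the Hellinger distance decision tree (HDDT): for a candidate split into branches j, d_H = sqrt(sum_j (sqrt(|X+j|/|X+|) - sqrt(|X-j|/|X-|))^2), where |X+j| and |X-j| count positive and negative samples in branch j. The Python sketch below computes it from per-branch class counts and wraps it in a one-against-all loop. The abstract does not say how the per-class distances are combined, so the mean used in the hypothetical `ova_hellinger_split` is an assumption and not the paper's OAHD formula; all function names are illustrative.

```python
import math
from typing import Sequence


def hellinger_split(pos_counts: Sequence[int], neg_counts: Sequence[int]) -> float:
    """Binary Hellinger distance of a candidate split (HDDT-style criterion).

    pos_counts[j] and neg_counts[j] are the numbers of positive and negative
    training samples sent to branch j. Larger values mean the split separates
    the two class-conditional distributions better (maximum sqrt(2)).
    """
    total_pos, total_neg = sum(pos_counts), sum(neg_counts)
    if total_pos == 0 or total_neg == 0:
        return 0.0  # one class absent: no separation to measure
    return math.sqrt(sum(
        (math.sqrt(p / total_pos) - math.sqrt(n / total_neg)) ** 2
        for p, n in zip(pos_counts, neg_counts)
    ))


def ova_hellinger_split(branch_class_counts: Sequence[Sequence[int]]) -> float:
    """Hypothetical one-against-all extension: mean binary distance over classes.

    branch_class_counts[j][c] is the number of class-c samples in branch j.
    Each class in turn is treated as positive and all other classes as negative.
    """
    n_classes = len(branch_class_counts[0])
    scores = []
    for c in range(n_classes):
        pos = [branch[c] for branch in branch_class_counts]
        neg = [sum(branch) - branch[c] for branch in branch_class_counts]
        scores.append(hellinger_split(pos, neg))
    return sum(scores) / n_classes  # aggregation rule is an assumption


if __name__ == "__main__":
    # Both lines print the same value: scaling the positive class 10x does not
    # change the distance, because each class is normalized by its own total.
    print(hellinger_split([20, 80], [30, 10]))
    print(hellinger_split([200, 800], [30, 10]))
    # Three-class, two-branch split scored class by class, then averaged.
    print(ova_hellinger_split([[90, 5, 1], [10, 15, 9]]))
```

The per-class normalization is what makes the binary HDDT distance insensitive to class skew. The abstract states that OAHD retains a skew-insensitivity property; this sketch does not attempt to reproduce that proof for the multiclass case.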
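The abstract says OAHD also relies on a modified Gini index but does not give the modification, so the sketch below shows only the standard CART Gini impurity that such a modification would start from. It also illustrates the weakness that motivates a change for imbalanced data: a node with 99 majority samples and 1 minority sample scores about 0.02, which looks almost pure even though the split has not isolated the minority sample. Function names are illustrative.

```python
from typing import Sequence


def gini(class_counts: Sequence[int]) -> float:
    """Standard CART Gini impurity of one node: 1 - sum over classes of p_c^2."""
    total = sum(class_counts)
    if total == 0:
        return 0.0  # empty node: treat as pure
    return 1.0 - sum((n / total) ** 2 for n in class_counts)


def weighted_gini(branch_class_counts: Sequence[Sequence[int]]) -> float:
    """Size-weighted Gini impurity of a candidate split; lower means purer children."""
    sizes = [sum(branch) for branch in branch_class_counts]
    total = sum(sizes)
    return sum(size / total * gini(branch)
               for size, branch in zip(sizes, branch_class_counts))


if __name__ == "__main__":
    # 99 majority + 1 minority sample: impurity about 0.02, i.e. "almost pure".
    print(gini([99, 1]))
    # A split that separates the two classes perfectly has weighted impurity 0.
    print(weighted_gini([[99, 0], [0, 1]]))
```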
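The abstract reports MAUC without defining it here. A widely used definition is Hand and Till's (2001) pairwise average: for every pair of classes (i, j), A(i|j) is the probability that a random class-i sample receives a higher class-i score than a random class-j sample, and MAUC averages (A(i|j) + A(j|i))/2 over all pairs. The sketch below assumes that definition and assumes labels are integer class indices into the score columns; whether the paper uses exactly this variant is not stated in the abstract.

```python
from itertools import combinations
from typing import Sequence


def pairwise_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as P(positive score > negative score), with ties counted as 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else (0.5 if p == n else 0.0)
    return wins / (len(pos_scores) * len(neg_scores))


def mauc(labels: Sequence[int], probs: Sequence[Sequence[float]]) -> float:
    """Hand-Till multiclass AUC: mean of (A(i|j) + A(j|i)) / 2 over class pairs.

    labels[k] is the true class index (0..C-1) of sample k, and probs[k][c]
    is the classifier's score for class c on sample k.
    """
    classes = sorted(set(labels))
    total = 0.0
    for i, j in combinations(classes, 2):
        in_i = [k for k, y in enumerate(labels) if y == i]
        in_j = [k for k, y in enumerate(labels) if y == j]
        # A(i|j): class-i samples vs class-j samples, compared on the class-i score.
        a_ij = pairwise_auc([probs[k][i] for k in in_i], [probs[k][i] for k in in_j])
        # A(j|i): class-j samples vs class-i samples, compared on the class-j score.
        a_ji = pairwise_auc([probs[k][j] for k in in_j], [probs[k][j] for k in in_i])
        total += (a_ij + a_ji) / 2
    n_pairs = len(classes) * (len(classes) - 1) // 2
    return total / n_pairs
```

Because MAUC averages over class pairs rather than over samples, a minority class carries the same weight as a majority class, which is why it suits imbalanced evaluation. The nested loops are quadratic in class sizes; a rank-based formula would be faster on large test sets.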
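The statistical comparison named in the abstract ranks the algorithms on each data set and works on average ranks R_j. The Friedman statistic is chi2_F = 12N/(k(k+1)) * (sum_j R_j^2 - k(k+1)^2/4) for N data sets and k algorithms, and the Nemenyi critical difference is CD = q_alpha * sqrt(k(k+1)/(6N)). The sketch below assumes higher metric values are better, uses the q_0.05 values tabulated by Demsar (2006), and adds the Iman-Davenport F correction as an assumption, since the abstract names only the Friedman and Nemenyi tests. Function names are illustrative.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Nemenyi critical values q_0.05 indexed by the number of compared algorithms k,
# as tabulated by Demsar (2006).
Q_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850,
         7: 2.949, 8: 3.031, 9: 3.102, 10: 3.164}


def average_ranks(scores: Sequence[Sequence[float]]) -> List[float]:
    """Mean rank of each algorithm over data sets (rank 1 = highest score, ties averaged)."""
    n_data, k = len(scores), len(scores[0])
    totals = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda a: -row[a])
        pos = 0
        while pos < k:
            end = pos
            while end + 1 < k and row[order[end + 1]] == row[order[pos]]:
                end += 1
            shared = (pos + end) / 2 + 1  # mean of the 1-based ranks pos+1 .. end+1
            for a in order[pos:end + 1]:
                totals[a] += shared
            pos = end + 1
    return [t / n_data for t in totals]


def friedman_nemenyi(scores: Sequence[Sequence[float]],
                     q_alpha: Optional[float] = None
                     ) -> Tuple[List[float], float, float, float]:
    """Return (average ranks, Friedman chi2, Iman-Davenport F, Nemenyi CD).

    scores[d][a] is the metric of algorithm a on data set d (higher is better).
    """
    n, k = len(scores), len(scores[0])
    ranks = average_ranks(scores)
    spread = sum(r * r for r in ranks) - k * (k + 1) ** 2 / 4
    chi2 = 12 * n * spread / (k * (k + 1))
    denom = n * (k - 1) - chi2
    # Iman-Davenport correction; infinite when every data set gives the same ranking.
    f_f = math.inf if denom <= 1e-12 else (n - 1) * chi2 / denom
    q = Q_005[k] if q_alpha is None else q_alpha
    cd = q * math.sqrt(k * (k + 1) / (6 * n))
    return ranks, chi2, f_f, cd
```

chi2_F is compared against a chi-square distribution with k - 1 degrees of freedom, and F_F against an F distribution with (k - 1, (k - 1)(N - 1)) degrees of freedom. For the setting described in the abstract (OAHD plus five other trees, so k = 6, on N = 20 data sets), CD = 2.850 * sqrt(42/120), roughly 1.686: two trees whose average ranks differ by more than this are significantly different at alpha = 0.05.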