Split Difference Weighting: An Enhanced Decision Tree Approach for Imbalanced Classification

Authors

  • Tingting Zhou School of Economics and Management, University of Science and Technology Beijing, China
  • Xuedong Gao School of Economics and Management, University of Science and Technology Beijing, China
  • Xi Sun Collaborative Innovation Center of Steel Technology, University of Science and Technology Beijing, China
  • Lei Han School of Economics and Management, University of Science and Technology Beijing, China

DOI:

https://doi.org/10.15837/ijccc.2024.6.6702

Keywords:

imbalanced classification, class key decision factor, split difference index, classification and regression tree algorithm weighted by split difference

Abstract

Imbalanced data classification remains a significant challenge in machine learning, particularly in decision tree algorithms where majority class features are often overshadowed. This study introduces a novel split index based on class key decision factor (CKD factor) to address this issue. We propose two new algorithms: Split Difference Decision Tree (SDDT) and Weighted Split Difference Classification and Regression Tree (WSD-CART). These algorithms enhance feature expression for majority classes during node splitting, thereby improving classification performance on imbalanced datasets. Experiments conducted on five UCI datasets with varying imbalance levels demonstrate the effectiveness of our approach. The WSD-CART algorithm consistently outperformed traditional methods, showing significant improvements in F-score, AUC, precision, recall, and accuracy, particularly for majority classes. In a real-world application to space product material classification, our method increased the true positive rate for majority class identification from 66.32% to 76.17%, while maintaining high overall accuracy. This study contributes to the field of imbalanced learning by providing a new perspective on decision tree split criteria. The proposed methods offer both improved classification performance and interpretable decision rules, making them valuable for various domains dealing with imbalanced data.

Published

2024-11-01
