Split Difference Weighting: An Enhanced Decision Tree Approach for Imbalanced Classification
DOI:
https://doi.org/10.15837/ijccc.2024.6.6702
Keywords:
imbalanced classification, class key decision factor, split difference index, classification and regression tree algorithm weighted by split difference
Abstract
Imbalanced data classification remains a significant challenge in machine learning, particularly in decision tree algorithms where majority class features are often overshadowed. This study introduces a novel split index based on class key decision factor (CKD factor) to address this issue. We propose two new algorithms: Split Difference Decision Tree (SDDT) and Weighted Split Difference Classification and Regression Tree (WSD-CART). These algorithms enhance feature expression for majority classes during node splitting, thereby improving classification performance on imbalanced datasets. Experiments conducted on five UCI datasets with varying imbalance levels demonstrate the effectiveness of our approach. The WSD-CART algorithm consistently outperformed traditional methods, showing significant improvements in F-score, AUC, precision, recall, and accuracy, particularly for majority classes. In a real-world application to space product material classification, our method increased the true positive rate for majority class identification from 66.32% to 76.17%, while maintaining high overall accuracy. This study contributes to the field of imbalanced learning by providing a new perspective on decision tree split criteria. The proposed methods offer both improved classification performance and interpretable decision rules, making them valuable for various domains dealing with imbalanced data.
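The abstract does not define the CKD factor or the split difference index, so the following is only a minimal sketch of where such a criterion would sit in a CART-style split search. It swaps in a well-known stand-in, Gini impurity with per-class weights, and is not the authors' SDDT or WSD-CART method. The function names, the `class_weight` mapping, and the toy data are all hypothetical.

```python
from collections import Counter

# NOTE: class-weighted Gini is a stand-in criterion chosen for illustration.
# It is NOT the split difference index proposed in the paper.

def weighted_mass(labels, class_weight):
    """Total weight of a set of samples, each counted with its class weight."""
    return sum(class_weight[y] for y in labels)

def weighted_gini(labels, class_weight):
    """Gini impurity computed on class-weighted counts."""
    counts = Counter(labels)
    total = sum(class_weight[c] * n for c, n in counts.items())
    if total == 0:
        return 0.0
    return 1.0 - sum((class_weight[c] * n / total) ** 2 for c, n in counts.items())

def split_gain(values, labels, threshold, class_weight):
    """Impurity decrease obtained by splitting one feature at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    w_left = weighted_mass(left, class_weight)
    w_right = weighted_mass(right, class_weight)
    if w_left == 0 or w_right == 0:
        return 0.0  # degenerate split: one child is empty
    w_all = w_left + w_right
    child_impurity = (
        w_left * weighted_gini(left, class_weight)
        + w_right * weighted_gini(right, class_weight)
    ) / w_all
    return weighted_gini(labels, class_weight) - child_impurity

def best_threshold(values, labels, class_weight):
    """Try midpoints between consecutive distinct values; return the best one."""
    xs = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    if not candidates:
        return None  # feature is constant, nothing to split on
    return max(candidates, key=lambda t: split_gain(values, labels, t, class_weight))

# Hypothetical toy data: six samples of class 0, two of class 1.
toy_values = [1, 2, 3, 4, 5, 6, 7, 8]
toy_labels = [0, 0, 0, 0, 0, 0, 1, 1]
uniform = {0: 1.0, 1: 1.0}      # plain CART behaviour
reweighted = {0: 1.0, 1: 3.0}   # inverse-frequency weights (an assumption)
```

Calling `best_threshold(toy_values, toy_labels, reweighted)` selects the midpoint that separates the two classes. Comparing `split_gain` under `uniform` and `reweighted` shows how reweighting changes the score a candidate split receives, which is the general lever that a modified split index adjusts.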
Copyright (c) 2024 Tingting Zhou, Xuedong Gao, Xi Sun, Yingxue Pan, Lei Han
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
ONLINE OPEN ACCESS: Access to the full text of each article and each issue is allowed for free under the terms of Attribution-NonCommercial 4.0 International (CC BY-NC 4.0).
You are free to:
-Share: copy and redistribute the material in any medium or format;
-Adapt: remix, transform, and build upon the material.
The licensor cannot revoke these freedoms as long as you follow the license terms.
DISCLAIMER: The author(s) of each article appearing in International Journal of Computers Communications & Control is/are solely responsible for the content thereof; the publication of an article shall not constitute or be deemed to constitute any representation by the Editors or Agora University Press that the data presented therein are original, correct or sufficient to support the conclusions reached or that the experiment design or methodology is adequate.