Tversky Similarity based UnderSampling with Gaussian Kernelized Decision Stump Adaboost Algorithm for Imbalanced Medical Data Classification
DOI:
https://doi.org/10.15837/ijccc.2021.6.4291
Keywords:
Data Imbalance, Undersampling, Tversky, Similarity Indexive Regression, Gaussian Kernelized, Decision Stump AdaBoosting
Abstract
In recent years, imbalanced data classification has been applied in several domains, including detection of fraudulent activity in the banking sector and disease prediction in the healthcare sector. To address the imbalanced classification problem at the data level, strategies such as undersampling or oversampling are widely used. Sampling techniques, however, pose the challenge of significant information loss. The proposed method involves two processes: undersampling and classification. First, undersampling is performed by means of a Tversky Similarity Indexive Regression model, in which regression together with the Tversky similarity index is used to analyze the relationship between two instances from the dataset. Next, Gaussian Kernelized Decision Stump AdaBoosting is used to classify the instances into two classes. Here, the root node of the decision stump takes its decision on the basis of the Gaussian kernel function, considering the average of neighboring points, and the result is obtained at the leaf node. Weights are also adjusted to minimize the training errors that occur during classification and so find the best classifier. Experimental assessment is performed on two imbalanced datasets (Pima Indian diabetes and Hepatitis). Performance metrics such as precision, recall, area under the ROC curve (AUC), and F1-score are compared with those of existing undersampling methods. Experimental results show that prediction accuracy for the minority class improved, thereby reducing false positives and false negatives.
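The abstract names three building blocks without giving their formulas. The sketch below shows the textbook definitions of each: the Tversky index on sets, the Gaussian kernel, and one AdaBoost reweighting round. It is not the authors' implementation. The function names, the parameters `alpha`, `beta`, and `sigma`, the set-based treatment of instances, and the {-1, +1} label encoding are all assumptions made for illustration. How the paper combines the index with regression, or the kernel with the stump, is not specified here and is not reproduced.

```python
import math


def tversky_index(x, y, alpha=0.5, beta=0.5):
    """Tversky index between two instances, each treated as a set of
    discrete feature values (an assumption; the paper may differ)."""
    x, y = set(x), set(y)
    common = len(x & y)
    denom = common + alpha * len(x - y) + beta * len(y - x)
    # Two empty sets are treated as identical.
    return common / denom if denom else 1.0


def gaussian_kernel(u, v, sigma=1.0):
    """Gaussian (RBF) kernel between two equal-length numeric vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def adaboost_reweight(weights, y_true, y_pred):
    """One AdaBoost round for labels in {-1, +1}.

    `weights` is assumed to sum to 1. Returns the weak learner's weight
    and the renormalized instance weights.
    """
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
    learner_weight = 0.5 * math.log((1 - err) / err)
    updated = [
        w * math.exp(-learner_weight * t * p)
        for w, t, p in zip(weights, y_true, y_pred)
    ]
    total = sum(updated)
    return learner_weight, [w / total for w in updated]


if __name__ == "__main__":
    print(tversky_index({1, 2, 3}, {2, 3, 4}))
    print(gaussian_kernel([0.0, 0.0], [1.0, 1.0]))
    print(adaboost_reweight([0.25] * 4, [1, 1, -1, -1], [1, 1, -1, 1]))
```

With `alpha = beta = 1` the Tversky index reduces to the Jaccard coefficient, and with `alpha = beta = 0.5` to the Dice coefficient. Unequal values make the measure asymmetric, which is what distinguishes it from those two. In the reweighting step, misclassified instances gain weight, so the next weak learner concentrates on them.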
License
ONLINE OPEN ACCESS: Access to the full text of each article and each issue is free under the Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.
You are free to:
-Share: copy and redistribute the material in any medium or format;
-Adapt: remix, transform, and build upon the material.
The licensor cannot revoke these freedoms as long as you follow the license terms.
DISCLAIMER: The author(s) of each article appearing in International Journal of Computers Communications & Control is/are solely responsible for the content thereof; the publication of an article shall not constitute or be deemed to constitute any representation by the Editors or Agora University Press that the data presented therein are original, correct or sufficient to support the conclusions reached or that the experiment design or methodology is adequate.