Comparison of classification techniques based on medical datasets

Document Type: Research Paper

Authors

1 Engineering Technical College of Al-Najaf, Al-Furat Al-Awsat Technical University (ATU), Al-Najaf, Iraq

2 Faculty of Education for Girls, University of Kufa, Al-Najaf, Iraq

3 College of Information Technology, University of Babylon, Babil, Iraq

Abstract

Medical data mining has recently become a widespread area of data mining. In particular, cancer diagnosis is one of the most important topics, and many researchers have studied it to develop intelligent decision support systems that help doctors. In this research, three different classifiers are used to improve performance in terms of accuracy: Support Vector Machine (SVM), Adaptive Boosting (AdaBoost), and Random Forest (RF). Two datasets from a machine learning repository are used to evaluate and verify the classification methods. The classifiers are trained using a 10-fold cross-validation strategy, which repeatedly splits the original sample into training and testing sets. Classifier performance is assessed using accuracy (AC), precision, recall, specificity, F1, and area under the curve (AUC). The experiments showed that the AdaBoost classifier achieved an accuracy of 100% on both datasets, outperforming SVM and RF, with an AC of 97%. The accuracy is also compared with that of a previous study that uses the same datasets, and the results demonstrate that the current research achieves better accuracy.
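
The evaluation protocol described in the abstract can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the dataset (scikit-learn's bundled Wisconsin diagnostic breast cancer data), the default hyperparameters, the feature scaling step for the SVM, the stratified shuffled folds, and the function name `evaluate_classifiers` are all assumptions, so the scores it produces should not be expected to match the figures reported above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def evaluate_classifiers(n_splits=10, seed=0):
    """Return mean cross-validated scores for SVM, AdaBoost, and RF."""
    # Assumption: a stand-in dataset bundled with scikit-learn.
    # Labels: 0 = malignant, 1 = benign (1 is treated as the positive class).
    X, y = load_breast_cancer(return_X_y=True)

    # Assumption: default hyperparameters; the SVM gets standardized inputs.
    models = {
        "SVM": make_pipeline(StandardScaler(), SVC()),
        "AdaBoost": AdaBoostClassifier(random_state=seed),
        "RF": RandomForestClassifier(random_state=seed),
    }

    # The six measures named in the abstract. Specificity is computed as
    # the recall of the negative class (label 0).
    scoring = {
        "accuracy": "accuracy",
        "precision": "precision",
        "recall": "recall",
        "specificity": make_scorer(recall_score, pos_label=0),
        "f1": "f1",
        "auc": "roc_auc",
    }

    # 10-fold cross-validation: each fold serves once as the test set
    # while the remaining nine folds are used for training.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)

    results = {}
    for name, model in models.items():
        scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
        results[name] = {
            metric: float(scores[f"test_{metric}"].mean()) for metric in scoring
        }
    return results


if __name__ == "__main__":
    for name, metrics in evaluate_classifiers().items():
        summary = ", ".join(f"{k}={v:.3f}" for k, v in metrics.items())
        print(f"{name}: {summary}")
```

Averaging each measure over the ten held-out folds gives one figure per classifier and metric, which is the kind of table the comparison in the abstract rests on.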

Volume 12, Special Issue
December 2021
Pages 1957-1964
  • Receive Date: 04 October 2021
  • Revise Date: 16 November 2021
  • Accept Date: 05 December 2021