Comparison of classification techniques based on medical datasets

Document Type: Research Paper

Authors

1 Engineering Technical College of Al-Najaf, Al-Furat Al-Awsat Technical University (ATU), Al-Najaf, Iraq

2 Faculty of Education for Girls, University of Kufa, Al-Najaf, Iraq

3 College of Information Technology, University of Babylon, Babil, Iraq

Abstract

Medical data mining has recently become a widespread area of data mining research. In particular, cancer diagnosis is one of the most important topics studied by researchers seeking to develop intelligent decision support systems that assist doctors. In this research, three different classifiers are applied with the aim of improving classification accuracy: Support Vector Machine (SVM), Adaptive Boosting (AdaBoost), and Random Forest (RF). Two datasets from a machine learning repository are used to evaluate and verify the classification methods. The classifiers are trained and tested using 10-fold cross-validation, which splits the original sample into ten folds; each fold serves once as the testing set while the remaining nine folds are used for training. To assess classifier performance, accuracy (AC), precision, recall, specificity, F1 score, and area under the curve (AUC) are used. The experiments showed that the AdaBoost classifier achieved an accuracy of 100%, outperforming SVM and RF, which reached an AC of 97%, on both datasets. The accuracy is also compared with that of a previous study that used the same datasets, and the results demonstrate that the current research achieves better accuracy.
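The abstract names the evaluation protocol (10-fold cross-validation) and the metrics (AC, precision, recall, specificity, F1, AUC) but not how they were implemented. The following is a minimal, dependency-free Python sketch of that evaluation step, assuming binary labels with 1 as the positive class (for example, malignant) and 0 as the negative class. The function names `k_fold_indices`, `binary_metrics`, and `roc_auc` are hypothetical. The folds here are contiguous and unshuffled, and AUC is computed as the pairwise rank statistic from classifier scores; the paper does not state whether it shuffled or stratified its folds. The classifiers themselves (SVM, AdaBoost, RF) are not implemented here.

```python
from typing import Dict, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Hypothetical helper: split indices 0..n_samples-1 into k contiguous folds.

    Each fold serves once as the test set while the other k-1 folds form the
    training set. Assumption: no shuffling or stratification is applied.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    # Spread the remainder over the first folds so fold sizes differ by at most 1.
    base, extra = divmod(n_samples, k)
    splits = []
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        splits.append((train, test))
        start += size
    return splits


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Hypothetical helper: AC, precision, recall, specificity and F1 (1 = positive)."""
    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of dividing by zero when a class is absent.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # also called sensitivity
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Hypothetical helper: AUC as the probability that a random positive
    sample scores higher than a random negative one (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and predictions, not data from the paper.
    print(binary_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))
    print(roc_auc([1, 0, 1, 0], [0.9, 0.3, 0.4, 0.5]))
```

In a full experiment, each classifier would be fitted on the `train` indices of every fold, and the per-fold metrics would be averaged over the ten folds. In practice a library such as scikit-learn offers equivalents (for example `StratifiedKFold` and `roc_auc_score`); the hand-written version above only makes the definitions explicit.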

Keywords