Outlier detection in test samples and supervised training set selection

Document Type : Research Paper

Authors

1 Department of Computer Engineering, Babol Branch, Islamic Azad University, Babol, Iran

2 Department of Computer Engineering, Sari Branch, Islamic Azad University, Sari, Iran

10.22075/ijnaa.2021.4878

Abstract

Outlier detection is a technique for identifying samples that lie outside the main population of a data set. Outliers degrade classification performance, so detected outliers are generally removed to improve it. This paper proposes a method for detecting outliers among test samples, combined with supervised training set selection. The training set is selected based on the intersection of three well-known similarity measures: Jaccard, cosine, and Dice. Each test sample is evaluated against the selected training set to determine whether it is an outlier. The selected training set is then used for a two-stage classification. The accuracy of the classifiers increases after outlier deletion. A majority voting function is used to further improve the classifiers' results.
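
The idea of selecting training samples through the intersection of three similarity measures can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so everything specific here is an assumption made for illustration: samples are represented as sets of features, each measure keeps its k most similar training samples, and the names top_k, select_training_indices and the parameter k are hypothetical.

```python
import math


def jaccard(a, b):
    """Jaccard similarity of two feature sets: |A ∩ B| / |A ∪ B|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def dice(a, b):
    """Dice similarity of two feature sets: 2|A ∩ B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def cosine(a, b):
    """Cosine similarity of two feature sets: |A ∩ B| / sqrt(|A| * |B|)."""
    denom = math.sqrt(len(a) * len(b))
    return len(a & b) / denom if denom else 0.0


def top_k(test, training, measure, k):
    """Indices of the k training samples most similar to `test` (hypothetical helper)."""
    ranked = sorted(
        range(len(training)),
        key=lambda i: measure(test, training[i]),
        reverse=True,
    )
    return set(ranked[:k])


def select_training_indices(test, training, k):
    """Assumed selection rule: keep only samples ranked in the top k by all three measures."""
    return (
        top_k(test, training, jaccard, k)
        & top_k(test, training, cosine, k)
        & top_k(test, training, dice, k)
    )


# Illustrative data only, not taken from the paper.
training = [{1, 2, 3}, {1, 2}, {7, 8}, {2, 3, 4}]
selected = select_training_indices({1, 2, 3}, training, k=2)
```

In this sketch a training sample is kept only when all three measures agree that it is among the nearest to the test sample. How the paper itself forms the intersection, and how a test sample is then judged to be an outlier, is not stated in the abstract.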

Keywords