A local density-based outlier detection method for high dimension data

Document Type: Research Paper

Authors

1 University of Baghdad, College of Management and Economics, Department of Statistics, Iraq

2 Ministry of Planning, Central Statistical Organization, Iraq

10.22075/ijnaa.2022.5784

Abstract

Outlier detection is challenging, particularly for high-dimensional datasets; to handle this problem, we use principal component analysis (PCA). Local density-based outlier (anomaly) detection methods compare the density of an observation with the densities of its local neighbors. We use an outlier score as the measure of comparison. In this research, we choose different density estimation functions and calculate different distances. Weighted kernel density estimation with adaptive bandwidth for multivariate (Gaussian) kernel density estimation considers the k-nearest neighbors ($KNN$) and the reverse nearest neighbors (RNN). The $KNN$ is also considered for the Epanechnikov kernel density estimation. Lastly, we estimate the local outlier factor (LOF) as a baseline method for detecting outliers. Extensive experiments on a synthetic dataset show that RKDOS and EPA are more efficient than LOF under the precision evaluation criterion.
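
To make the baseline concrete, the following is a minimal sketch of an LOF-style score in plain Python. It is an illustration only, not the authors' implementation. It assumes Euclidean distance, exactly k neighbors per point (ties are cut rather than kept), and no duplicate points. The function names `k_nearest` and `lof_scores` and the toy data are hypothetical.

```python
import math

def k_nearest(points, i, k):
    """Return the k nearest (distance, index) pairs for points[i], excluding itself."""
    dists = sorted(
        (math.dist(points[i], points[j]), j)
        for j in range(len(points)) if j != i
    )
    return dists[:k]  # simplification: ties at the k-th distance are cut

def lof_scores(points, k=3):
    """Compute an LOF-style score per point; values well above 1 suggest outliers."""
    n = len(points)
    neigh = [k_nearest(points, i, k) for i in range(n)]
    # k-distance: distance from each point to its k-th nearest neighbor
    k_dist = [neigh[i][-1][0] for i in range(n)]
    # local reachability density: inverse of the mean reachability distance
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neigh[i]]
        lrd.append(len(reach) / sum(reach))  # assumes no duplicate points
    # LOF: mean neighbor density divided by the point's own density
    return [
        sum(lrd[j] for _, j in neigh[i]) / (len(neigh[i]) * lrd[i])
        for i in range(n)
    ]

# Toy data (hypothetical): a tight cluster plus one distant observation.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (5, 5)]
scores = lof_scores(points, k=3)
for p, s in zip(points, scores):
    print(p, round(s, 3))
```

In this toy example the distant observation receives the largest score, while the cluster points score close to 1. The kernel-based scores discussed in the abstract replace the reachability-based density with a kernel density estimate; that step is not shown here.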

Keywords