A local density-based outlier detection method for high dimension data

Document Type : Research Paper

Authors

1 University of Baghdad, College of Management and Economics, Department of Statistics, Iraq

2 Ministry of Planning, Central Statistical Organization, Iraq

Abstract

Outlier detection poses challenges for researchers, particularly when the dataset is high-dimensional; to handle this problem, we use principal component analysis. Outlier detection, or anomaly detection, with local density-based methods compares the density of an observation with the densities of its surrounding local neighbors. We use the outlier score as the measure of comparison. In this research, we choose different density estimation functions and calculate different distances. Weighted kernel density estimation with adaptive bandwidth, applied to multivariate (Gaussian) kernel density estimation, considers the k-nearest neighbors (KNN) and the reverse nearest neighbors (RNN). KNN is also considered for the Epanechnikov kernel density estimation. Lastly, we estimate the local outlier factor (LOF) as a baseline method for detecting outliers. Extensive experiments on a synthetic dataset show that RKDOS and EPA are more efficient than LOF under the precision evaluation criterion.
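
The comparison described above, between an observation's density and the densities of its neighbors, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Gaussian kernel with a single fixed bandwidth `h` (the abstract uses an adaptive bandwidth), uses only the k-nearest neighbors, and omits the principal component analysis step. The function names `knn`, `local_density`, and `outlier_scores`, and the toy data, are hypothetical.

```python
import math

def knn(points, i, k):
    """Indices of the k nearest neighbours of points[i] (Euclidean distance)."""
    dists = [(math.dist(points[i], q), j) for j, q in enumerate(points) if j != i]
    dists.sort()
    return [j for _, j in dists[:k]]

def local_density(points, i, k, h):
    """Gaussian kernel density at points[i], estimated from its k nearest neighbours.

    Assumption: one fixed bandwidth h for all points, not an adaptive one.
    """
    d = len(points[i])
    norm = (2 * math.pi) ** (d / 2) * h ** d
    total = 0.0
    for j in knn(points, i, k):
        u = math.dist(points[i], points[j]) / h
        total += math.exp(-0.5 * u * u) / norm
    return total / k

def outlier_scores(points, k=3, h=1.0):
    """Ratio of the neighbours' mean density to each point's own density.

    Scores near 1 mean the point is about as dense as its neighbourhood;
    scores much larger than 1 mark candidate outliers.
    """
    dens = [local_density(points, i, k, h) for i in range(len(points))]
    scores = []
    for i in range(len(points)):
        nbrs = knn(points, i, k)
        mean_nbr = sum(dens[j] for j in nbrs) / k
        scores.append(mean_nbr / max(dens[i], 1e-300))  # guard against zero density
    return scores

# Toy data: a tight cluster of five points plus one isolated point.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05), (3.0, 3.0)]
scores = outlier_scores(data, k=3, h=1.0)
```

On this toy data the isolated point receives by far the largest score, while the clustered points score close to 1. A fixed bandwidth keeps the sketch short; an adaptive bandwidth would instead scale `h` per point, for example by the distance to its neighbors.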

Volume 13, Issue 1
March 2022
Pages 1683-1699
  • Receive Date: 25 May 2021
  • Accept Date: 19 October 2021