A local density-based outlier detection method for high dimension data

Document Type : Research Paper

Authors

1 University of Baghdad, College of Management and Economics, Department of Statistics, Iraq

2 Ministry of Planning, Central Statistical Organization, Iraq

Abstract

Outlier detection becomes challenging when the dataset is high-dimensional; to handle this problem, we use principal component analysis. Local density-based outlier (anomaly) detection methods compare the density of an observation with the densities of its local neighbors, and we use the outlier score as the measure of comparison. In this research, we choose different density estimation functions and calculate different distances. Weighted kernel density estimation with adaptive bandwidth, using a multivariate Gaussian kernel, considers both the k nearest neighbors ($KNN$) and the reverse nearest neighbors (RNN). $KNN$ is also used for the Epanechnikov kernel density estimation. Lastly, we compute the LOF as a baseline method for detecting outliers. Extensive experiments on a synthetic dataset show that RKDOS and EPA perform better than LOF under the precision evaluation criterion.
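The local density comparison behind the LOF baseline can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names `knn` and `lof_scores`, the toy data, and the choice `k=3` are assumptions made for illustration. It takes exactly k neighbors even when distances tie, assumes no duplicate points, and omits the PCA step and the kernel-based scores (RKDOS, EPA).

```python
import math

def knn(points, i, k):
    """Return the k nearest neighbours of points[i] as (distance, index) pairs."""
    dists = sorted(
        (math.dist(points[i], q), j) for j, q in enumerate(points) if j != i
    )
    return dists[:k]

def lof_scores(points, k=3):
    """Simplified Local Outlier Factor for every point (assumes no duplicates)."""
    n = len(points)
    neigh = [knn(points, i, k) for i in range(n)]
    # k-distance: distance from each point to its k-th nearest neighbour
    k_dist = [nb[-1][0] for nb in neigh]
    # local reachability density: inverse of the mean reachability distance
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neigh[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: mean density of the neighbours divided by the point's own density
    return [
        sum(lrd[j] for _, j in neigh[i]) / (k * lrd[i]) for i in range(n)
    ]

# Toy data (hypothetical): a 3x3 grid of inliers plus one isolated point.
points = [(x, y) for x in range(3) for y in range(3)] + [(8, 8)]
scores = lof_scores(points, k=3)
for p, s in zip(points, scores):
    print(p, round(s, 3))
```

Scores near 1 indicate a point whose density matches that of its neighbors. In this toy example the isolated point at (8, 8) receives the largest score because its local density is much lower than the density around its nearest neighbors.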

Volume 13, Issue 1
March 2022
Pages 1683-1699
  • Receive Date: 25 May 2021
  • Accept Date: 19 October 2021
  • First Publish Date: 10 November 2021