A density-based clustering method with calculating the Eps parameter

Document Type : Research Paper

Authors

1 Department of Computer Engineering, Sari Branch, Islamic Azad University, Sari, Iran

2 Department of Applied Mathematics, Sari Branch, Islamic Azad University, Sari, Iran

10.22075/ijnaa.2024.33345.4962

Abstract

With regard to the non-linear nature of real-life data, their clusters' shapes are non-convex and unfortunately, some clustering methods cannot identify non-convex clusters and this is a challenge. Density-based clustering methods could be a solution to this problem. Among all methods of this type, the DBSCAN algorithm can cluster data with different shapes, sizes, and densities and also identify noise points. However, owing to the use of static input parameters-the neighbourhood radius ($Eps$) and the minimum value for cluster formation ($MinPts$)- this algorithm has some problems such as the difficulty in accurately determining these parameters in high-dimensional data sets and not recognizing clusters with different densities. Accordingly, this paper presents a density clustering algorithm, which requires minimal input parameters and one of its main parameters is $Eps$, which is automatically calculated based on the $k$-nearest neighbours of points and its value is different for each cluster. To evaluate the effectiveness of the proposed algorithm, some experiments were conducted. The obtained results showed the effectiveness and efficiency of the presented algorithm regarding the correct identification of clusters with the desired shape, size, and density. In addition, the proposed algorithm was found effective in estimating the number of clusters in most of the data sets considered in this study.

Keywords

[1] K. Backhaus, B. Erichson, S. Gensler, R. Weiber, and T. Weiber, Cluster analysis, Multivariate Anal.: Application-Oriented Introduction, Springer, 2023, pp. 453–532.
[2] R. Bhuyan and S. Borah, A survey of some density based clustering techniques, arXiv preprint arXiv:2306.09256 (2023).
[3] A. Bryant and K. Cios, Rnn-dbscan: A density-based clustering algorithm using reverse nearest neighbor density estimates, IEEE Trans. Knowledge Data Engin. 30 (2017), no. 6, 1109–1121.
[4] A.A. Bushra, D. Kim, Y. Kan, and G. Yi, Autoscan: Automatic detection of dbscan parameters and efficient clustering of data in overlapping density regions, Peer J. Comput. Sci. 10 (2024), e1921.
[5] I. de Moura Ventorim, D. Luchi, A.L. Rodrigues, and F.M. Varejao, Birchscan: A sampling method for applying dbscan to large datasets, Expert Syst. Appl. 184 (2021), 115518.
[6] S. Erich, S. Jorg, E. Martin, K.H. Peter, and X. Xiaowei, Dbscan revisited, revisited: Why and how you should (still) use dbscan, ACM Trans. Database Syst. 42 (2017), no. 3, 1–21.
[7] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, A density-based algorithm for discovering clusters in large spatial databases with noise, KDD 96 (1996), no. 34, 226–231.
[8] A. Fahim, An extended dbscan clustering algorithm, Int. J. Adv. Comput. Sci. Appl. 13 (2022), no. 3.
[9] Z. Falahiazar, A.R. Bagheri, and M. Reshadi, Determining parameters of dbscan algorithm in dynamic environments automatically using dynamic multi-objective genetic algorithm, J. AI Data Min. 10 (2022), no. 3, 321–332.
[10] X. Huang, T. Ma, C. Liu, and S. Liu, Grit-dbscan: A spatial clustering algorithm for very large databases, Pattern Recog. 142 (2023), 109658.
[11] L. Hubert and P. Arabie, Comparing partitions, J. Class. 2 (1985), no. 1, 193–218.
[12] J.-Hun Kim, J.-H. Choi, Y.-H. Park, C. Kai-Sang Leung, and A. Nasridinov, Knn-sc: Novel spectral clustering algorithm using k-nearest neighbors, IEEE Access 9 (2021), 152616–152627.
[13] O. Kulkarni and A. Burhanpurwala, A survey of advancements in dbscan clustering algorithms for big data, 3rd Int. Conf. Power Electron. IoT Appl. Renew. Energy Control (PARC), IEEE, 2024, pp. 106–111.
[14] B. Ma, C. Yang, A. Li, Y. Chi, and L. Chen, A faster dbscan algorithm based on self-adaptive determination of parameters, Procedia Comput. Sci. 221 (2023), 113–120.
[15] J. Qian, Y. Zhou, X. Han, and Y. Wang, Mdbscan: A multi-density dbscan based on relative density, Neurocom[1]puting 576 (2024): 127329.
[16] J. Ravi and S. Kulkarni, Automatic generation of parameters in density-based spatial clustering., ICTACT J. Soft Comput. 12 (2022), no. 2.
[17] M.A. Sorkhi, E. Akbari, M. Rabbani, and H. Motameni, A dynamic density-based clustering method based on k-nearest neighbor, Knowledge Inf. Syst. 66 (2024), 3005–3031.
[18] A. Strehl and J. Ghosh, Cluster ensembles-A knowledge reuse framework for combining multiple partitions, J. Machine Learn. Res. 3 (2003), 583–617.
[19] P.M. Vaidya, An o(n logn) algorithm for the all-nearest-neighbors problem, Discrete Comput. Geom. 4 (1989), no. 2, 101–115.
[20] Y. Wang, J. Qian, M. Hassan, X. Zhang, T. Zhang, C. Yang, X. Zhou, and F. Jia, Density peak clustering algorithms: A review on the decade 2014–2023, Expert Syst. Appl. 238 (2023), 121860.
[21] Z. Wang, Z. Ye, Y. Du, Y. Mao, Y. Liu, Z. Wu, and J. Wang, Amd-dbscan: An adaptive multi-density dbscan for datasets of extremely variable density, IEEE 9th Int. Conf. Data Sci. Adv. Anal., IEEE, 2022, pp. 1–10.
[22] H. Yin, A. Aryani, S. Petrie, A. Nambissan, A. Astudillo, and S. Cao, A rapid review of clustering algorithms, arXiv preprint arXiv:2401.07389 (2024).
[23] X. Zhang and S. Zhou, Woa-dbscan: Application of whale optimization algorithm in dbscan parameter adaption, IEEE Access 11 (2023), 91861–91878.

Articles in Press, Corrected Proof
Available Online from 11 November 2024
  • Receive Date: 20 February 2024
  • Revise Date: 13 April 2024
  • Accept Date: 07 May 2024