The impact of diversity on clustering ensemble using $\chi^2$ criterion

Document Type : Research Paper

Authors

Department of Computer Engineering, Sari Branch, Islamic Azad University, Sari, Iran

Abstract

Clustering ensemble is a technique for improving the robustness and accuracy of clustering results. It generates a set of base clusterings and then combines them into a consensus solution, whose quality depends on the diversity of the base clusterings and on the performance of the consensus function. Improving the consensus solution therefore requires base clusterings that are generated with regard to both quality and diversity. This study employs novel techniques to generate diverse base clusterings for both low-dimensional and high-dimensional datasets, together with new criteria that compute the diversity of base clusterings with respect to their quality. The impact of different levels of diversity on consensus functions is studied. The experimental results show that the proposed methods generate diverse base clusterings.
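
The generate-then-combine idea described in the abstract can be illustrated with a minimal sketch. This is not the method proposed in the paper: the base clusterings are hard-coded toy label vectors, the consensus function is a simple thresholded co-association matrix, and diversity is measured as one minus the Rand index rather than with the paper's $\chi^2$-based criterion. All function names and the 0.5 threshold are assumptions made for illustration.

```python
import numpy as np


def co_association(base_labels):
    """Fraction of base clusterings in which each pair of points shares a cluster."""
    base_labels = np.asarray(base_labels)
    n_clusterings, n_points = base_labels.shape
    co = np.zeros((n_points, n_points))
    for labels in base_labels:
        co += labels[:, None] == labels[None, :]
    return co / n_clusterings


def consensus_by_threshold(co, threshold=0.5):
    """Consensus labels: connected components of the graph that links
    pairs whose co-association exceeds the (assumed) threshold."""
    n_points = co.shape[0]
    labels = -np.ones(n_points, dtype=int)
    current = 0
    for start in range(n_points):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.nonzero(co[i] > threshold)[0]:
                if labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def pair_disagreement(a, b):
    """Illustrative diversity between two clusterings: 1 - Rand index,
    i.e. the share of point pairs on which they disagree."""
    a, b = np.asarray(a), np.asarray(b)
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    upper = np.triu_indices(len(a), k=1)
    return float(np.mean(same_a[upper] != same_b[upper]))


# Three toy base clusterings of six points; the second one places
# point 2 differently from the other two.
base = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
]

co = co_association(base)
consensus = consensus_by_threshold(co)
diversity = pair_disagreement(base[0], base[1])
print(consensus, diversity)
```

In this toy case the two agreeing base clusterings outvote the dissenting one, so the consensus recovers their grouping. A real study would replace the hard-coded labels with the output of actual clustering runs and the threshold step with one of the consensus functions being compared.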

Volume 13, Issue 2
July 2022
Pages 1151-1163
  • Receive Date: 15 August 2021
  • Revise Date: 17 December 2021
  • Accept Date: 07 March 2022