Development of FCM method to increase clustering accuracy in big data

Document Type : Research Paper

Authors

1 Department of Information and Communication Technology Management, Qeshm Branch, Islamic Azad University, Qeshm, Iran

2 Department of Industrial Management, Tehran Branch, Islamic Azad University, Tehran, Iran

3 Department of Industrial Management, Science and Research Branch, Islamic Azad University, Tehran, Iran

Abstract

The spread and pervasiveness of the Internet mean that "big data" is created daily, and processing data at this scale requires systems with high processing power. Data produced and collected from a wide range of equipment and tools accumulates into large-scale databases, and managing large, unstructured databases poses persistent challenges. This study aims to present a model that increases the clustering accuracy of big data using a fuzzy clustering system based on data mining, implemented in the MATLAB programming environment. First, the importance of each variable is determined with decision tree models in SPSS Modeler; these results are then used to define fuzzy rules and to build a fuzzy inference system in MATLAB. The study applies the data mining techniques C&R Tree, CHAID and C5.0 to develop the fuzzy c-means (FCM) method for higher clustering accuracy on high-volume data. Data preparation indicators, data type, data quality, data dimensions, data volume and number of clusters were evaluated as inputs, and the clustering accuracy index was evaluated as the output. The rules of the fuzzy inference system were derived from these results, and the membership functions of the decision model show the effect of each input index on the output index.
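
For readers unfamiliar with FCM, the following is a minimal sketch of the standard fuzzy c-means algorithm, written in Python rather than the MATLAB environment used in the study. It illustrates only the textbook procedure (alternating center and membership updates), not the extended model developed in the paper. The function name, its parameters, and the toy data are illustrative assumptions.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Textbook fuzzy c-means on a list of equal-length numeric tuples.

    Returns (centers, memberships); memberships[i][k] is the degree to which
    point i belongs to cluster k, and each row sums to 1.
    All names and defaults here are illustrative, not taken from the paper.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, normalised so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-12 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    exponent = 2.0 / (m - 1.0)  # m > 1 is the fuzzifier
    for _ in range(max_iter):
        # Center update: mean of the points weighted by u_ik ** m.
        centers = []
        for k in range(n_clusters):
            weights = [u[i][k] ** m for i in range(n)]
            total = sum(weights)
            centers.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ))

        # Membership update from the relative distances to every center.
        new_u = []
        for p in points:
            dists = [math.dist(p, c) for c in centers]
            if any(d == 0.0 for d in dists):
                # Point coincides with a center: assign crisp membership.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                total = sum(hits)
                new_u.append([h / total for h in hits])
                continue
            new_u.append([
                1.0 / sum((dk / dj) ** exponent for dj in dists)
                for dk in dists
            ])

        # Stop once no membership changes by more than tol.
        shift = max(abs(a - b) for ra, rb in zip(u, new_u) for a, b in zip(ra, rb))
        u = new_u
        if shift < tol:
            break

    return centers, u


# Toy data (assumed for illustration): two well-separated groups of 2-D points.
data = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
centers, memberships = fuzzy_c_means(data, n_clusters=2)

# Hard labels, if needed, are the cluster with the highest membership.
labels = [max(range(2), key=lambda k: row[k]) for row in memberships]
print(labels)
```

Unlike hard clustering, each point keeps a graded membership in every cluster. This is what allows the clustering output to be combined with fuzzy rules and membership functions of the kind the abstract describes.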

Volume 14, Issue 8
August 2023
Pages 55-66
  • Receive Date: 11 October 2022
  • Revise Date: 22 October 2022
  • Accept Date: 05 December 2022