Using Hadoop to analyze big data for multiple purposes: An applied study according to the Map-Reduce model

Document Type: Research Paper

Authors

1 Department of Computer Science, Faculty of Science, Zakho University, Duhok, Kurdistan Region, Iraq

2 Department of Economic, College of Economic and Administration, Duhok University, Duhok, Kurdistan Region, Iraq

Abstract

The volume and diversity of data in the world are unprecedented in human history, and both are growing at an unprecedented rate. As Internet and social media technologies permeate every stage of our lives, down to our mobile phones, people have become a source of data even in their daily activities. From this, a new concept emerged: "Big Data". Big data is characterized by high volume, high velocity, and a variety of structured, semi-structured, and unstructured data. Many industries generate big data by creating new data or digitizing existing data so that organizations can gain a competitive advantage. To extract economic value from big data, it must be processed with advanced analytical methods. This research examines the use of Hadoop in analyzing big data according to the Map-Reduce model, along with related distributed file systems and tools such as Processing, PIG, Mahout, NoSQL, and Cassandra. The study concluded that advanced analytical methods protect the privacy of personal information and can also be used to close security gaps. The phenomenon of big data was discussed in terms of its components and resources, and the advantages of big data in its areas of application were emphasized.
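
As an illustration of the Map-Reduce model named in the abstract, the following is a minimal single-machine sketch in Python of the three conceptual phases (map, shuffle, reduce) applied to a word count. It is not the study's implementation and does not use Hadoop itself; the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count`, as well as the sample input, are assumptions introduced only for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, used only to exercise the sketch.
    sample = ["big data needs big tools", "Hadoop processes big data"]
    print(word_count(sample))
```

In a real Hadoop cluster the map and reduce functions would run in parallel on different nodes over blocks stored in a distributed file system, and the framework would perform the shuffle between them; the sketch keeps everything in one process so that the data flow is easy to follow.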

Volume 14, Issue 3
March 2023
Pages 47-62
  • Receive Date: 21 November 2022
  • Revise Date: 13 January 2023
  • Accept Date: 22 January 2023