Social media based digital file size estimation method using sampling technique with $\alpha$ control chart in big data

Document Type : Research Paper

Authors

1 Department of Computer Science and Applications, Dr. Harisingh Gour Vishwavidyalaya (M.P.), India

2 Department of Mathematics and Statistics, Dr. Harisingh Gour Vishwavidyalaya (M.P.), India

Abstract

With the emergence of social networking platforms, a large number of users around the world have become part and parcel of these platforms. At every moment, users on social media exchange digital files in the form of text, video, images, voice and music, which ultimately generates big data. The matter of interest is to estimate precisely the average file size over a time duration (occasion), which may be hours, days or months. This paper presents a sample-based methodology for estimating the mean size of digital communication content spreading on a social media platform. An estimator is suggested using a random sample from big data, and its properties are derived. A simulation method is suggested that computes the confidence interval (CI) for predicting a precise range of digital file size. The proposed method produces an optimal confidence interval for a suitable choice of constant. These estimated confidence intervals can be used for developing $\alpha$-control charts for continuous monitoring of the growth in file size in social media storage at the data centre. If the growth of mean digital file size crosses the upper limit, then additional storage infrastructure is needed at the administration level of the social media site. Machine learning algorithms based on the proposed method can be developed for monitoring the growth of average digital file size over a time duration.
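The general idea of sample-based mean estimation with a confidence interval used as control limits can be illustrated with a minimal sketch. It is not the estimator proposed in the paper: it assumes plain simple random sampling without replacement, a normal-approximation interval, and treats $\alpha$ as the significance level. The function names, the synthetic log-normal file sizes, and all parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def srs_mean_ci(population, n, alpha=0.05, seed=0):
    """Return (estimate, lower, upper) for the population mean.

    Assumption: simple random sampling without replacement and a
    normal-approximation interval; this is a stand-in, not the
    paper's estimator.
    """
    rng = random.Random(seed)
    sample = rng.sample(population, n)
    big_n = len(population)
    y_bar = mean(sample)
    s = stdev(sample)
    # Standard error with the finite population correction (1 - n/N).
    se = math.sqrt((1 - n / big_n) * s ** 2 / n)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return y_bar, y_bar - z * se, y_bar + z * se


def exceeds_upper_limit(occasion_mean, upper):
    """Flag an occasion whose mean file size crosses the upper limit."""
    return occasion_mean > upper


# Hypothetical data: 10,000 file sizes (in MB) drawn from a log-normal law.
_rng = random.Random(42)
file_sizes = [_rng.lognormvariate(1.0, 0.8) for _ in range(10_000)]

estimate, lower, upper = srs_mean_ci(file_sizes, n=500, alpha=0.05, seed=1)
```

In this sketch, `lower` and `upper` play the role of control limits for one occasion; a later occasion whose sample mean satisfies `exceeds_upper_limit(new_mean, upper)` would signal a need for additional storage.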

Volume 15, Issue 9
September 2024
Pages 389-411
  • Receive Date: 13 August 2022
  • Revise Date: 07 March 2023
  • Accept Date: 11 August 2023