Big data analysis by using one covariate at a time multiple testing (OCMT) method: Early school dropout in Iraq

Document Type : Research Paper

Authors

1 Department of Statistics, College of Administration and Economics, Wasit University, Wasit, Iraq

2 Department of Statistics, College of Administration and Economics, University of Baghdad, Baghdad, Iraq

Abstract

Early school dropout is a significant indicator that shapes the future of societies and the nature of their components. Studying this phenomenon and explaining it is therefore necessary, which requires finding or developing appropriate models to predict it. The variables that affect early school dropout in Iraq are large in number and come from multiple sources and types because of the political and economic situation, which makes them a form of Big Data that must be explored with new statistical approaches. This research uses the One Covariate at a Time, Multiple Testing (OCMT) method to analyze data from surveys collected by the Central Statistical Organization, Iraq, which contain many indicators that relate to school dropout and meaningfully affect the lives of Iraqis. Both Ridge Regression and the OCMT method were applied to the data, and the Mean Squared Error (MSE) was used to compare the two methods. The results show that the OCMT estimator performs better than the Ridge estimator, in terms of MSE, under Big Data conditions.
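To make the comparison described in the abstract concrete, the following is a minimal sketch on simulated data, not the authors' implementation and not the survey data. It shows only a first-stage OCMT selection (one regression per covariate, keeping those whose t-statistic exceeds a multiple-testing critical value), followed by least squares on the selected covariates, next to a closed-form ridge fit. The tuning values `p`, `delta`, `c`, and `lam`, the function names, and the simulated design are all assumptions chosen for illustration.

```python
import numpy as np
from statistics import NormalDist


def ocmt_first_stage(X, y, p=0.05, delta=1.0, c=1.0):
    """Return indices of columns whose one-at-a-time t-statistic passes the threshold."""
    T, n = X.shape
    # Critical value that grows with the number of covariates n
    # (p, delta and c are illustrative tuning values, assumed here).
    crit = NormalDist().inv_cdf(1.0 - p / (2.0 * c * n ** delta))
    selected = []
    for i in range(n):
        Z = np.column_stack([np.ones(T), X[:, i]])  # intercept + one covariate
        ZtZ_inv = np.linalg.inv(Z.T @ Z)
        beta = ZtZ_inv @ Z.T @ y
        resid = y - Z @ beta
        sigma2 = resid @ resid / (T - 2)
        t_stat = beta[1] / np.sqrt(sigma2 * ZtZ_inv[1, 1])
        if abs(t_stat) > crit:
            selected.append(i)
    return selected


def fit_ols(X, y):
    """Least squares with an intercept; returns [intercept, slopes...]."""
    Z = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef


def fit_ridge(X, y, lam):
    """Closed-form ridge on centred data; the intercept is not penalised."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    slopes = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    intercept = y_mean - x_mean @ slopes
    return np.concatenate([[intercept], slopes])


def mse(coef, X, y):
    """Mean squared prediction error for coefficients [intercept, slopes...]."""
    pred = coef[0] + X @ coef[1:]
    return float(np.mean((y - pred) ** 2))


# Simulated example (assumed design): 50 covariates, only the first 3 matter.
rng = np.random.default_rng(0)
T, n = 200, 50
X = rng.standard_normal((2 * T, n))
beta_true = np.zeros(n)
beta_true[:3] = 2.0
y = X @ beta_true + rng.standard_normal(2 * T)
X_train, y_train, X_test, y_test = X[:T], y[:T], X[T:], y[T:]

selected = ocmt_first_stage(X_train, y_train)
ocmt_coef = fit_ols(X_train[:, selected], y_train)
ridge_coef = fit_ridge(X_train, y_train, lam=1.0)

print("selected covariates:", selected)
print("OCMT test MSE: ", mse(ocmt_coef, X_test[:, selected], y_test))
print("Ridge test MSE:", mse(ridge_coef, X_test, y_test))
```

The full OCMT procedure can add further stages that re-test the remaining covariates conditional on those already selected; the sketch stops after the first stage to stay short. Which method gives the lower MSE in this toy setting depends on the simulated design and the tuning values, so it should not be read as a reproduction of the paper's result.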

Keywords

Volume 12, Issue 2
November 2021
Pages 931-938
  • Receive Date: 12 April 2021
  • Revise Date: 30 May 2021
  • Accept Date: 25 May 2021