Analysis of performance of accuracy by adding new features individually using Relief-F and Budget Tree Random Forest (RFBTRF) method

Document Type : Research Paper

Authors

1 Department of Information Technology, Kakatiya Institute of Technology and Science, Warangal-506015, Telangana

2 Department of Computer Science and Engineering, Siddhartha Institute of Technology and Sciences, Narapally, Hyderabad, Telangana 500088

3 Department of Computer Science and Engineering, PSNA College of Engineering and Technology, Kothandaraman Nagar, Dindigul-624622, Tamil Nadu, India.

Abstract

Education is very important for improving the values of students in society. Several kinds of features, including school-related, student-related, parent-related and teacher-related features, influence students' success in their education. Identifying the best features from a large feature set for analyzing the success or failure of a student is an important challenge for researchers and academicians. Collecting the feature information needed to prepare a student dataset is itself a difficult task in predicting student academic performance. We collected a dataset from different schools that contains information on 4965 students. The dataset records 45 features drawn from the same four categories: school-related, student-related, parent-related and teacher-related. Not all of these features are useful for predicting a student's academic performance. Data mining methods are applied in many research domains, including education, to extract hidden information from datasets. Feature selection algorithms determine the most informative features by eliminating irrelevant and redundant ones. In this work, the Relief-F and Budget Tree Random Forest (RFBTRF) feature selection algorithm is used to identify the relevant features in the collected school dataset. Five different machine learning models are used to evaluate the effectiveness of the feature selection algorithm. Of these, the decision tree model achieves the best accuracy for student academic performance prediction. The experimental results show that the RFBTRF algorithm identifies the most informative features, which improves prediction accuracy and reduces over-fitting. The experiments started with individual features and then continued with combinations of different feature categories. It was observed that prediction accuracy decreases when some categories of features are added to other categories.
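The Relief-F step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' RFBTRF implementation: it covers only a simplified two-class Relief-F weighting, omits the Budget Tree Random Forest stage entirely, and runs on synthetic stand-in data rather than the school dataset. The function name `relief_f_scores`, the parameter `n_neighbors`, and the feature names are assumptions made for illustration.

```python
import random


def relief_f_scores(X, y, n_neighbors=3):
    """Relief-F style relevance weights for numeric features and two classes.

    With only two classes, Relief-F's class-prior weighting of misses equals 1,
    so the nearest misses are averaged directly.
    """
    n, d = len(X), len(X[0])
    # Scale per-feature differences by the feature's range so they are comparable.
    span = []
    for j in range(d):
        column = [row[j] for row in X]
        span.append((max(column) - min(column)) or 1.0)

    def diff(a, b, j):
        return abs(a[j] - b[j]) / span[j]

    def distance(a, b):
        return sum(diff(a, b, j) for j in range(d))

    weights = [0.0] * d
    for i in range(n):
        # Order all other samples by distance to sample i.
        others = sorted((k for k in range(n) if k != i),
                        key=lambda k: distance(X[i], X[k]))
        hits = [k for k in others if y[k] == y[i]][:n_neighbors]
        misses = [k for k in others if y[k] != y[i]][:n_neighbors]
        for j in range(d):
            # A relevant feature differs little on nearest hits
            # and a lot on nearest misses.
            if hits:
                weights[j] -= sum(diff(X[i], X[k], j) for k in hits) / (len(hits) * n)
            if misses:
                weights[j] += sum(diff(X[i], X[k], j) for k in misses) / (len(misses) * n)
    return weights


# Synthetic stand-in data (NOT the paper's dataset): feature 0 decides the
# label, feature 1 is pure noise. The feature names are hypothetical.
rng = random.Random(0)
feature_names = ["informative_feature", "noise_feature"]
X = [[rng.random(), rng.random()] for _ in range(60)]
y = [1 if row[0] > 0.5 else 0 for row in X]

scores = relief_f_scores(X, y)
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
for j in ranking:
    print(f"{feature_names[j]}: {scores[j]:.3f}")
```

In a pipeline like the one the abstract describes, a ranking of this kind would be used to keep the highest-weighted features before training the classifiers. How the paper combines these weights with the Budget Tree Random Forest stage is not specified here.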

Volume 13, Issue 1
March 2022
Pages 1239-1252
  • Receive Date: 01 June 2021
  • Accept Date: 19 October 2021