Predicting the number of comments on Facebook posts using an ensemble regression model


1 Assistant Professor, Department of Electrical Engineering, Shams Higher Education Institute, Iran.

2 Department of Computer Engineering, Faculty of Technical Engineering, Shams Institute of Higher Education, Gonbad Kavous, Iran.

3 Assistant Professor of Biomedical Engineering, Vali-e-Asr University of Rafsanjan, Rafsanjan, Iran.

4 Assistant Professor, Aerospace Research Institute, Ministry of Science, Research and Technology, Iran.



User comments on social media play an important role in shaping or changing people's perceptions of topics and in popularizing them. Such comments now matter in many fields, including education, sales, and prediction. In this paper, the Facebook social network is considered as a case study. The purpose of this study is to predict the volume of comments that Facebook users leave on published content, called a post. Because the target is a numeric count, the task is formulated as a regression problem.
In the proposed method, three regression models, namely the elastic net, the M5P model tree, and a radial basis function (RBF) regression model, are combined into an ensemble that predicts the volume of comments. The base models are combined using stacked generalization (stacking): the outputs of the base models are supplied as new features to a linear regression model, which combines the three outputs and determines the final output of the system.
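The stacking arrangement described above can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation. scikit-learn ships neither M5P nor an RBF network regressor, so a plain decision tree and an RBF-kernel ridge regressor are used here as stand-ins. The data are synthetic, and every hyperparameter value is an assumption chosen only to make the example run.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import StackingRegressor
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import ElasticNet, LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption): NOT the UCI Facebook comment dataset.
X, y = make_regression(n_samples=400, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=100, random_state=0
)

# Three base regressors. Only the elastic net matches the paper directly;
# the other two are substitutes (assumptions) for M5P and RBF regression.
base_models = [
    ("elastic_net", make_pipeline(StandardScaler(),
                                  ElasticNet(alpha=0.1, l1_ratio=0.5))),
    ("tree_standin_for_m5p", DecisionTreeRegressor(max_depth=5, random_state=0)),
    ("rbf_standin", make_pipeline(StandardScaler(),
                                  KernelRidge(kernel="rbf", alpha=1.0))),
]

# The meta-learner is a linear regression that sees only the three
# cross-validated base-model predictions as its input features.
stack = StackingRegressor(
    estimators=base_models,
    final_estimator=LinearRegression(),
    cv=5,
)
stack.fit(X_train, y_train)
y_pred = stack.predict(X_test)

# One learned weight per base model.
print(stack.final_estimator_.coef_)
```

The meta-learner is trained on out-of-fold predictions (the `cv=5` setting), so the linear combination is not fitted on outputs the base models produced for their own training samples.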
To evaluate the performance of the proposed model, a dataset from the UCI repository, which provides 5 training sets and 10 test sets, is used. Each test set contains 100 records. The performance of the base models and of the proposed ensemble model is evaluated on all of these sets. The results show that the ensemble model achieves an average correlation coefficient (one of the evaluation criteria) of 74.4 ± 16.4, which is an acceptable result.
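A figure of the form "mean ± spread" over several test sets can be computed along the following lines. This is a hedged sketch only: the function names `pearson_r` and `summarize` are hypothetical, the data are synthetic, and both the percentage scaling and the use of the sample standard deviation are assumptions, since the abstract does not state how the spread was calculated.

```python
import math
import random
from statistics import mean, stdev


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between two equal-length sequences."""
    mt, mp = mean(y_true), mean(y_pred)
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def summarize(test_sets):
    """Mean and sample standard deviation of per-set correlation, in percent."""
    scores = [100.0 * pearson_r(y_true, y_pred) for y_true, y_pred in test_sets]
    return mean(scores), stdev(scores)


# Synthetic stand-in (assumption): 10 test sets of 100 records each,
# with noisy predictions; these are not results from the paper.
rng = random.Random(0)
test_sets = []
for _ in range(10):
    y_true = [rng.uniform(0, 50) for _ in range(100)]
    y_pred = [t + rng.gauss(0, 15) for t in y_true]
    test_sets.append((y_true, y_pred))

avg, spread = summarize(test_sets)
print(f"correlation: {avg:.1f} ± {spread:.1f}")
```

Reporting the spread alongside the mean shows how much the correlation varies from one test set to another, which a single averaged score would hide.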