Correlation estimation between samples based on covariance, graph theory and graph neural network

Document Type: Research Paper


Department of Computer Engineering, Yasooj Branch, Islamic Azad University, Yasooj, Iran



The correlation coefficient is one of the standard criteria for expressing the relationship between two random variables. Correlation between two variables indicates that a change in the value of one tends to be accompanied by a change in the other in a particular direction, so the value of one variable can be used to predict the value of the other. In statistics, the correlation coefficient measures the direction and strength of this tendency. In machine learning, it is also used as a measure of classification quality: as a first step in classification, the correlation between samples must be estimated by some method. Various methods exist for estimating the correlation of different data types, but they have disadvantages such as low accuracy or high computational time. Graphical modeling can overcome these problems because of its high capability in modeling the correlation between samples. In this research, a new covariance model based on graph theory and a graph neural network is presented for estimating the correlation between samples. The results show that the proposed model improves on the Pearson and cosine methods in accuracy, sensitivity, precision, F-Micro, F-Macro, and statistical tests.
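For orientation, the two baseline measures named above, Pearson correlation and cosine similarity, can be sketched in a few lines of Python. This is only an illustration of the standard textbook definitions, not the proposed graph-based model. The function names `pearson` and `cosine` and the three sample vectors are assumptions made for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def cosine(x, y):
    """Cosine similarity: dot product scaled by both vector norms."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


# Illustrative samples (hypothetical data, not taken from the paper).
sample_a = [1.0, 2.0, 3.0, 4.0]
sample_b = [2.0, 4.0, 6.0, 8.0]   # increases together with sample_a
sample_c = [4.0, 3.0, 2.0, 1.0]   # decreases as sample_a increases

print(pearson(sample_a, sample_b))  # strong positive correlation
print(pearson(sample_a, sample_c))  # strong negative correlation
print(cosine(sample_a, sample_c))   # still positive: cosine does not center the data
```

The last line shows one way the two baselines differ: Pearson subtracts the mean before comparing, so it reports the opposite trend of `sample_a` and `sample_c` as negative, while cosine similarity on the raw values stays positive.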



Articles in Press, Corrected Proof
Available Online from 18 January 2024
  • Receive Date: 09 November 2022
  • Revise Date: 19 February 2023
  • Accept Date: 22 May 2023