Fraud usage detection in internet users based on log data

Document Type : Research Paper

Authors

Faculty of Computer Science and Mathematics, University of Kufa, Iraq.

Abstract

The Internet has become central to daily social, financial, and other activities. A very large number of customers use it to conduct business and make purchases, so billions of dollars are transferred online every day. Such large sums attract cybercriminals. Fraud is among the most dangerous of their methods, especially phishing, in which attackers try to steal user credentials using fraudulent emails, fake websites, or both. The system proposed in this paper first extracts data efficiently from the web log file through data collection and preprocessing, then applies a web usage mining procedure to extract features that characterize user behavior. Separately, URL analysis extracts features for detecting phishing website addresses. The features from these two parts are then combined, giving sixty-three features in total. Finally, a classification algorithm (Random Forests) is applied to determine whether website addresses are phishing or legitimate. The performance of the proposed algorithm is evaluated using a confusion matrix, which shows the robustness of the proposed system.
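As a rough illustration of two of the steps named in the abstract (URL feature extraction and confusion-matrix evaluation), the following Python sketch may help. It is not the paper's implementation: the six features, the function names `url_features` and `confusion_matrix`, and the label strings are assumptions chosen for illustration, and the paper's actual sixty-three features are not listed here. The Random Forests classifier itself is omitted; in practice the feature values would be passed to such a classifier and its predictions compared with the true labels.

```python
import re
from urllib.parse import urlparse

# Matches a dotted-quad host such as "192.168.10.5" (no range check on octets).
IPV4_PATTERN = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def url_features(url: str) -> dict:
    """Return a few illustrative lexical URL features (hypothetical set)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "has_at_symbol": "@" in url,
        "host_is_ip": bool(IPV4_PATTERN.match(host)),
        "uses_https": parsed.scheme == "https",
    }


def confusion_matrix(actual, predicted, positive="phishing"):
    """Count true/false positives and negatives for a binary labelling."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            counts["tp"] += 1
        elif p == positive:
            counts["fp"] += 1
        elif a == positive:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


if __name__ == "__main__":
    print(url_features("http://192.168.10.5/login.php"))
    actual = ["phishing", "legitimate", "phishing", "legitimate"]
    predicted = ["phishing", "phishing", "legitimate", "legitimate"]
    print(confusion_matrix(actual, predicted))
```

Accuracy, precision, and recall follow directly from the four counts, for example precision = tp / (tp + fp).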

Keywords

Volume 12, Issue 2
November 2021
Pages 2179-2188
  • Receive Date: 02 February 2021
  • Revise Date: 15 May 2021
  • Accept Date: 11 June 2021