A review on video violence detection approaches

Shubber, Mohamed Safaa Mohamed; Al-Ta'i, Ziyad Tariq Mustafa

doi:10.22075/ijnaa.2022.6369

Document Type : Research Paper

Authors

Department of Computer Science, College of Science, University of Diyala, Baqubah, Iraq

Abstract

A violent behaviour detection system (VBDS) is an important application of intelligent video surveillance that plays a critical role in public security and safety. VBDS is a form of behaviour recognition that seeks to determine whether the behaviours observed in a scene are violent, such as fighting or assault. This paper surveys the existing approaches to VBDS. The techniques are classified by framework: the traditional framework and the end-to-end, state-of-the-art deep learning framework. Finally, the performance of the VBDS methods is assessed and compared.

International Journal of Nonlinear Analysis and Applications

Volume 13, Issue 2
July 2022
Pages 1117-1130

History

Receive Date: 05 January 2022
Revise Date: 01 March 2022
Accept Date: 14 March 2022

