A review on video violence detection approaches

Document Type: Research Paper


Department of Computer Science, College of Science, University of Diyala, Baqubah, Iraq


A violent behaviour detection system (VBDS) is an important application of intelligent video surveillance that plays a critical role in public security and safety. VBDS is a form of behaviour recognition that seeks to determine whether the behaviours observed in a scene are violent, such as fighting or assault. This paper surveys the existing approaches to VBDS. The techniques are classified by framework: the traditional framework and the end-to-end, state-of-the-art deep learning framework. Finally, the performance of the VBDS methods is assessed and compared.


Volume 13, Issue 2
July 2022
Pages 1117-1130
  • Receive Date: 05 January 2022
  • Revise Date: 01 March 2022
  • Accept Date: 14 March 2022