Human recognition by utilizing voice recognition and visual recognition

Altyar, Sukaina Sh; Hussein, Samera Shams; Mohammed, Mahir Jasem

doi:10.22075/ijnaa.2022.5501

Document Type : Research Paper

Authors

¹ Department of Computer Science, College of Education for Pure Science, University of Baghdad, Baghdad, Iraq

² Department of Religious Education, Iraqi Sunni Affairs, Iraq

Abstract

Audio-visual detection and recognition systems are considered among the most promising methods for many applications, including surveillance, speech recognition, eavesdropping devices, and intelligence operations. In human recognition, most current research focuses either on re-identifying body images taken by several cameras or on audio-only recognition. However, these traditional methods are not always useful when used alone. Indoor surveillance cameras, for example, are installed close to the ceiling and capture images from directly above; in some cases people do not look straight at the cameras; and cameras cannot be installed in some areas, such as toilets or bedrooms. It is therefore often difficult to detect a movement or a break-in. Similarly, when pursuing a suspect who enters a building or a party, it may be necessary to identify his location and/or listen to his speech alone, isolated from other voices and noise. Hence, a hybrid combination technique is very effective. In this work, we propose a multimodal human recognition approach that uses both face and audio and is based on deep convolutional neural networks (CNNs). Mainly, to address the challenge of part of the body not being captured, the final recognition results of separate VGG Face16 and ResNet50 CNNs are combined at the score level by the Weighted Sum rule to enhance recognition performance. The results show that the proposed system succeeds in recognising each person from his voice and/or his captured face. In addition, the system can separate a person's voice, isolate it from a noisy environment, and determine whether the desired person is present.
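
As an illustration of score-level combination by a weighted sum, the following is a minimal sketch and not the authors' implementation. It assumes each modality has already produced one match score per enrolled identity; the function names, the min-max normalisation step, the example scores, and the weight values are all hypothetical choices made for this example.

```python
from typing import List, Sequence, Tuple


def min_max_normalize(scores: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1] so both modalities share a common range."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores equal: this modality carries no ranking information.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def weighted_sum_fusion(
    face_scores: Sequence[float],
    voice_scores: Sequence[float],
    face_weight: float = 0.5,
) -> List[float]:
    """Fuse per-identity scores: w * face + (1 - w) * voice."""
    if len(face_scores) != len(voice_scores):
        raise ValueError("both modalities must score the same identities")
    if not 0.0 <= face_weight <= 1.0:
        raise ValueError("face_weight must lie in [0, 1]")
    face = min_max_normalize(face_scores)
    voice = min_max_normalize(voice_scores)
    return [
        face_weight * f + (1.0 - face_weight) * v
        for f, v in zip(face, voice)
    ]


def identify(
    face_scores: Sequence[float],
    voice_scores: Sequence[float],
    labels: Sequence[str],
    face_weight: float = 0.5,
) -> Tuple[str, float]:
    """Return the label with the highest fused score, and that score."""
    fused = weighted_sum_fusion(face_scores, voice_scores, face_weight)
    best = max(range(len(fused)), key=fused.__getitem__)
    return labels[best], fused[best]


if __name__ == "__main__":
    # Hypothetical scores for three enrolled identities.
    labels = ["alice", "bob", "carol"]
    face = [0.2, 0.9, 0.4]
    voice = [0.1, 0.3, 0.8]
    print(identify(face, voice, labels, face_weight=0.7))
    print(identify(face, voice, labels, face_weight=0.2))
```

In this sketch the weight controls how much each modality is trusted: lowering `face_weight` lets the voice scores dominate, which is the situation the abstract describes when the face is poorly captured.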

International Journal of Nonlinear Analysis and Applications

Volume 13, Issue 1
March 2022
Pages 343-351

History

Receive Date: 01 August 2021
Revise Date: 11 September 2021
Accept Date: 27 September 2021
