Exploiting Magnitude and Phase Aware Deep Neural Network for Replay Attack Detection

Khomdet Phapatanaburi; Prawit Buayai; Watcharaphon Naktong; Jakkree Srinonchat

doi:10.37936/ecti-eec.2020182.240341

Published: Aug 16, 2020

Keywords:

Phase Information; Magnitude and Phase Aware Deep Neural Network; Replay Attack Detection; ASVspoof 2017

Khomdet Phapatanaburi

Prawit Buayai

University of Yamanashi

Watcharaphon Naktong

Rajamangala University of Technology Isan Nakhonrachasrima

Jakkree Srinonchat

Rajamangala University of Technology Thanyaburi

Abstract

Magnitude and phase aware deep neural networks (MP aware DNNs), which operate on Fast Fourier Transform information, have recently received growing attention in many speech applications. However, little attention has been paid to their use for replay attack detection, the task addressed by the automatic speaker verification spoofing and countermeasures challenge (ASVspoof 2017). This paper investigates the MP aware DNN as a speech classifier for distinguishing non-replayed (genuine) speech from replayed speech. In addition, to exploit the complementarity between classifiers and make the detection decision more reliable, we propose a novel method that combines the MP aware DNN with a standard replay attack detector, namely a Gaussian mixture model classifier based on constant Q transform cepstral coefficients (CQCC-based GMM). Experiments are conducted on ASVspoof 2017 and evaluated with the equal error rate (EER), a standard measure of detection performance. The results show that MP aware DNN-based detection outperformed the conventional DNN method that uses only the magnitude or the phase features. Moreover, score combination of the CQCC-based GMM with the MP aware DNN achieved additional improvement, indicating that the MP aware DNN is very useful, especially when combined with the CQCC-based GMM for replay attack detection.
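To make the evaluation described in the abstract concrete, the following is a minimal sketch of the two generic operations it mentions: combining the scores of two classifiers and measuring the equal error rate. It is not the authors' implementation. The abstract does not state the fusion rule, so a simple weighted sum is assumed here, and the function names `equal_error_rate` and `fuse_scores`, the weight `alpha`, and the toy scores are all hypothetical. Scores are assumed to be oriented so that a higher value means "more likely genuine".

```python
from typing import List, Sequence


def equal_error_rate(genuine: Sequence[float], replayed: Sequence[float]) -> float:
    """Approximate the EER by sweeping a threshold over all observed scores.

    Assumption: a higher score means the trial is more likely genuine.
    """
    thresholds = sorted(set(genuine) | set(replayed))
    thresholds.append(thresholds[-1] + 1.0)  # extra threshold that rejects every trial
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False acceptance: a replayed trial scores at or above the threshold.
        far = sum(s >= t for s in replayed) / len(replayed)
        # False rejection: a genuine trial scores below the threshold.
        frr = sum(s < t for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2.0
    return eer


def fuse_scores(gmm_scores: Sequence[float],
                dnn_scores: Sequence[float],
                alpha: float = 0.5) -> List[float]:
    """Weighted-sum score fusion (an assumed rule, not taken from the paper).

    `alpha` weights the first system; `1 - alpha` weights the second.
    """
    if len(gmm_scores) != len(dnn_scores):
        raise ValueError("both systems must score the same trials")
    return [alpha * g + (1.0 - alpha) * d for g, d in zip(gmm_scores, dnn_scores)]


if __name__ == "__main__":
    # Toy scores for four genuine and four replayed trials (illustrative only).
    genuine = [0.9, 0.8, 0.3, 0.7]
    replayed = [0.1, 0.2, 0.4, 0.6]
    print("EER:", equal_error_rate(genuine, replayed))
    print("fused:", fuse_scores([1.0, -1.0], [0.0, 2.0], alpha=0.5))
```

In practice the two systems' scores usually live on different scales, so they would typically be normalized before fusion, and the weight would be tuned on a development set rather than fixed at 0.5.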

How to Cite

Phapatanaburi, K., Buayai, P., Naktong, W., & Srinonchat, J. (2020). Exploiting Magnitude and Phase Aware Deep Neural Network for Replay Attack Detection. ECTI Transactions on Electrical Engineering, Electronics, and Communications, 18(2), 89–97. https://doi.org/10.37936/ecti-eec.2020182.240341

Issue

Vol. 18 No. 2 (2020): Regular Issue (August 2020)

Section

Publish Article

This journal provides immediate open access to its content on the principle that making research freely available to the public supports a greater global exchange of knowledge.

- Creative Commons Copyright License

The journal allows readers to download and share all published articles as long as they cite them properly; however, readers may not modify the articles or use them commercially. This corresponds to the Creative Commons CC BY-NC-ND license.

- Retention of Copyright and Publishing Rights

The journal allows the authors of the published articles to hold copyrights and publishing rights without restrictions.

Author Biographies

Prawit Buayai, University of Yamanashi

Department of Computer Science and Engineering, University of Yamanashi, Kofu, Japan

Watcharaphon Naktong, Rajamangala University of Technology Isan Nakhonrachasrima

Department of Telecommunication Engineering, Faculty of Engineering and Architecture, Rajamangala University of Technology Isan Nakhonrachasrima, Thailand

Jakkree Srinonchat, Rajamangala University of Technology Thanyaburi

Department of Electronics and Telecommunication Engineering, Faculty of Engineering, Rajamangala University of Technology Thanyaburi, Thailand
