Audio Feature and Correlation Function-Based Speech Recognition in FM Radio Broadcasting

Narathep Phruksahiran

Abstract

The analysis and classification of audio signals are becoming increasingly important, particularly as radio broadcasting remains a major channel for communicating and disseminating information. Systems and platforms that can monitor broadcasts for fake or fraudulent news are therefore essential. This study develops a speech feature-based correlation (SFC) algorithm and a speech recognition framework that combine specific speech features with performance correlation to monitor radio broadcasts in real time and to recognize specific speech based on human samples. The speech features are the Mel-frequency cepstral coefficients (MFCC), gammatone cepstral coefficients (GTCC), spectral entropy, and pitch. The results illustrate the advantages and disadvantages of each feature when applied to various groups of speech sounds. Moreover, combining each feature with the SFC design improves system performance and increases accuracy.
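The abstract does not specify how the SFC algorithm computes or compares its features, so the following is only a minimal sketch of the general idea, not the author's method. It assumes that one feature named above (spectral entropy) is computed frame by frame and that a reference sample and a broadcast segment are compared with a plain Pearson correlation of the resulting feature tracks. All function names, the frame length, and the hop size are hypothetical choices made for illustration.

```python
import cmath
import math


def power_spectrum(frame):
    """One-sided power spectrum of a real frame via a direct DFT.

    O(n^2), which is acceptable for the short frames used in this sketch.
    """
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_entropy(frame):
    """Shannon entropy of the normalized power spectrum, scaled to [0, 1]."""
    power = power_spectrum(frame)
    total = sum(power)
    if total == 0.0:
        return 0.0  # silent frame: no spectral distribution to measure
    probs = [p / total for p in power]
    entropy = -sum(p * math.log2(p) for p in probs if p > 0.0)
    return entropy / math.log2(len(power))


def entropy_track(signal, frame_len=64, hop=32):
    """Spectral entropy per frame (frame_len and hop are assumed values)."""
    starts = range(0, len(signal) - frame_len + 1, hop)
    return [spectral_entropy(signal[s:s + frame_len]) for s in starts]


def normalized_correlation(a, b):
    """Pearson correlation between two equal-length feature tracks."""
    if len(a) != len(b):
        raise ValueError("feature tracks must have the same length")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a)
                    * sum((y - mean_b) ** 2 for y in b))
    return num / den if den > 0.0 else 0.0
```

In use, one would compute `entropy_track` for a stored speech sample and for an equally long window of the received audio, then pass both tracks to `normalized_correlation`; a score near 1 suggests a similar entropy pattern. The same comparison could be applied to MFCC, GTCC, or pitch tracks, though the paper's actual combination rule is not given here.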

How to Cite
Phruksahiran, N. (2022). Audio Feature and Correlation Function-Based Speech Recognition in FM Radio Broadcasting. ECTI Transactions on Electrical Engineering, Electronics, and Communications, 20(3), 403–413. https://doi.org/10.37936/ecti-eec.2022203.247516
