Separation of Multiple Speech Signals by using Triangular Microphone Array

Nozomu Hamada

doi:10.37936/ecti-eec.200861.171743

Published: Jan 30, 2008

Keywords:

Separation of speech signals; ICA; Time-frequency masking; Hands-free communication; Human-machine interfaces

Nozomu Hamada

Department of Systems Engineering, Faculty of Science and Engineering, Keio University

Abstract

Speech source separation is an important step toward speech-based human-machine interfaces and high-quality hands-free communication with machines. Independent Component Analysis (ICA) and time-frequency masking are two powerful tools for Blind Source Separation (BSS) of speech mixtures. The latter relies on the assumption called "W-Disjoint Orthogonality," which implies that speech components are sparse across the cells of the time-frequency domain. This article introduces a time-frequency masking scheme applied to an equilateral triangular microphone array, in which a delay estimate is obtained from each of the three microphone pairs. In addition, the array is used to improve the histogram-mapping algorithm by integrating the three delay estimates through a coordinate transformation. Experiments on separating multiple sources in a real environment are performed to verify the effectiveness of the method.
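The idea of combining three pair delays and then masking time-frequency cells can be illustrated with a minimal sketch. It is not the article's algorithm: it assumes far-field sources, a hypothetical side length `SIDE` and sound speed `SPEED_OF_SOUND`, pair axes at 0, 120 and 240 degrees, and a simple projection as the coordinate transformation. The function names are illustrative, and the source directions passed to `binary_masks` are assumed to be known already (in practice they might come from the peaks of a histogram of per-cell estimates).

```python
import numpy as np

SPEED_OF_SOUND = 343.0   # m/s (assumed value)
SIDE = 0.04              # side length of the triangle in metres (assumed value)
# Axis directions of the microphone pairs 1->2, 2->3, 3->1 (assumed layout).
PAIR_ANGLES = np.deg2rad([0.0, 120.0, 240.0])


def pair_delays(azimuth):
    """Far-field delays (s) of the three pairs for a source at `azimuth` (rad)."""
    azimuth = np.asarray(azimuth, dtype=float)
    return (SIDE / SPEED_OF_SOUND) * np.cos(azimuth[..., None] - PAIR_ANGLES)


def delays_to_azimuth(delays):
    """Project three pair delays onto x/y axes and return one azimuth (rad)."""
    delays = np.asarray(delays, dtype=float)
    # The 2/3 factor makes (x, y) equal to (SIDE / c) * (cos, sin) of the azimuth.
    x = (2.0 / 3.0) * np.sum(delays * np.cos(PAIR_ANGLES), axis=-1)
    y = (2.0 / 3.0) * np.sum(delays * np.sin(PAIR_ANGLES), axis=-1)
    return np.arctan2(y, x)


def binary_masks(cell_azimuths, source_azimuths):
    """Assign each time-frequency cell to the nearest source direction."""
    cells = np.asarray(cell_azimuths, dtype=float)
    sources = np.asarray(source_azimuths, dtype=float)
    # Wrapped angular difference between every cell and every source.
    diff = np.angle(np.exp(1j * (cells[..., None] - sources)))
    nearest = np.argmin(np.abs(diff), axis=-1)
    # One boolean mask per source, each with the shape of `cells`.
    return np.stack([nearest == k for k in range(sources.size)])


if __name__ == "__main__":
    theta = np.deg2rad(50.0)
    print(np.rad2deg(delays_to_azimuth(pair_delays(theta))))
```

Under the far-field model the three delays sum to zero, so they carry two degrees of freedom, which is why a two-dimensional mapping can hold the full directional information. Each mask would then be multiplied with the spectrogram of one microphone signal to keep only the cells attributed to a single source, which is where the W-Disjoint Orthogonality assumption is used.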

How to Cite

Hamada, N. (2008). Separation of Multiple Speech Signals by using Triangular Microphone Array. ECTI Transactions on Electrical Engineering, Electronics, and Communications, 6(1), 15–21. https://doi.org/10.37936/ecti-eec.200861.171743

Issue

Vol. 6 No. 1 (2008): Regular Issue

Section

Research Article

This journal provides immediate open access to its content on the principle that making research freely available to the public supports a greater global exchange of knowledge.

- Creative Commons Copyright License

The journal allows readers to download and share all published articles as long as they properly cite them; however, readers cannot change the articles or use them commercially. This corresponds to the Creative Commons CC BY-NC-ND license.

- Retention of Copyright and Publishing Rights

The journal allows the authors of the published articles to hold copyrights and publishing rights without restrictions.
