MFCC Analysis of Biometric Person Identification System in Multilingual Speech Environment

Tushar Kant Sahu, Shri Shankracharya College of Engineering and Technology, Bhilai; Vinay Jain, Shri Shankracharya College of Engineering and Technology, Bhilai

Mel Frequency Cepstrum Coefficient (MFCC), Support Vector Machine (SVM)

In this paper we describe a multilingual speaker identification system. Speaker identification is conducted on three Indian languages (Hindi, Marathi, and Rajasthani). We create a database of 25 speakers per language; each speaker records 3 different sentences, and each sentence is recorded in all 3 languages. We focus on the effect of language mismatch on speaker identification performance, both for the individual languages and for all languages taken together. Mel Frequency Cepstrum Coefficients (MFCC) are used for feature extraction, and a standard SVM-based classifier is used for identification.
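To make the described pipeline concrete, the following is a minimal sketch of MFCC feature extraction followed by SVM classification, assuming the librosa and scikit-learn Python libraries. The file paths, identity labels, sampling rate, and SVM parameters are hypothetical placeholders standing in for the recorded database of 25 speakers per language; the paper's actual implementation details may differ.

# A minimal sketch of the MFCC + SVM speaker identification pipeline,
# assuming librosa for feature extraction and scikit-learn for the SVM.
# All file paths, labels, and parameter values below are illustrative.
import numpy as np
import librosa
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def mfcc_features(wav_path, n_mfcc=13):
    # Summarize a variable-length utterance as one fixed-length vector:
    # the per-coefficient mean and standard deviation of its MFCC frames.
    signal, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Hypothetical training set: in the paper this would be the database of
# 25 speakers x 3 sentences x 3 languages (Hindi, Marathi, Rajasthani).
train_files = ["data/spk01_hindi_s1.wav", "data/spk02_hindi_s1.wav"]
train_speakers = [1, 2]  # one integer identity label per utterance

X_train = np.array([mfcc_features(f) for f in train_files])
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0))
clf.fit(X_train, train_speakers)

# Identify the speaker of an unseen utterance; testing with a sentence
# in a different language probes the language-mismatch condition.
predicted = clf.predict([mfcc_features("data/test_utterance.wav")])[0]
print("identified speaker:", predicted)

Pooling frame-level MFCCs into per-utterance statistics is one simple way to obtain the fixed-length input an SVM requires; a production system might instead use GMM supervectors or i-vectors over the same MFCC stream.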
Paper ID: GRDJEV01I120068
Published in: Volume 1, Issue 12
Publication Date: 2016-12-01
Page(s): 114-117