Vol 34 No 1 (2015)
Speech

Benchmark for Speaker Identification using Mel Frequency Cepstral Coefficients on Vowels Following the Nasal Continuants in Kannada

How to Cite
Suman Suresh, & N, H. (2015). Benchmark for Speaker Identification using Mel Frequency Cepstral Coefficients on Vowels Following the Nasal Continuants in Kannada. Journal of All India Institute of Speech and Hearing, 34(1), 63-75. Retrieved from http://203.129.241.91/jaiish/index.php/aiish/article/view/843

Abstract

The aim was to obtain a benchmark for speaker identification using Mel Frequency Cepstral Coefficients (MFCC) on vowels following the nasal continuants in the Kannada language. Participants were twenty Kannada-speaking male neuro-typical adults in the age range of 20-30 years. Thirty meaningful Kannada words with the long vowels /a:/, /i:/, /u:/ following the nasal continuants /m/ and /n/ formed the material. The Speech Science Lab Workbench, a semi-automatic vocabulary-dependent speaker recognition software, was used to extract MFCC for the long vowels truncated using PRAAT. Results indicated a higher percentage of correct identification for Condition I (live versus live recording). Among the three vowels following the nasal continuant /m/, identification was best for /i:/, followed by /a:/ and /u:/, whereas for /n/ it was best for /a:/, followed by /i:/ and /u:/. Averaged across the three vowels, the percentage of correct speaker identification was similar for vowels following the nasals /n/ (90%) and /m/ (90%). Condition II (mobile versus mobile) and Condition III (mobile versus live) were comparatively poorer than Condition I; thus the benchmark was obtained. The discussion concludes that during the transmission of voice signals through communication channels, the signals are reproduced with errors caused by distortions from the microphone and channel, and that acoustical and electromagnetic interference and noise affect the transmitted signal. The speech coding algorithms that are part of the Global System for Mobile communications (GSM) compress the speech signal before transmission, reducing the number of bits in the digital representation while maintaining acceptable quality.
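The MFCC-based comparison described above can be illustrated with a small sketch. This is not the procedure implemented in the Speech Science Lab Workbench used in the study; it is a minimal illustration, assuming the librosa and numpy Python libraries and hypothetical file paths, of how mean MFCC vectors from truncated vowel segments can be matched to enrolled speakers by nearest-distance comparison (for example, live-recording templates tested against a mobile recording, as in Condition III).

```python
# Minimal illustrative sketch (not the Speech Science Lab Workbench procedure
# used in the study): mean-MFCC templates per speaker, matched by nearest
# Euclidean distance. File paths and the librosa/numpy dependencies are
# assumptions for illustration only.
import numpy as np
import librosa

def mfcc_template(wav_path, n_mfcc=13):
    """Mean MFCC vector for one truncated vowel segment (e.g. /a:/ after /m/)."""
    y, sr = librosa.load(wav_path, sr=None)               # keep original sampling rate
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)                               # average across analysis frames

def identify(test_wav, enrolled):
    """Return the enrolled speaker whose template is closest to the test segment."""
    test = mfcc_template(test_wav)
    return min(enrolled, key=lambda spk: np.linalg.norm(test - enrolled[spk]))

# Hypothetical usage: enroll live recordings of 20 speakers, then test a mobile
# recording of the same vowel (comparable to Condition III, mobile versus live).
enrolled = {f"speaker_{i:02d}": mfcc_template(f"live/speaker_{i:02d}_ma.wav")
            for i in range(1, 21)}
print(identify("mobile/speaker_07_ma.wav", enrolled))
```

Scoring such matches over all test segments and comparing the percent-correct identification across the three recording conditions would yield the kind of benchmark reported above.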

References

Amino, K., Sugarwa, T., & Arai, T. (2006). Idiosyncrasy of nasal sounds in human speaker identification and their acoustic properties. Acoustic Science and Technology, 27, 233-235.
Amino, K., Sugarwa, T., & Arai, T. (2006). Effects of the syllable structure on perceptual speaker identification. The Journal of the Institute of Electronics, Information and Communication Engineers (IEICE), 105, 109-114.
Amino, K., & Arai, T. (2007). Effect of stimulus contents and speaker familiarity on perceptual speaker identification. Journal of Acoustical Society of Japan, 28(2), 128-130.
Amino, K., & Arai, T. (2009). Speaker dependent characteristics of the nasals. Forensic Science International, 158(1), 21-28.
Amino, K., & Osanai, T. (2012). Speaker characteristics that appear in vowel nasalization and their change over time. Acoustical Science and Technology, 33(2), 96-105.
Atal, B. S. (1972). Automatic speaker recognition based on pitch contours. The Journal of the Acoustical Society of America, 52, 1687-1697.
Atkinson, J. E. (1976). Inter and intra speaker variability in fundamental voice frequency. Journal of the Acoustical Society of America, 60(2), 440-445.
Beigi, H. (2011). Fundamentals of Speaker Recognition. New York: Springer. ISBN: 978-0-387-77591-3.
Bhattacharjee, U. (2013). A comparative study of Linear Prediction Cepstral Coefficient (LPCC) and Mel-Frequency Cepstral Coefficient (MFCC) features for the recognition of Assamese Phonemes. International Journal of Engineering Research and Technology, 2(1), 2278-0181.
Boersma, P., & Weenink, D. (2009). PRAAT 5.1.14 software, retrieved from http://www.goofull.com/au/program/142n35/speedytunes.html.
Bricker, P. S., & Pruzansky, S. (1976). Speaker recognition: Experimental Phonetics. London: Academic Press.
Chandrika. (2010). The influence of hand sets and cellular networks on the performance of a speaker verification system. Project of Post Graduate Diploma in Forensic Speech Science Technology, University of Mysore.
Dang, J., & Honda, K. (1996). Acoustical modeling of the vocal tract based on morphological reality: Incorporation of the paranasal sinuses and the piriform fossa. Proceedings of 4th Speech Production Seminar, 49-52, Grenoble.
Deepa, A., & Savithri, S. R. (2010). Re-standardization of Kannada articulation test. Student research at All India Institute of Speech and Hearing (Articles based on dissertation done at AIISH), 8, 53-55.
Fant, G. (1960). Acoustic Theory of Speech Production. Netherlands: Mouton and Co., 's-Gravenhage. ISBN: 9027916004.
Furui, S. (1994). An overview of speaker recognition technology. Proceedings of ESCA (European Speech Communication Association) Workshop on Automatic Speaker Recognition, Identification and Verification, 1-8.
Fujimura, O., & Lindqvist, J. (1971). Sweep-tone measurements of the vocal tract characteristics. Journal of the Acoustical Society of America, 49(2), 541-548.
Glass, J. B. (1984). Nasal Consonants and Nasalized Vowels: An Acoustic Study and Recognition Experiment. Submitted in Partial Fulfillment of the Requirements for the Degrees of Master of Science and Electrical Engineering (Massachusetts Institute of Technology).
Glass, J. R., & Zue, V. W. (1985). Detection of nasalized vowels in American English. In: Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 1569-1572.
Glenn, J. W., & Kleiner, N. (1967). Speaker Identification Based on Nasal Phonation. The Journal of the Acoustical Society of America, 43(2), 368-372.
Hasan, R., Jamil, M., Rabbani, G., & Rahman, S. (2004). Speaker Identification using Mel Frequency Cepstral Coefficient. 3rd International Conference on Electrical and Computer Engineering.
Hattori, S., Yamamoto, K., & Fujimura, O. (1958). Nasalization of vowels in relation to nasals. Journal of the Acoustical Society of America, 30, 267-274.
Hecker, M. H. L. (1971). Speaker recognition: Basic considerations and methodology. The Journal of the Acoustical Society of America, 49, 138.
Hollien, H. (1990). The Acoustics of Crime: The New Science of Forensic Phonetics. New York: Plenum Press.
Hollien, H. (2002). Forensic Voice Identification. San Diego, CA: Academic Press.
House, A. S., & Stevens, K. N. (1956). Analog studies of the nasalization of vowels. Journal of Speech and Hearing Disorders, 22(2), 218-232.
Jakhar, S. S. (2009). Benchmark for speaker identification using Cepstrum. Unpublished project of Post Graduate Diploma in Forensic Speech Science and Technology, submitted to University of Mysore, Mysore.
Kinnunen, T. (2009). Spectral features for automatic text independent speaker recognition. Unpublished Thesis, University of Joensuu, Department of Computer Sciences, Finland.
Kuwabara, H., & Sagisaka, Y. (1995). Acoustic characteristics of speaker individuality: control and conversion. Journal of Speech Communication, 16, 165-173.
Lakshmi, P., & Savithri, S. R. (2009). Benchmark for speaker identification using Vector F1 & F2. Proceedings of the International Symposium, Frontiers of Research on Speech and Music, 38-41.
Laver, J. (1994). Principles of Phonetics. Cambridge: Cambridge University Press.
Markel, J., & Davis, S. (1979). Text independent speaker recognition from a large linguistically unconstrained time-spaced data base. IEEE (Institute of Electrical and Electronics Engineers) Transactions on Acoustics, Speech, and Signal Processing, 27(1), 74-82.
McGehee, F. (1937). Reliability of Identification of Human Voices. The Journal of General Psychology, 17, 249-271.
Medha, S. (2010). Benchmark for Speaker Identification using Cepstrum measurement using Text-independent data. Unpublished project of Post Graduate Diploma in Forensic Speech Science and Technology, submitted to University of Mysore, Mysore.
Naik, J. (1994). Speaker Verification over the telephone network: database, algorithms and performance assessment. Proceedings of ESCA (European Speech Communication Association) Workshop on Automatic Speaker Recognition, Identification and Verification, 31-38.
Nolan, F. (1983). Phonetic bases of speaker recognition. Cambridge: Cambridge University Press.
Nolan, F. (1997). Speaker recognition and forensic phonetics. In Hardcastle and Laver (Eds.), 744-767.
Pruthi, T., & Espy-Wilson, C. (2006). An MRI based study of the acoustic effects of sinus cavities and its application to speaker recognition. Proceedings of Interspeech, Pittsburgh, 2110-2113.
Ramya, B. M. (2011). Benchmark for speaker identification under electronic vocal disguise using Mel Frequency Cepstral Coefficients. Unpublished project of Post Graduate Diploma in Forensic Speech Science and Technology, submitted to University of Mysore, Mysore.
Rana, M., & Miglani, S. (2014). Performance Analysis of Mel Frequency Cepstral Coefficient and Linear Prediction Cepstral Coefficient Techniques in Automatic Speech Recognition. International Journal of Engineering and Computer Science, 3(8), 7727-7732.
Reynolds, D. A., & Rose, R. (1995). Robust text-independent speaker identification using Gaussian Mixture speaker models. IEEE (Institute of Electrical and Electronics Engineers) Transactions on Speech and Audio Processing, 3, 72-83.
Reynolds, D. A. (2002). An Overview of Automatic Speaker Recognition Technology. Proceedings in IEEE (Institute of Electrical and Electronics Engineers), 4072-4075.
Ridha, Z. A. (2014). Benchmark for speaker identification using Nasal Continuants in Hindi in Direct Mobile and Network Recording. Unpublished Dissertation of AIISH (All India Institute of Speech and Hearing), submitted to the University of Mysore.
Rose, P. (2002). Forensic Speaker Identification. London: Taylor and Francis.
Soong, F., Rosenberg, A., Rabiner, L., & Juang, B. H. (1985). A vector quantization approach to speaker recognition. Proceedings in the International Conference on Acoustic Signal Processing, 387-390.
Sreedevi, N. (2012). Frequency of occurrence of Phonemes in Kannada. Project funded by AIISH (All India Institute of Speech and Hearing) Research Fund (ARF).
Sreevidya. (2010). Speaker Identification using Cepstrum in Kannada Language. Unpublished project of Post Graduate Diploma in Forensic Speech Science and Technology, submitted to University of Mysore, Mysore.
Su, L. S., Li, K. P., & Fu, K. S. (1974). Identification of speakers by use of nasal co-articulation. The Journal of the Acoustical Society of America, 56(6), 1876-1882.
Stevens, K. N. (1956). Speaker authentication and identification: A comparison of spectrographic and auditory presentations of speech material. The Journal of the Acoustical Society of America, 44, 1596-1607.
Stevens, K. N. (1971). Sources of inter and intra speaker variability in the acoustic properties of speech sounds. Proceedings of the 7th International Congress of Phonetic Sciences, Montreal, 206-227.
Thompson, C. (1987). Voice Identification: Speaker Identifiability and correction of records regarding sex effects. Human Learning, 4, 19-27.
Tiwari, V. (2010). Mel-Frequency Cepstral Coefficient and its applications in speaker recognition. Dept. of Electronics Engineering, Gyan Ganga Institute of Technology and Management, Bhopal, (MP) India.
Vasan, M., Mathur, S., & Dahiya, M. S. (2015). Effect of different recording devices on forensic speaker recognition system. Paper presented in 23rd All India Forensic Science Conference, Bhopal.
Wolf, J. J. (1972). Efficient acoustic parameter for speaker recognition. The Journal of the Acoustical Society of America, 2044-2056.