SantoshKumar, SA and Ramasubramanian, V (2005) Automatic Language Identification using Ergodic-HMM. In: IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05, 18-23 March, Philadelphia, PA, USA, Vol.1, 609-612.
Recently, we established the equivalence of an ergodic HMM (EHMM) to a parallel sub-word recognition (PSWR) framework for language identification (LID). The states of the EHMM correspond to acoustic units of a language, and its state transitions represent the bigram language model of unit sequences. We consider two alternatives for representing the state-observation densities of the EHMM, namely, the Gaussian mixture model (GMM) and the hidden Markov model (HMM). We present a segmental K-means algorithm for training both types of EHMM, the EHMM of GMMs, EHMM(G), and the EHMM of HMMs, EHMM(H), and compare their performance on a 6-language LID task on the OGI-TS database. EHMM(G) performs comparably to PSWR and better than EHMM(H); we provide reasons for the performance difference between EHMM(G) and EHMM(H), and identify ways of enhancing the performance of EHMM(H), which is a novel and powerful architecture, ideal for spoken language modeling.
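As a rough illustration of the architecture described in the abstract, the following Python sketch scores an utterance against one ergodic HMM per language and picks the best-scoring language. It is not the authors' implementation: the function names (`viterbi_log_score`, `identify_language`) and the model layout are hypothetical, each state uses a single diagonal-covariance Gaussian as a simplified stand-in for a GMM, and training (the segmental K-means step) is not shown.

```python
import numpy as np

def log_gauss_diag(x, mean, var):
    """Log density of each frame in x (T, D) under a diagonal Gaussian."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var, axis=1)

def viterbi_log_score(x, log_pi, log_A, means, variances):
    """Log-likelihood of the best state path through an ergodic HMM.

    log_pi: (N,) initial log-probabilities of the N acoustic-unit states.
    log_A:  (N, N) transition log-probabilities, row = from-state; every
            entry may be finite (ergodic), playing the role of a unit bigram.
    means, variances: (N, D) per-state Gaussian parameters (assumption:
            one Gaussian per state instead of a full mixture).
    """
    n_states = log_pi.shape[0]
    # Per-frame, per-state observation log-likelihoods, shape (T, N).
    log_b = np.stack(
        [log_gauss_diag(x, means[j], variances[j]) for j in range(n_states)],
        axis=1,
    )
    delta = log_pi + log_b[0]
    for t in range(1, x.shape[0]):
        # Best predecessor for each destination state, then emit frame t.
        delta = np.max(delta[:, None] + log_A, axis=0) + log_b[t]
    return float(np.max(delta))

def identify_language(x, models):
    """Return the language whose EHMM gives the highest Viterbi score."""
    scores = {lang: viterbi_log_score(x, **m) for lang, m in models.items()}
    return max(scores, key=scores.get), scores
```

Here `models` is assumed to map a language label to a dictionary with the keys `log_pi`, `log_A`, `means`, and `variances`; in practice the feature vectors in `x` would be frame-level acoustic features of the test utterance.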
|Item Type:||Conference Paper|
|Additional Information:||Copyright 2005 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.|
|Department/Centre:||Division of Electrical Sciences > Electrical Communication Engineering|
|Date Deposited:||30 Nov 2007|
|Last Modified:||19 Sep 2010 04:35|