The discussion below is merely provided for general background information and is not intended to be used as an aid in determining the scope of the claimed subject matter.
In speech processing, including but not limited to speech recognition and speech synthesis, a recurring problem is measuring the similarity of two given speech units, e.g., phones or words. Although acoustic models for speech units have taken many forms, one particularly useful form is the Hidden Markov Model (HMM) acoustic model, which describes each speech unit statistically as an evolving stochastic process. Commonly, Gaussian Mixture Models, which are flexible enough to fit a wide variety of spectra as continuous probability distributions, are adopted as the de facto standard for the acoustic models.
Kullback-Leibler Divergence (KLD) is a meaningful statistical measure of the dissimilarity between probability distributions. However, problems arise when performing a KLD calculation to measure the acoustic similarity of two speech units. One significant problem is caused by the high model complexity. In particular, the KLD between two Gaussian mixtures cannot be computed in closed form, and therefore an effective approximation is needed. In statistics, KLD arises as the expected logarithm of a likelihood ratio, so it can be approximated by a sampling-based Monte Carlo algorithm, in which the log-likelihood ratio is averaged over a large number of random samples. Besides this basic sampling method, Gibbs sampling and other Markov Chain Monte Carlo (MCMC) methods can be used, but they are still too time-consuming for many practical applications.
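By way of illustration only, the basic sampling approach can be sketched as follows. This is a minimal sketch, assuming one-dimensional Gaussian mixtures and NumPy; the function names (gmm_logpdf, gmm_sample, mc_kld) and the example parameters are hypothetical and are not part of the discussion above. Samples are drawn from the first mixture and the log-likelihood ratio is averaged over them.

```python
import numpy as np

def gmm_logpdf(x, weights, means, stds):
    """Log-density of a 1-D Gaussian mixture at each point in x."""
    x = np.asarray(x, dtype=float)[:, None]
    # Per-component log(weight * Gaussian density), shape (n_points, n_components).
    log_comp = (np.log(weights)
                - 0.5 * np.log(2.0 * np.pi)
                - np.log(stds)
                - 0.5 * ((x - means) / stds) ** 2)
    # Log-sum-exp over components for numerical stability.
    m = log_comp.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(log_comp - m).sum(axis=1, keepdims=True)))[:, 0]

def gmm_sample(n, weights, means, stds, rng):
    """Draw n samples: pick a component by weight, then sample that Gaussian."""
    comp = rng.choice(len(weights), size=n, p=weights)
    return rng.normal(means[comp], stds[comp])

def mc_kld(p, q, n_samples=100_000, seed=0):
    """Monte Carlo estimate of KLD(p || q); p and q are (weights, means, stds)."""
    rng = np.random.default_rng(seed)
    x = gmm_sample(n_samples, *p, rng)
    # Average of the log-likelihood ratio over samples drawn from p.
    return float(np.mean(gmm_logpdf(x, *p) - gmm_logpdf(x, *q)))

# Hypothetical example mixtures (assumed values, for illustration only).
p = (np.array([0.3, 0.7]), np.array([-1.0, 2.0]), np.array([0.5, 1.0]))
q = (np.array([0.5, 0.5]), np.array([0.0, 2.5]), np.array([1.0, 1.0]))
print(mc_kld(p, q))
```

The estimate converges only at the usual Monte Carlo rate, so a large number of samples is needed for a stable value, which is consistent with the cost concern noted above.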
KLD rate has also been used as a measure between two HMMs. However, the physical meanings of KLD and KLD rate are different. KLD rate measures the similarity between the steady states of the two HMM processes, while KLD compares the two entire processes. In speech processing, the dynamic evolution can be more important than the steady states, so it is necessary to measure KLD directly. Nevertheless, a closed-form solution is not available when using Gaussian Mixture Models (GMMs).
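The distinction between KLD rate and KLD can be made concrete with a simplified sketch. As an assumption made purely for illustration, the two processes here are fully observable Markov chains with strictly positive transition matrices rather than HMMs with Gaussian mixture outputs, since the Markov chain case admits simple expressions. The function names and example matrices are hypothetical. The rate depends only on the stationary distribution and the transitions, whereas the KLD over whole state sequences also reflects the initial distribution and the transient behavior.

```python
import numpy as np

def stationary_distribution(A):
    """Stationary distribution pi with pi = pi @ A (assumes an ergodic chain)."""
    vals, vecs = np.linalg.eig(A.T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return pi / pi.sum()

def kld_rate(A, B):
    """Per-step divergence rate between Markov chains A and B (steady state only)."""
    pi = stationary_distribution(A)
    return float(np.sum(pi[:, None] * A * np.log(A / B)))

def kld_sequences(init_a, A, init_b, B, n_steps):
    """KLD between the two distributions over whole length-n_steps state sequences."""
    # Chain rule: divergence of the initial state plus each transition's
    # divergence weighted by the state marginal under the first chain.
    d = np.sum(init_a * np.log(init_a / init_b))
    marg = init_a.copy()
    per_state = np.sum(A * np.log(A / B), axis=1)
    for _ in range(n_steps - 1):
        d += marg @ per_state
        marg = marg @ A
    return float(d)

# Hypothetical two-state chains (assumed values, for illustration only).
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.7, 0.3], [0.4, 0.6]])
init_a = np.array([0.99, 0.01])
init_b = np.array([0.01, 0.99])

print(kld_rate(A, B))                           # ignores the initial distributions
print(kld_sequences(init_a, A, init_b, B, 5))   # short sequence, dominated by the start
```

For short sequences, such as individual speech units, the whole-sequence value can differ markedly from the rate multiplied by the sequence length, which is the sense in which the dynamic evolution matters.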