Modern speech recognition systems are based on principles of statistical pattern recognition and typically employ an acoustic model and a language model to decode an input sequence of observations (also referred to as acoustic events or acoustic signals) representing input speech (e.g., a sentence or string of words) in order to determine the most probable sentence or word sequence given that input sequence of observations. In other words, the function of a modern speech recognizer is to search through a vast space of candidate sentences and to choose the sentence or word sequence that has the highest probability of generating the input sequence of observations or acoustic events.

In general, most modern speech recognition systems employ acoustic models that are based on continuous density hidden Markov models (CDHMMs). In particular, CDHMMs have been widely used in speaker-independent large vocabulary continuous speech recognition (LVCSR) because they outperform discrete HMMs and semi-continuous HMMs. In CDHMMs, the probability function of observations, or state observation distribution, is modeled by a mixture of multivariate Gaussians (also referred to herein as Gaussian mixtures), which can approximate the speech feature distribution more accurately. However, the time-consuming output probability computation and large memory requirement of CDHMMs make it difficult to implement a real-time LVCSR system. One way to reduce the memory requirement and computation cost is to build a smaller system by reducing both the number of mixtures per HMM state and the number of HMM states of the system. However, this method usually introduces an unacceptable increase in word error rate (WER) if the parameter size is reduced significantly. Other methods speed up state likelihood computation, but at the expense of recognition accuracy and increased memory requirements. Thus, there exists a need to reduce memory requirement and computation cost without degrading system performance and recognition accuracy.
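
To make the cost of the output probability computation concrete, the following is a minimal sketch of the state observation log-likelihood for a single HMM state modeled by a Gaussian mixture. It assumes diagonal covariance matrices, which the text above does not specify, and the function names (`log_gaussian_diag`, `state_log_likelihood`, `parameter_count`) are hypothetical and chosen only for illustration. It is not presented as the implementation of any particular recognizer.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of one diagonal-covariance Gaussian at feature vector x.

    Assumption: the covariance is diagonal, so `var` holds one variance
    per feature dimension.
    """
    acc = 0.0
    for xi, mi, vi in zip(x, mean, var):
        acc += math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi
    return -0.5 * acc


def state_log_likelihood(x, weights, means, variances):
    """Log of sum_k w_k * N(x; mean_k, var_k) for one HMM state.

    The sum is evaluated with the log-sum-exp trick to avoid underflow
    when the individual component densities are very small.
    """
    terms = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def parameter_count(num_states, mixtures_per_state, dim):
    """Number of stored values: per mixture component, one weight plus
    a mean vector and a variance vector of length `dim`."""
    return num_states * mixtures_per_state * (1 + 2 * dim)


if __name__ == "__main__":
    # Hypothetical two-component mixture over 2-dimensional features.
    frame = [0.3, -0.1]
    weights = [0.6, 0.4]
    means = [[0.0, 0.0], [1.0, -1.0]]
    variances = [[1.0, 1.0], [0.5, 2.0]]
    print(state_log_likelihood(frame, weights, means, variances))

    # Hypothetical system size, for illustration only.
    print(parameter_count(num_states=6000, mixtures_per_state=16, dim=39))
```

Each input frame requires one such evaluation per active state, and each evaluation touches every mixture component and every feature dimension. Under the stated assumptions, both the per-frame work and the stored parameter count grow with the product of the number of states, the mixtures per state, and the feature dimension, which is why shrinking either of the first two reduces memory and computation.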