This application relates to storing model parameters for models that have a large number of parameters. In particular, the application relates to scaling and quantizing model parameters.
Pattern recognition systems are often based on Hidden Markov Models (HMMs), where each model includes three connected model states. During recognition, a frame of an input signal is converted into a multi-dimensional feature vector. This feature vector is then applied to each of the HMM states to determine a likelihood of each state producing the feature vector. In many systems, the likelihood for each state is determined using a mixture of Gaussians. Thus, the feature vector is applied to each Gaussian in the mixture to determine a likelihood, and the weighted sum of these likelihoods is the likelihood for the state. In terms of an equation, the likelihood for an HMM state is defined as:
$$b_j(o_t) = \sum_{m=1}^{M} c_{jm} \, \frac{1}{\sqrt{(2\pi)^n \, |\Sigma_{jm}|}} \; e^{-0.5 \, (o_t - \mu_{jm})' \, \Sigma_{jm}^{-1} \, (o_t - \mu_{jm})} \qquad \text{EQ. 1}$$

where $b_j(o_t)$ is the state emission likelihood for state $j$, $c_{jm}$ is the mixture weight for mixture component $m$, $n$ is the number of dimensions in the input feature vector $o_t$, $\Sigma_{jm}$ is the covariance matrix for the $m$th mixture component, and $\mu_{jm}$ is the mean for the $m$th Gaussian.
In many systems, it is assumed that the covariance matrix $\Sigma_{jm}$ is a diagonal matrix. This allows each summand to be calculated as a simple sum in the log domain. This results in:
$$b_j(o_t) = \operatorname{LogSum}_m \left( \ln(c_{jm}) - 0.5 \ln\!\left( (2\pi)^n \prod_{k=1}^{n} \sigma_{jmk} \right) - 0.5 \sum_{k=1}^{n} (o_{tk} - \mu_{jmk})^2 \, \sigma_{jmk}^{-2} \right) \qquad \text{EQ. 2}$$

where $o_{tk}$ is the $k$th component of the input feature vector, $\mu_{jmk}$ is the $k$th component of the mean feature vector, and $\sigma_{jmk}$ is the $k$th component along the diagonal of the covariance matrix.
To compute the score in Equation 2, a processor would typically pre-compute the first two terms for each mixture component and store these values in memory since they are not dependent on the input feature vector. Input feature vectors would then be received and the last term would be calculated based on those input feature vectors. Note that the summation in the last term requires for each mixture component a separate mean and variance for each dimension of the input feature vector. For each mixture component, with a forty-dimensional feature vector, this summation requires 80 values to be retrieved to recover the means and variances needed to perform the summation.
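The split between stored terms and the per-frame term can be illustrated with a minimal Python sketch. It is not taken from any particular recognizer: the function names and the toy model values are hypothetical, `LogSum` is assumed to mean the logarithm of a sum of exponentials, and `variances[i][k]` is assumed to hold the kth diagonal entry of the covariance matrix for component i, so that the determinant is the product of those entries and the last term divides each squared difference by that entry.

```python
import math

def precompute_constants(weights, variances):
    """Input-independent part of each component's log score:
    ln(c) - 0.5 * ln((2*pi)^n * product of the diagonal covariance entries)."""
    constants = []
    for c, var in zip(weights, variances):
        n = len(var)
        log_det = sum(math.log(v) for v in var)  # ln of the product, computed as a sum
        constants.append(math.log(c) - 0.5 * (n * math.log(2.0 * math.pi) + log_det))
    return constants

def log_sum(values):
    """ln(sum(exp(v))), shifted by the maximum for numerical stability (assumed LogSum)."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))

def state_log_likelihood(obs, constants, means, variances):
    """Combine the stored constants with the input-dependent last term."""
    scores = []
    for const, mu, var in zip(constants, means, variances):
        # Each dimension fetches one mean and one variance from the model.
        distance = sum((o - m) ** 2 / v for o, m, v in zip(obs, mu, var))
        scores.append(const - 0.5 * distance)
    return log_sum(scores)

# Toy model: two mixture components, two dimensions (all values are made up).
weights = [0.6, 0.4]
means = [[0.0, 1.0], [2.0, -1.0]]
variances = [[1.0, 0.5], [2.0, 1.5]]
obs = [0.5, 0.2]

constants = precompute_constants(weights, variances)            # done once
score = state_log_likelihood(obs, constants, means, variances)  # done per frame
```

In this sketch the inner loop of `state_log_likelihood` is the part that touches two stored values per dimension, which is the memory traffic the paragraph above counts.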
In the typical implementation, these values are floating point numbers, typically requiring 4 bytes (32 bits) of storage. Thus, for each mixture component in a state, 40×2×4=320 bytes must be retrieved from memory. Even in simple systems, there may well be over 5,000 mixture components across the different possible HMM states. Thus, to process all of the possible states, 5,000×320=1,600,000 bytes of data must be retrieved from memory for every 10-millisecond frame that is processed.
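The arithmetic above can be reproduced in a few lines of Python. The constant names are illustrative only, and the per-second figure is simply the per-frame figure multiplied by the 100 frames that fit in one second at a 10-millisecond frame rate.

```python
DIMENSIONS = 40              # components in the feature vector
VALUES_PER_DIMENSION = 2     # one mean and one variance
BYTES_PER_VALUE = 4          # 32-bit floating point
MIXTURE_COMPONENTS = 5_000   # across all HMM states
FRAME_MS = 10                # one frame every 10 milliseconds

bytes_per_component = DIMENSIONS * VALUES_PER_DIMENSION * BYTES_PER_VALUE
bytes_per_frame = MIXTURE_COMPONENTS * bytes_per_component
frames_per_second = 1000 // FRAME_MS
bytes_per_second = bytes_per_frame * frames_per_second

print(bytes_per_component)  # 320
print(bytes_per_frame)      # 1600000
print(bytes_per_second)     # 160000000
```

Under these figures, real-time evaluation of every state implies a sustained read rate of 160,000,000 bytes per second for the means and variances alone.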
With the increase in processor speed (but without corresponding improvements in the bandwidth and latency of main memory), the time needed to fetch this large amount of data from memory is the factor that limits the evaluation speed of the recognition system.
To overcome this, the prior art has devised systems for quantizing the model parameter values so that they occupy less space. Under one system of the prior art, the 32-bit values for each parameter are converted into an 8-bit index into a code book. While this reduces the total size of the model by allowing similar parameter values to be represented by a single index, it actually increases the number of times the memory must be accessed during the evaluation of the Gaussians because it requires first the retrieval of the 8-bit index and then the retrieval of the actual parameter value using the index.
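The two-step access pattern of a code book scheme can be sketched as follows. This is an illustration under simplifying assumptions rather than a description of any specific prior-art system: the helper names are hypothetical, and the code book here is spaced uniformly over the observed value range, whereas a real code book would more likely be trained on the parameter values.

```python
def build_codebook(values, size=256):
    """Hypothetical code book: `size` entries spaced evenly over the value range."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / (size - 1)
    return [lo + i * step for i in range(size)]

def encode(value, codebook):
    """Index of the nearest code book entry; fits in 8 bits when size <= 256."""
    return min(range(len(codebook)), key=lambda i: abs(codebook[i] - value))

def decode(index, codebook):
    """Second memory access: fetch the actual parameter value via the index."""
    return codebook[index]

params = [-1.5, 0.0, 0.25, 3.0]                            # made-up parameter values
codebook = build_codebook(params)
indices = bytes(encode(p, codebook) for p in params)       # one byte per parameter
recovered = [decode(i, codebook) for i in indices]         # index read, then table read
```

Each stored parameter shrinks to one byte, but recovering it takes a read of the index followed by a dependent read of the code book entry, which is the extra access the paragraph above describes.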
In an alternative quantization scheme, the 32-bit parameter values are quantized into 8- or 16-bit fixed-point numbers using a linear scaling technique. To form a quantized value, the 32-bit value is multiplied by a scaling factor that is equal to the ratio of the range of quantized values to the range of possible values for the parameter in the 32-bit domain. This scaled value is then added to an offset, which shifts the value so that the breadth of the parameter values across the different mixtures can be represented by values in the quantized domain.
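A minimal sketch of such linear scaling is given below. Several details are assumptions made for illustration: the quantized domain is taken to be unsigned integers from 0 to 2^bits - 1, the offset is chosen so that the smallest parameter value maps to 0, the result is rounded to the nearest integer, and the function names are hypothetical.

```python
def make_linear_quantizer(v_min, v_max, bits=8):
    """Scale = (range of quantized values) / (range of parameter values)."""
    levels = (1 << bits) - 1          # largest quantized value, e.g. 255 for 8 bits
    scale = levels / (v_max - v_min)
    offset = -v_min * scale           # assumed choice: v_min maps to 0
    return scale, offset, levels

def quantize(value, scale, offset, levels):
    """Multiply by the scale, add the offset, round, and clamp to the quantized range."""
    q = round(value * scale + offset)
    return max(0, min(levels, q))

def dequantize(q, scale, offset):
    """Approximate reconstruction of the original value from its quantized form."""
    return (q - offset) / scale

# Made-up range for one parameter across all mixture components.
scale, offset, levels = make_linear_quantizer(-4.0, 4.0, bits=8)
samples = [-4.0, -1.3, 0.0, 2.71, 4.0]
quantized = [quantize(v, scale, offset, levels) for v in samples]
restored = [dequantize(q, scale, offset) for q in quantized]
```

With rounding to the nearest level, each restored value in this sketch differs from the original by at most half a quantization step, that is, 0.5 / scale in the units of the original parameter.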
Such linear quantization reduces both the model size and the size of each individual parameter, thereby reducing the total time required to retrieve the parameters used in the evaluation from memory into the CPU. However, this quantization decreases the accuracy of the system. In particular, because a large number of floating-point values are assigned to a single quantized value, the quantized value is only an approximation for each 32-bit value, and the difference between the approximation and the actual 32-bit value constitutes a quantization error that can distort the evaluation process.
Thus, a system is needed that can reduce the time required to retrieve parameters used in the evaluation of a Hidden Markov Model while limiting the introduction of distortion into the evaluation of the model.