The present invention relates to speech recognition and more particularly to improved methods for storing and accessing pre-calculated distance metrics useful in comparing an input utterance with a vocabulary word model.
As speech recognition systems have developed to handle increasingly large vocabularies, one technique which has evolved is the increasing use of pre-calculated similarity or distance measurements in the incremental comparison of an input utterance with a model, e.g. a hidden Markov model, representing a vocabulary word. For example, the input utterance is first converted to a sequence of input data frames, e.g. representing spectral energy distributions, and each raw input frame is then replaced by the closest matching one of a set of standard or prototype data frames, a process frequently referred to as vector quantization (VQ). Similarly, the word models are represented by respective sequences of standard or prototype states, e.g. probability distribution functions (pdfs) in the case of hidden Markov models.
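By way of illustration only, the vector quantization step described above may be sketched as follows. The sketch assumes that frames are plain lists of numbers and that closeness is measured by squared Euclidean distance; neither assumption is stated in the text, and the names `squared_distance` and `quantize` are hypothetical.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two frames (an assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(frames: Sequence[Sequence[float]],
             prototypes: Sequence[Sequence[float]]) -> List[int]:
    """Replace each raw input frame by the index of its closest prototype frame."""
    indices = []
    for frame in frames:
        # Pick the prototype index whose frame is nearest to this raw frame.
        best = min(range(len(prototypes)),
                   key=lambda k: squared_distance(frame, prototypes[k]))
        indices.append(best)
    return indices


# Toy data: three prototype frames and three raw input frames.
prototypes = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
raw_frames = [[0.1, -0.2], [3.5, 4.2], [0.9, 1.3]]
frame_indices = quantize(raw_frames, prototypes)
```

After quantization the utterance is carried forward only as a sequence of prototype indices, which is what makes a pre-calculated table of metrics usable.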
This use of standard or prototype input data frames and standard or prototype word model states allows pre-calculation of a distance metric for each possible combination of prototype input data frame with prototype model state. However, as vocabularies have grown larger, it has become necessary to increase the number of both prototype input data frames and prototype model states available for selection in order to provide the resolution and precision of calculation necessary to discriminate between similar sounding words. For example, for a speech recognition system having a vocabulary on the order of 20,000 words, it is advantageous to provide on the order of 1,000 standard or prototype input data frames and 2,000 standard or prototype model states. Accordingly, a complete table or matrix of pre-calculated distance or similarity metrics would comprise on the order of 2,000,000 entries (1,000 × 2,000). Further, in order to fulfill its function, this table of pre-calculated distance metrics should be resident in the directly accessible or random access memory (RAM) of the processor which performs the comparison calculations. As will be understood by those skilled in the art, this represents a substantial demand on system resources.
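The complete table described above, and the reason for its size, may be sketched as follows. This is a minimal illustration under stated assumptions: each metric is taken to fit in one byte, the table is stored row by row with one row per model state, and the `distance` callable is a hypothetical stand-in for whatever metric is actually pre-calculated.

```python
from typing import Callable, Sequence


def build_table(num_frames: int, num_states: int,
                distance: Callable[[int, int], float]) -> bytearray:
    """Pre-calculate one metric per (prototype frame, prototype state) pair."""
    table = bytearray(num_frames * num_states)  # one byte per entry (assumed)
    for s in range(num_states):
        for f in range(num_frames):
            # Clamp to the 0..255 range that a single byte can hold.
            table[s * num_frames + f] = min(255, max(0, int(distance(f, s))))
    return table


def lookup(table: bytearray, num_frames: int,
           frame_idx: int, state_idx: int) -> int:
    """Fetch a pre-calculated metric instead of recomputing it."""
    return table[state_idx * num_frames + frame_idx]


def score_path(table: bytearray, num_frames: int,
               frame_indices: Sequence[int],
               state_indices: Sequence[int]) -> int:
    """Accumulate metrics along an already aligned frame/state sequence."""
    return sum(lookup(table, num_frames, f, s)
               for f, s in zip(frame_indices, state_indices))


# Toy table: 4 prototype frames, 3 prototype states, placeholder metric.
toy_table = build_table(4, 3, lambda f, s: abs(f - 2 * s))
toy_score = score_path(toy_table, 4, [0, 1, 3], [0, 1, 1])

# Size of the full table for the figures given in the text.
full_entries = 1_000 * 2_000
```

At one byte per entry, the full table for 1,000 prototype frames and 2,000 prototype states would occupy about 2 megabytes of RAM, and wider entries would scale that figure proportionally.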
Among the several objects of the present invention may be noted the provision of a system and method for reducing the memory space required to store a table of pre-calculated distance metrics; the provision of such a system which facilitates accurate comparisons of input utterances with vocabulary word models; and the provision of such a system which is highly accurate and which is relatively simple and inexpensive to implement. Other objects and features will be in part apparent and in part pointed out hereinafter.