Most systems for recognizing speech employ some means of reducing the data in raw speech. Thus the speech is reduced to representations that include less than all of the data that would be included in a straight digitization of the speech signal. However, such representations must contain most if not all of the data needed to identify the meaning intended by the speaker.
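By way of illustration only, one simple form of such data reduction can be sketched as follows. The sketch assumes a per-frame log-energy feature, a frame length of 160 samples, a hop of 80 samples, and an 8 kHz sampling rate. The function name and all of these values are hypothetical and are not taken from the system described here, which may use a different reduced-data representation.

```python
import math

def frame_log_energies(samples, frame_len=160, hop=80):
    """Reduce raw samples to one log-energy value per overlapping frame.

    frame_len and hop are illustrative assumptions, not values from the source.
    """
    feats = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        # A small constant avoids taking the log of zero on silent frames.
        feats.append(math.log(energy + 1e-10))
    return feats

# One second of a synthetic 440 Hz tone at an assumed 8 kHz sampling rate.
samples = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000)]
features = frame_log_energies(samples)
```

The resulting representation holds far fewer values than the digitized signal, which is the sense in which the data are reduced.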
In development, or "training", of the speech-recognition system, the task is to identify the patterns in the reduced-data representations that are characteristic of speech elements such as words or phrases. The sounds made by different speakers uttering the same words or phrases are different, and thus the speech-recognition system must assign the same words or phrases to patterns derived from these different sounds. There are other sources of ambiguity in the patterns, such as noise and the inaccuracy of the modeling process, which may also alter the speech signal representations. Accordingly, routines are used to assign likelihoods to various mathematical combinations of the reduced-data representations of the speech, and various hypotheses are tested, to determine which one of a number of possible speech elements is most likely the one currently being spoken, and thus represented by a particular data pattern.
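The assignment of likelihoods and the testing of hypotheses described above might be sketched, under stated assumptions, as follows. The sketch assumes that each speech element is modeled by a single diagonal-covariance Gaussian over a two-dimensional feature vector. The model form, the element names, and the numerical values are all hypothetical and serve only to illustrate selecting the most likely element.

```python
import math

def log_likelihood(feature, mean, var):
    """Log-likelihood of a feature vector under a diagonal Gaussian (an assumed model form)."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(feature, mean, var)
    )

def most_likely_element(feature, models):
    """Score every hypothesized speech element and return the best one with all scores."""
    scores = {
        name: log_likelihood(feature, mean, var)
        for name, (mean, var) in models.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical models: element name -> (mean vector, variance vector).
models = {
    "yes": ([1.0, 0.0], [0.5, 0.5]),
    "no": ([-1.0, 0.5], [0.5, 0.5]),
}
best, scores = most_likely_element([0.8, 0.1], models)
```

Because a score must be computed for every hypothesized element, the cost of this step grows with the number of speech elements and with the size of the feature vectors.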
The processes for performing these operations tend to be computation-intensive. The likelihoods must be determined for various data combinations and large numbers of speech elements. Thus the limitation on computation imposed by requirements of, for instance, real-time operation of the system limits the sensitivity of the pattern-recognition algorithm that can be employed.
It is accordingly an object of the present invention to increase the computational time that can be dedicated to recognition of a given pattern but to do so without increasing the time required for the total speech-recognition process.
It is a further object of the invention to process together signal segments corresponding to a longer time period, that is, to use a larger signal "window," without substantially increasing the computational burden and without decreasing the resolution of the signal data.