The present invention relates to word recognition for a speech recognition system and, more particularly, to word recognition using word templates having a data reduced format.
Typically, speech recognition systems represent spoken words as word templates stored in system memory. When a system user speaks into the system, the system must digitally represent the speech for comparison to the word templates stored in memory.
Two particular aspects of such an implementation have received a great deal of attention. The first aspect pertains to the amount of memory required to store the word templates. Speech is represented in such a way that the data used for matching against an input word typically requires a significant amount of memory to be dedicated to each particular word. Moreover, a large vocabulary causes extensive computation time to be consumed for the match. In general, the computation time increases linearly with the amount of memory required for the word templates. Practical implementation in real time requires that this computation time be reduced. Of course, a faster processor architecture could be employed to reduce this computation time, but due to cost considerations, it is preferred that the data representing the word templates be reduced, thereby reducing the computation.
The second aspect pertains to the particular matching techniques used in the system. Most word recognition techniques have been directed to the accuracy of the recognition process for a particular type of feature data used to represent the speech. Typically, channel bank information or linear predictive coding (LPC) parameters represent the speech. When feature data of a reduced format is used, the word recognition process must take that format into account to be effective.
The speech recognition system described herein clusters frames within the word templates to reduce the representative data; consequently, the word recognition technique must give special consideration to the combined frames. Data reduced word templates represent spoken words in a compacted form. Matching an incoming word to a reduced word template without adequately compensating for its compacted form will result in degraded recognizer performance. An obvious method of compensating for data reduced word templates would be to uncompact the reduced data before matching. Unfortunately, uncompacting the reduced data defeats the purpose of data reduction. Hence, a word recognition method is needed which allows reduced data to be matched directly against an incoming spoken word without degrading the word recognition process.
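The idea of matching against a reduced template without uncompacting it can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the system described herein: the function names `cluster_frames` and `match_reduced`, the Euclidean distance measure, the fixed clustering threshold, and the simple linear alignment (no time warping) are all hypothetical. Each cluster stores one averaged frame plus a count of the original frames it replaces, and the matcher uses that count to score the averaged frame against the corresponding number of input frames.

```python
from math import dist  # Euclidean distance, Python 3.8+


def cluster_frames(frames, threshold):
    """Merge runs of consecutive, similar frames into (mean_frame, count) pairs.

    Hypothetical reduction rule: a frame joins the current cluster while it
    lies within `threshold` of that cluster's running mean.
    """
    clusters = []  # each entry is [sum_vector, count]
    for frame in frames:
        if clusters:
            total, count = clusters[-1]
            mean = [s / count for s in total]
            if dist(mean, frame) <= threshold:
                merged = [s + x for s, x in zip(total, frame)]
                clusters[-1] = [merged, count + 1]
                continue
        clusters.append([list(frame), 1])
    return [([s / n for s in total], n) for total, n in clusters]


def match_reduced(input_frames, reduced_template):
    """Accumulate a distance score against a reduced template directly.

    Assumed simplification: linear alignment, so each averaged frame is
    compared with the next `count` input frames. The template is never
    expanded back into its original frames.
    """
    score = 0.0
    position = 0
    for mean, count in reduced_template:
        for frame in input_frames[position:position + count]:
            score += dist(mean, frame)
        position += count
    return score


# Four two-dimensional feature frames reduce to two averaged frames.
frames = [[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.0, 5.2]]
template = cluster_frames(frames, threshold=1.0)
score = match_reduced(frames, template)
```

In this sketch the stored template holds two frames instead of four, and the count attached to each averaged frame is what lets the matcher compensate for the compaction. A practical recognizer would replace the linear alignment with a time-warping search, which is outside the scope of this illustration.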