The invention relates to speech recognition and more particularly to adaptive speech recognition with variable recognition computation.
Computer-based speech-processing systems have become widely used for a variety of purposes. Some speech-processing systems provide Interactive Voice Response (IVR) between the system and a caller/user. Examples of applications performed by IVR systems include automated attendants for personnel directories and customer service applications. Customer service applications may include systems for assisting a caller to obtain airline flight information or reservations, or stock quotes.
IVR systems interact with users by playing prompts and listening for responses from users. The system attempts to recognize each response and can perform various actions in response to the recognized speech.
Processors of computers used in IVR systems perform operations to attempt to recognize the user's speech. The processor can concurrently attempt to recognize the speech of several users interacting with the IVR system over separate lines, e.g., telephone lines. The portion of the processor's capacity in use varies as the number of users interacting with the system varies. During peak calling times, the capacity may be nearly or even completely used. Systems typically are designed to accommodate peak calling times.
In general, in one aspect, the invention provides a speech recognition system including a user interface configured to provide signals indicative of a user's speech. A speech recognizer of the system includes a processor configured to use the signals from the user interface to perform speech recognition operations to attempt to recognize speech indicated by the signals. A control mechanism is coupled to the speech recognizer and is configured to affect processor usage for speech recognition operations in accordance with a loading of the processor.
Implementations of the invention may include one or more of the following features. The user's speech includes multiple utterances and the control mechanism is configured to determine the processor loading at a beginning of each utterance.
The control mechanism is configured to determine which of a plurality of processor loading categories represents current processor loading and to affect processor usage for attempting to recognize speech according to the determined category. There are four categories, corresponding to processor loading that is relatively idle, normal, busy, and pegged. The control mechanism is configured to set the computational level of the processor for recognizing speech to an idle limit, a normal limit, a busy limit, or a pegged limit when the processor loading is determined to be idle, normal, busy, or pegged, respectively, where the idle limit is about twice the busy limit, the normal limit is about 1.5 times the busy limit, and the pegged limit is about 0.8 times the busy limit. The processor is configured to perform speech recognition operations in accordance with stored instructions that include recognition parameters affecting the computational level of the processor; sets of recognition parameters correspond to the processor computational level limits, and the control mechanism is configured to select a set of the recognition parameters according to the determined processor loading. The recognition parameters correspond to at least one of a fast-match threshold, across-word pruning, and short-list depth.
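The category selection and limit scaling described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the ratios to the busy limit (2, 1.5, 1, 0.8) come from the text, but the percentage boundaries between categories and the names `Load`, `classify`, and `computational_limit` are hypothetical.

```python
from enum import Enum


class Load(Enum):
    """The four processor loading categories named in the text."""
    IDLE = "idle"
    NORMAL = "normal"
    BUSY = "busy"
    PEGGED = "pegged"


# Ratio of each category's computational limit to the busy limit (from the text).
LIMIT_RATIO = {
    Load.IDLE: 2.0,
    Load.NORMAL: 1.5,
    Load.BUSY: 1.0,
    Load.PEGGED: 0.8,
}

# Hypothetical upper bounds, in percent of processor capacity, for each
# category; the text does not specify where the boundaries lie.
THRESHOLDS = [(25.0, Load.IDLE), (60.0, Load.NORMAL), (90.0, Load.BUSY)]


def classify(load_percent: float) -> Load:
    """Map a processor-load percentage onto one of the four categories."""
    for upper, category in THRESHOLDS:
        if load_percent < upper:
            return category
    return Load.PEGGED  # anything at or above the busy bound counts as pegged


def computational_limit(load_percent: float, busy_limit: float) -> float:
    """Scale the busy limit by the ratio for the current loading category."""
    return busy_limit * LIMIT_RATIO[classify(load_percent)]
```

With a busy limit of 100 (arbitrary units), a load of 10% is classified as idle and yields a limit of 200, while a load of 95% is classified as pegged and yields a limit of about 80.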
In general, in another aspect, the invention provides a method of adaptive speech recognition, the method including receiving indicia of speech, setting speech recognition accuracy parameters in accordance with loading of a processor configured to perform speech recognition operations, and using the set speech recognition parameters to perform the speech recognition operations to attempt to recognize the speech using the received indicia.
Implementations of the invention may include one or more of the following features. The speech includes multiple utterances, and the setting occurs at the beginning of each utterance.
The method further includes determining which of a plurality of processor loading categories represents processor loading at a given time, and the setting sets the recognition parameters, which affect processor usage for attempting to recognize speech, until processor loading is again determined. There are four categories, corresponding to processor loading that is relatively idle, normal, busy, and pegged. Relative to the potential computational loading of the processor for speech recognition when the processor is determined to be busy, the setting sets the recognition parameters such that this potential computational loading is about twice as large when the processor is determined to be idle, about 1.5 times as large when it is determined to be normal, and about 0.8 times as large when it is determined to be pegged. The recognition parameters correspond to at least one of a fast-match threshold, across-word pruning, and short-list depth.
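The per-utterance behavior of the method (determine the loading category when an utterance begins, then hold the chosen recognition parameters until loading is next determined) might look like the following sketch. The parameter values, their units, and the names `RecognitionParams`, `PARAMS`, and `recognize_call` are hypothetical; only the ordering, with idle permitting the most computation and pegged the least, reflects the text. The load sampler and decoder are passed in as callables because the text does not specify how loading is measured or how decoding is performed.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class RecognitionParams:
    """Hypothetical knobs named after the parameters listed in the text."""
    fast_match_threshold: float  # assumed: looser values admit more candidates
    across_word_beam: float      # assumed pruning beam width for across-word pruning
    short_list_depth: int        # assumed number of candidate words kept


# Hypothetical parameter sets, one per loading category.
PARAMS = {
    "idle":   RecognitionParams(12.0, 240.0, 2000),
    "normal": RecognitionParams(10.0, 200.0, 1500),
    "busy":   RecognitionParams(8.0, 160.0, 1000),
    "pegged": RecognitionParams(6.0, 130.0, 800),
}


def recognize_call(
    utterances: Iterable[str],
    sample_category: Callable[[], str],
    decode: Callable[[str, RecognitionParams], str],
) -> List[Tuple[str, str]]:
    """Decode each utterance with parameters chosen when that utterance begins."""
    results = []
    for utterance in utterances:
        # Loading is determined once, at the beginning of the utterance ...
        category = sample_category()
        params = PARAMS[category]
        # ... and the chosen parameters stay fixed for the whole utterance.
        results.append((category, decode(utterance, params)))
    return results


if __name__ == "__main__":
    # Scripted load sequence and a stub decoder, for illustration only.
    loads = iter(["idle", "pegged"])
    out = recognize_call(
        ["hello", "world"],
        lambda: next(loads),
        lambda u, p: f"{u}@{p.short_list_depth}",
    )
    print(out)  # [('idle', 'hello@2000'), ('pegged', 'world@800')]
```

Keeping the sampler separate from the per-utterance loop means the re-sampling policy can be exercised with a scripted load sequence, as in the usage lines, without touching a real processor.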
In general, in another aspect, the invention provides a computer program product, residing on a computer readable medium, including instructions for causing a computer to: receive indicia of speech, set speech recognition accuracy parameters in accordance with loading of a processor of the computer, and use the set speech recognition parameters to perform the speech recognition operations to attempt to recognize the speech using the received indicia.
Implementations of the invention may include one or more of the following features. The speech includes multiple utterances, and the instructions for causing the computer to set the parameters cause the computer to set the parameters at the beginning of each utterance.
The computer program product further includes instructions for causing the computer to determine which of a plurality of processor loading categories represents processor loading at a given time, and the instructions for causing the computer to set the recognition parameters cause the computer to set the recognition parameters, which affect processor usage for attempting to recognize speech, until processor loading is again determined. There are four categories, corresponding to processor loading that is relatively idle, normal, busy, and pegged. Relative to the potential computational loading of the processor for speech recognition when the processor is determined to be busy, the instructions cause the computer to set the recognition parameters such that this potential computational loading is about twice as large when the processor is determined to be idle, about 1.5 times as large when it is determined to be normal, and about 0.8 times as large when it is determined to be pegged. The recognition parameters correspond to at least one of a fast-match threshold, across-word pruning, and short-list depth.
In general, in another aspect, the invention provides a speech recognition system including an input configured to receive signals indicative of speech. A processor is configured to read instructions stored in memory and to perform operations indicated by the instructions in order to recognize the speech indicated by the received signals. The system also includes means for adjusting a speech recognition computational amount of the processor as a function of availability of the processor.
Implementations of the invention may include one or more of the following features. The adjusting means adjusts the computational amount in accordance with the availability of the processor at a beginning of an utterance of the speech indicated by the received signals. The adjusting means adjusts the computational amount in accordance with the availability of the processor only at a beginning of an utterance of the speech indicated by the received signals. The adjusting means adjusts the computational amount to one of a first level, a second level, a third level, and a fourth level, respectively corresponding to four ranges of load as a percentage of processor capacity, the first level having a maximum computational amount of about twice a maximum computational amount of the third level, the second level having a maximum computational amount of about 1.5 times the maximum computational amount of the third level, and the fourth level having a maximum computational amount of about 0.8 times the maximum computational amount of the third level.
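As a worked example of the four levels, write C_k for the maximum computational amount of level k (notation introduced here for illustration only) and take an arbitrary illustrative value of C_3 = 100 units:

```latex
\begin{aligned}
C_1 &\approx 2\,C_3 = 200, \\
C_2 &\approx 1.5\,C_3 = 150, \\
C_3 &= 100, \\
C_4 &\approx 0.8\,C_3 = 80.
\end{aligned}
```

The ratio between the first and fourth levels is 2/0.8 = 2.5, so under this scheme an utterance recognized while the processor is lightly loaded may receive up to about two and a half times the computation of one recognized while the processor is pegged.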
Various aspects of the invention may provide one or more of the following advantages. Peak periods of speech recognition system use can be accommodated, and accuracy of speech recognition may be improved in non-peak periods compared to peak periods. Processing capacity may be used more effectively than with current techniques. Speech recognition accuracy can adapt to changes in processor load, and information for accurate offline simulations is also provided. Speech recognition accuracy may be adjusted during a user's call. System crashes due to fixed processing assumptions can be reduced and/or avoided. Transaction completion rates can be improved compared to systems with fixed speech recognition computation.