1. Technical Field
This invention relates to the field of computer speech recognition and more particularly to a method and computer apparatus in which a speech user interface can automatically adjust the content of prompt feedback based on predicted recognition accuracy.
2. Description of the Related Art
Speech recognition is the process by which an acoustic signal received by a microphone is converted to a set of words by a computer. These recognized words may then be used in a variety of computer software applications for purposes such as document preparation, data entry, and command and control in a speech user interface. Implementing a usable speech user interface, however, involves overcoming substantial obstacles.
Ironically, the bane of speech user interfaces can be the very tool which makes them possible: the speech recognizer. Often, it is difficult to verify whether the speech recognizer has understood a speech command correctly. In fact, interacting with a speech recognizer has been compared to conversing with a beginning student of a foreign language: in that instance, because misunderstandings occur more frequently, each participant in the conversation must continually check and verify, often repeating or rephrasing until understood.
Likewise, error-prone speech recognizers require that the speech recognition system emphasize feedback and verification. Yet, error identification and error repair can be time consuming and tiring. Notably, participants in a recent field study for evaluating speech user interface design issues complained about the slow pace of interaction with the speech user interface, particularly with regard to excessive feedback. Still, in a speech user interface, there is a strong connection between required feedback and recognition accuracy. When the recognition accuracy is high, short prompts incorporating little feedback are appropriate because it is unlikely that the speech recognizer will misunderstand the speech command. In contrast, where recognition accuracy is reduced, longer prompts incorporating significant feedback become necessary.
Speech recognition errors can be divided into three categories: rejection, substitution, and insertion. A rejection error occurs where the speech recognizer fails to interpret a speech command. A substitution error occurs where the speech recognizer mistakenly interprets a speech command for a different, albeit legitimate, speech command. For instance, a substitution error is said to occur where the speech command "send a message" is misinterpreted as "seventh message." Finally, an insertion error can occur when the speech recognizer interprets unintentional input, such as background noise, as a legitimate speech command.
In handling rejection errors, human factors experts seek to avoid the "brick wall" effect which can occur where a speech recognizer responds to every rejection error with the same error message, for instance, the notorious "Abort, Retry, Fail" message of DOS. In response, human factors experts propose incorporating progressive assistance where a short error message is supplied in response to the first few rejections. Successive rejections are followed by progressively more thorough prompts containing helpful feedback. Progressive assistance, however, operates in response to misrecognitions. Progressive assistance does not anticipate and respond to predicted future misrecognitions.
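The progressive assistance technique described above can be illustrated with a minimal sketch. The prompt wording, the number of escalation levels, and the function name `rejection_prompt` are assumptions made for illustration; the source does not specify them.

```python
# Hypothetical prompts, ordered from shortest to most thorough.
# The wording and the number of levels are illustrative assumptions.
REJECTION_PROMPTS = [
    "Sorry?",
    "Sorry, I did not understand. Please repeat the command.",
    "I still did not understand. You can say 'send a message' or 'help'.",
]


def rejection_prompt(consecutive_rejections: int) -> str:
    """Return a progressively more thorough prompt as rejections accumulate."""
    if consecutive_rejections < 1:
        raise ValueError("expected at least one rejection")
    # Clamp to the last (most detailed) prompt once the list is exhausted.
    index = min(consecutive_rejections, len(REJECTION_PROMPTS)) - 1
    return REJECTION_PROMPTS[index]
```

Note that this sketch only reacts to rejections that have already occurred, which is the limitation the passage identifies.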
Though rejection errors can be merely frustrating, substitution errors can have more significant consequences. As one human factors expert notes, if a user submits a voice command to a weather application requesting the weather in "Kuai", but the speech recognizer interprets the speech command as "Good-bye" and disconnects, the interaction will have been completely terminated. Hence, in some situations, an explicit verification of a speech command would seem appropriate. Explicit feedback in the form of verifying every speech command, however, would prove tedious. In particular, where speech commands comprise short queries, verification can take longer than presentation. Consequently, current speech recognition theory acknowledges the utility of implicit feedback, that is, including part of the recognized speech command in the responsive prompt. Nevertheless, present explicit and implicit feedback verification schemes respond only to occurring errors. In fact, one system only provides implicit feedback for commands involving the presentation of data, and explicit feedback for commands which will destroy data or set into motion future events. Presently, no systems provide feedback according to predicted substitution errors.
Finally, spurious insertion errors occur primarily as a consequence of background noise. Present speech recognizers normally will reject the illusory speech command. On occasion, however, speech recognizers can mistake the illusory speech command for an actual speech command. Typical solutions to spurious insertion errors focus upon preventing the error at the outset. Such methods involve suspending speech recognition in the presence of heightened background noise. Still, existing systems fail to anticipate recognition errors that result from heightened background noise. These systems will not recognize an increase in background noise and proactively adjust feedback.
The invention concerns a method and computer apparatus for automatically adjusting the content of feedback in a responsive prompt based upon predicted recognition accuracy by a speech recognizer. The method involves receiving a user voice command from the speech recognizer; calculating a present speech recognition accuracy based upon the received user voice command; predicting a future recognition accuracy based upon the calculated present speech recognition accuracy; and, controlling feedback content of the responsive prompt in accordance with the predicted recognition accuracy. For predicting future poor recognition accuracy based upon poor present recognition accuracy, the calculating step can include monitoring the received user voice command; detecting a reduced accuracy condition in the monitored user voice command; and, determining poor present recognition accuracy if the reduced accuracy condition is detected in the detecting step. Detecting a reduced accuracy condition, however, does not always correspond to detecting a poor accuracy condition. Rather, a reduced accuracy condition can be a condition in which recognition accuracy is determined to fall below perfect recognition accuracy.
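The four steps of the method (receiving, calculating, predicting, controlling) can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the names `RecognizedCommand`, `Accuracy`, and `respond`, the 0.0 to 1.0 confidence scale, the 0.6 threshold, and the "next turn resembles this turn" predictor are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Accuracy(Enum):
    REDUCED = "reduced"
    ADEQUATE = "adequate"


@dataclass
class RecognizedCommand:
    text: str
    confidence: float  # assumed to lie in the range 0.0 to 1.0


def calculate_present_accuracy(cmd: RecognizedCommand,
                               threshold: float = 0.6) -> Accuracy:
    """Step 2: estimate present accuracy from the received command."""
    return Accuracy.REDUCED if cmd.confidence < threshold else Accuracy.ADEQUATE


def predict_future_accuracy(present: Accuracy) -> Accuracy:
    """Step 3: assumed predictor, the next turn is expected to resemble this one."""
    return present


def control_feedback(cmd: RecognizedCommand, predicted: Accuracy,
                     next_prompt: str) -> str:
    """Step 4: add feedback to the prompt only when accuracy is predicted to be reduced."""
    if predicted is Accuracy.REDUCED:
        return f"{cmd.text}. {next_prompt}"
    return next_prompt


def respond(cmd: RecognizedCommand, next_prompt: str) -> str:
    """Step 1 is the caller passing in the recognizer's output."""
    present = calculate_present_accuracy(cmd)
    predicted = predict_future_accuracy(present)
    return control_feedback(cmd, predicted, next_prompt)
```

For example, `respond(RecognizedCommand("send a message", 0.4), "To whom?")` echoes the command before the next prompt, while the same command at confidence 0.9 yields only the short prompt.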
For detecting a reduced accuracy condition, the detecting step can include obtaining a confidence value associated with the received user voice command from the speech recognizer; identifying as a reduced confidence value each confidence value having a value below a preset threshold; and, triggering a reduced accuracy condition if the reduced confidence value is identified in the identifying step. Another detecting step can include storing the user voice command in a command history; calculating a sum of user undo voice commands stored in the command history; and, triggering a reduced accuracy condition if the sum exceeds a preset threshold value. One skilled in the art will recognize, however, that the undo voice command can include any number of synonymous user commands interpreted by the system to indicate a user command to return to the previous command state. Similarly, the detecting step can comprise storing the user voice command in a command history; calculating a sum of user cancel voice commands stored in the command history; and, triggering a reduced accuracy condition if the sum exceeds a preset threshold value. As in the case of the undo command, one skilled in the art will recognize that the cancel command can include any number of synonymous user commands interpreted by the system to indicate a user command to disregard the current command.
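The confidence-value, undo-count, and cancel-count detectors described above might be combined as in the following sketch. The synonym sets, the threshold values, the bounded history length, and the names `CommandHistory` and `reduced_accuracy` are assumptions for illustration only; the source says merely that thresholds are preset.

```python
from collections import deque

# Hypothetical synonym sets; the source leaves the exact vocabulary open.
UNDO_SYNONYMS = {"undo", "go back", "scratch that"}
CANCEL_SYNONYMS = {"cancel", "never mind", "stop"}


class CommandHistory:
    """Stores recent user voice commands (bounded window is an assumption)."""

    def __init__(self, max_len: int = 10) -> None:
        self._commands = deque(maxlen=max_len)

    def store(self, command: str) -> None:
        self._commands.append(command.strip().lower())

    def count(self, synonyms: set) -> int:
        """Sum the stored commands that match any of the given synonyms."""
        return sum(1 for c in self._commands if c in synonyms)


def reduced_accuracy(history: CommandHistory, confidence: float,
                     confidence_threshold: float = 0.6,
                     undo_threshold: int = 2,
                     cancel_threshold: int = 2) -> bool:
    """Trigger a reduced accuracy condition if any detector fires."""
    return (confidence < confidence_threshold
            or history.count(UNDO_SYNONYMS) > undo_threshold
            or history.count(CANCEL_SYNONYMS) > cancel_threshold)
```

Treating the detectors as alternatives joined by a logical OR is one possible design choice; the text presents each detecting step independently.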
Yet another detecting step can include comparing the user voice command to a corresponding computer responsive prompt; determining whether the voice command is inconsistent with the corresponding computer responsive prompt; identifying the user voice command determined to be inconsistent with the corresponding computer responsive prompt as an unexpected user voice command; storing the unexpected user voice command in a command history; calculating a sum of unexpected user voice commands stored in the command history; and, triggering a reduced accuracy condition if the sum exceeds a preset threshold value. Finally, the detecting step can comprise obtaining measured acoustical signal quality data from the speech recognizer; and, triggering a reduced accuracy condition if the measured acoustical signal quality evaluates below a preset threshold value.
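The unexpected-command and acoustic-quality detectors can be sketched in the same spirit. The representation of a prompt's consistent replies as a set of strings, the use of a signal-to-noise ratio in decibels as the quality measure, the threshold values, and all names below are assumptions; the source does not specify how consistency or signal quality is measured.

```python
class UnexpectedCommandMonitor:
    """Counts commands that are inconsistent with the prompt they answer."""

    def __init__(self, threshold: int = 2) -> None:
        self.threshold = threshold  # hypothetical preset threshold
        self.unexpected = []        # command history of unexpected commands

    def observe(self, command: str, expected_replies: set) -> bool:
        """Record the command; return True if the reduced accuracy condition triggers."""
        if command.strip().lower() not in expected_replies:
            self.unexpected.append(command)
        return len(self.unexpected) > self.threshold


def poor_signal(snr_db: float, min_snr_db: float = 15.0) -> bool:
    """Trigger when measured signal quality (assumed: SNR in dB) is below threshold."""
    return snr_db < min_snr_db
```

For instance, a reply of "seventh message" to a yes/no confirmation prompt would be stored as unexpected, and enough such replies would trigger the condition.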
In the presence of a reduced accuracy condition, the generating step can comprise providing the received user voice command as a computer responsive prompt. Another generating step can comprise incorporating the received user voice command as part of a subsequent computer responsive prompt. Yet another generating step can include providing a request for confirmation of the received voice command as a computer responsive prompt. The generating step also can comprise: concatenating the received user voice command with a subsequent computer responsive prompt; and, providing the concatenation as a single computer responsive prompt. Finally, the generating step can comprise providing a list of high-recognition user voice commands as a computer responsive prompt.
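The five prompt-generation alternatives listed above can each be sketched as a small function. The prompt wording, the use of a `{command}` placeholder for implicit feedback, and the function names are illustrative assumptions rather than details from the source.

```python
def echo(command: str) -> str:
    """Provide the received command itself as the responsive prompt."""
    return command


def implicit(command: str, template: str) -> str:
    """Incorporate the command as part of the subsequent prompt.

    `template` is assumed to contain a `{command}` placeholder.
    """
    return template.format(command=command)


def confirm(command: str) -> str:
    """Request confirmation of the received command."""
    return f"Did you say '{command}'?"


def concatenate(command: str, next_prompt: str) -> str:
    """Join the command and the subsequent prompt into a single prompt."""
    return f"{command}. {next_prompt}"


def list_commands(high_recognition_commands: list) -> str:
    """Offer a list of commands that the recognizer handles reliably."""
    return "You can say: " + ", ".join(high_recognition_commands) + "."
```

A caller would select among these once a reduced accuracy condition has been detected; how that selection is made is not specified in the text.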
It will be appreciated, however, that the present invention can also predict future adequate recognition accuracy based upon high accuracy conditions detected in the monitored user voice command. Thus, one skilled in the art will recognize that the calculating step could include: monitoring the received user voice command; detecting a high accuracy condition in the monitored user voice command; and determining good present recognition accuracy if the high accuracy condition is detected in the detecting step. Responsive to predicted adequate recognition accuracy, the present invention could reduce prompt feedback in the computer responsive prompt. In particular, longer prompts would prove unnecessary in a high accuracy recognition environment.
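The converse case, reducing feedback when a high accuracy condition is detected, might look like the sketch below. Requiring several consecutive high-confidence turns, the 0.85 threshold, and the names `high_accuracy` and `choose_prompt` are assumptions; the source states only that prompt feedback could be reduced when adequate accuracy is predicted.

```python
def high_accuracy(confidences: list, threshold: float = 0.85,
                  min_turns: int = 3) -> bool:
    """Detect a high accuracy condition from recent confidence values.

    Assumed rule: the last `min_turns` confidences must all meet the threshold.
    """
    recent = confidences[-min_turns:]
    return len(recent) == min_turns and all(c >= threshold for c in recent)


def choose_prompt(confidences: list, short_prompt: str, long_prompt: str) -> str:
    """Use the short prompt only when adequate accuracy is predicted."""
    return short_prompt if high_accuracy(confidences) else long_prompt
```

With too few turns observed, the sketch falls back to the longer prompt, which errs on the side of more feedback.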