The invention relates generally to speech recognition, and more specifically, to the graphical representation of a confidence value of an associated speech recognition result.
With the growth of speech recognition capabilities, there is a corresponding increase in the number of applications and uses for speech recognition. Different types of speech recognition applications and systems have been developed, based on the location of the speech recognition with respect to the user. One such example is a local or embedded speech recognition engine, such as a SpeechToGo speech recognition engine, sold by Speech Works International, Inc., 695 Atlantic Avenue, Boston, Mass., 02111. Another type of speech recognition engine is a network-based speech recognition engine, such as Speech Works 6, as sold by Speech Works International, Inc., 695 Atlantic Avenue, Boston, Mass., 02111.
Embedded or local speech recognition engines provide the added benefit of speed in recognizing a speech input, wherein a speech input includes any type of audible or audio-based input. A drawback of embedded or local speech recognition engines is that these engines contain a limited vocabulary. Due to memory limitations and system processing requirements, in conjunction with power consumption limitations, embedded or local speech recognition engines can recognize only a fraction of the audio inputs that would be recognizable by a network-based speech recognition engine.
Network-based speech recognition engines provide the added benefit of an increased vocabulary, based on the elimination of memory and processing restrictions. A downside, however, is the added latency between when a user provides a speech input and when the speech input may be recognized and provided back to the user for confirmation of recognition. In a typical speech recognition system, the user provides the audio input and the audio input is thereupon provided to a server across a communication path, whereupon it may then be recognized. In another embodiment, the audio input may also be provided to the embedded speech recognition engine.
A problem arises when a recognized result includes a plurality of recognized terms, wherein each of the plurality of recognized terms has an associated confidence value within a predetermined threshold range. It is important to provide the user with the list of recognized terms that fall within the predetermined threshold range, such that the user may select the appropriately recognized term. Furthermore, within a device having a limited amount of display space, there is a need for an efficient way of displaying the recognized results and their associated confidence values so the user is provided with automatic and direct feedback of the speech recognition. While there exist systems that provide the generated N-best list to the end user in order of the recognition confidence values, there do not exist systems that, with a limited amount of display space, provide non-alphanumeric symbols representing the associated confidence levels. For instance, a typical speech recognition result list may include the list of terms numbered in order, but does not provide any indication of the difference between the confidence levels of the various terms. It is beneficial to provide an end user with the recognition result list having an associated representation of the confidence values, such that the user may better understand the associated capabilities of the speech recognition engines. Moreover, in a display area having a very limited amount of display space, it is extremely difficult to provide a visual indication of the speech recognition list, the associated confidence values, and the difference between each of the terms of the associated list.
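By way of illustration only, the problem described above may be sketched as follows. The sketch filters an N-best list to the terms whose confidence values fall within a threshold range, orders them by confidence, and pairs each term with a compact non-alphanumeric indicator. The function names, the assumption that confidence values lie between 0 and 1, the example threshold range, and the use of "#" and "-" glyphs as the indicator are all hypothetical and are not taken from any particular speech recognition engine.

```python
from typing import List, Tuple

# Hypothetical N-best entry: (recognized term, confidence assumed to be in 0.0..1.0)
Result = Tuple[str, float]


def filter_n_best(results: List[Result], low: float, high: float) -> List[Result]:
    """Keep terms whose confidence lies in [low, high], highest confidence first."""
    kept = [(term, conf) for term, conf in results if low <= conf <= high]
    return sorted(kept, key=lambda item: item[1], reverse=True)


def confidence_bar(confidence: float, width: int = 5) -> str:
    """Render a confidence value as a fixed-width bar of '#' and '-' glyphs."""
    filled = int(confidence * width + 0.5)   # round half up
    filled = max(0, min(width, filled))      # clamp to the bar width
    return "#" * filled + "-" * (width - filled)


def render(results: List[Result], low: float, high: float, width: int = 5) -> List[str]:
    """Return one short display line per term within the threshold range."""
    return [
        f"{confidence_bar(conf, width)} {term}"
        for term, conf in filter_n_best(results, low, high)
    ]


if __name__ == "__main__":
    # Illustrative values only; not output from a real recognizer.
    n_best = [("Austin", 0.71), ("Boston", 0.92), ("Aston", 0.15), ("Houston", 0.40)]
    for line in render(n_best, low=0.3, high=1.0):
        print(line)
```

A fixed-width bar is used here because it occupies the same small number of character cells for every term, which suits a display area with very limited space while still conveying the relative difference between confidence values.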