Electronic systems employ interfaces that allow users to interact with them. These interfaces include a variety of contact and contactless techniques, such as keyboards, buttons, pointer devices, and the like. More recently, these interfaces may even employ cameras capable of detecting an image or video associated with a user's action.
Recently, audio-based interfaces have become more common. A recording device captures sound, which is interpreted via various digital signal processing techniques and correlated to a text-based command. Once a text-based command is identified, the exact text can then be employed to perform a control operation associated with the electronic system.
In these techniques, various problems often occur that frustrate the employment of text-based technologies. For example, the clarity of the text-based inputs may be compromised by environmental conditions, such as weather and background sound. As such, an electronic system employing an audio recording input mechanism may not be able to interpret the text-based command with the accuracy required to properly interact with the electronic system.
In other cases, the employment of an audio-based interface may also be frustrated by variations in pronunciation, accents, dialects, and vocal clarity. Because of these variations, the audio-based interface may be incapable of accurately detecting the spoken command.
To address the above, the spoken command may be augmented with disambiguation. Disambiguation detects the spoken command and provides a list of options associated with it. The list of options may include what the processor associated with the detection predicts the spoken command to be, along with a predetermined number of other commands related to the predicted command.
FIG. 1 illustrates an example of a system 100 employing disambiguation according to the prior art. FIG. 2 illustrates an example of a user 200 employing the system 100 with disambiguation schemes known in the prior art.
The voice recognition system 100 is coupled to a data store 150. The data store 150 may be integrated into the system 100, but is shown separately to illustrate certain concepts, such as applying score data 156 to each of the detected words 155.
The voice recognition system 100 is configured to receive a voice command 110. As shown in FIG. 2, user 200 vocalizes the command ‘Run Program’ 210 via a microphone 250 (or any electronic device capable of receiving vocal commands). As such, command 110 may be input to system 100 for further processing.
The voice recognition system 100 is configured to process the received voice command 110 and retrieve detected words 155 based on said processing. The detection may employ known digital signal processing techniques.
In addition to retrieving the detected words 155, the voice recognition system 100 also retrieves a score (contained in score data 156) associated with each of the detected words 155. Based on the voice recognition scheme employed, each of the detected words 155 may be assigned a score based on a prediction of whether the detected word matches the voice command.
The ranked detected words 111 may be communicated to a secondary system (such as a display 190) to display the detected words 155. The ranked detected words 111 order the detected words 155 in a manner corresponding to the score data 156. Thus, the detected words that correlate to a higher score may be displayed first.
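The ordering described above can be sketched as a sort over word and score pairs. This is only a minimal illustration, not the prior-art implementation: the function name `rank_detected_words` and the numeric scores are hypothetical, and the source does not specify how score data 156 is computed or scaled.

```python
from typing import List


def rank_detected_words(detected_words: List[str], score_data: List[float]) -> List[str]:
    """Order detected words so that higher-scoring candidates come first.

    Hypothetical helper: the names and the score scale are assumptions made
    for illustration and do not come from the described system.
    """
    if len(detected_words) != len(score_data):
        raise ValueError("each detected word needs exactly one score")
    # Pair each word with its score, then sort by score in descending order.
    paired = zip(detected_words, score_data)
    ranked = sorted(paired, key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in ranked]


# Assumed sample scores; the actual values in score data 156 are not given.
detected_words = ["Fun", "Run", "Stun"]
score_data = [0.41, 0.87, 0.62]
ranked_detected_words = rank_detected_words(detected_words, score_data)
print(ranked_detected_words)  # ['Run', 'Stun', 'Fun']
```

Because Python's `sorted` is stable, words with equal scores keep their original relative order in this sketch.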
This function is shown in FIG. 2. The display 190 includes a window 290 displaying the ranked detected words 111 based on a command 210. As shown, the disambiguation screen is presented based on a hypothetical application of voice recognition system 100, where each of the words shown in window 290 is associated with a score (a sample lookup table 211 is provided).
As shown, even though ‘run’ is vocalized by user 200, the voice recognition system 100 determines that the user 200 may be saying any of the three options shown in lookup table 211. For a user with good vocal clarity and/or in an ideal audio situation, the first and highest scored detected word may be the right one; with other users, this may not be the case.
Thus, employing the voice recognition system 100 described above, the user 200 may vocalize the command 210 ‘Run Program’ and receive a window 290 showing the disambiguation options (‘Run’, ‘Stun’, and ‘Fun’). The user may then select, through another input technique or method, which one of the options shown in window 290 corresponds to the user 200's initial vocal command.
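The selection step can be illustrated with a short sketch. It assumes the secondary input technique reports a zero-based position in the displayed list; that convention, along with the function name `resolve_selection`, is hypothetical and not taken from the described system.

```python
from typing import List


def resolve_selection(displayed_options: List[str], selected_index: int) -> str:
    """Return the option the user picked from the disambiguation window.

    Hypothetical helper: assumes the secondary input (touch, button, and the
    like) reports a zero-based position in the displayed list.
    """
    if not 0 <= selected_index < len(displayed_options):
        raise IndexError("selection is outside the displayed options")
    return displayed_options[selected_index]


# Options in the order shown in window 290.
window_options = ["Run", "Stun", "Fun"]
confirmed_word = resolve_selection(window_options, 0)
print(confirmed_word)  # Run
```

The explicit range check rejects negative positions, which Python would otherwise treat as indexing from the end of the list.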