The present invention relates to accessing information on a computing device. More particularly, the present invention relates to allowing the computer user to enter information, for example in response to a single prompt, using any one of a number of techniques such as speech recognition, a keypad generating DTMF (dual tone multi-frequency) tones, or handwriting recognition, to name just a few.
Small computing devices such as personal information managers (PIMs) and portable phones are used with ever increasing frequency by people in their day-to-day activities. With the increase in processing power now available for the microprocessors used to run these devices, the functionality of these devices is increasing and, in some cases, merging. For instance, many portable phones can now be used to access and browse the Internet as well as to store personal information such as addresses, phone numbers and the like. Unfortunately, because these devices are kept as small as possible so that they are easily carried, conventional keyboards having all the letters of the alphabet as isolated buttons are usually not possible due to the limited surface area available on the housing of the computing device.
One solution has been to allow the user to enter information through audible phrases and perform speech recognition. In one particular embodiment, speech recognition is used in conjunction with a display. In this embodiment, a user can complete a form or otherwise provide information by indicating the fields on the display that subsequent spoken words are directed to. Specifically, in this mode of data entry, the user generally controls when to select a field and provide corresponding information. After selecting a field, the user provides input for the field as speech. This form of entry, which uses a screen display and allows free-form selection of fields together with voice recognition, is called “multi-modal”.
Although speech recognition is quite useful, situations arise where a user may not want to provide the information audibly. For instance, the content of the information could be confidential, and the user may be in a public environment where he/she does not wish such information to be overheard. Similarly, if the user is in a noisy environment, errors in speech recognition can easily occur due to background interference. In such situations, it is desirable to allow the user to easily switch between modes of input. For instance, a user may in such cases prefer to respond via a keyboard or other input device rather than providing spoken commands or phrases.
In addition to the handheld computing devices discussed above, it is also quite common to access information using a simple telephone. In this environment, the user can either provide spoken language or actuate the telephone keypad to generate DTMF tones, typically in response to audible prompts rendered through the telephone speaker. Again, this allows the user to choose an input modality that is best suited for the sensitivity of the information provided and/or the environment in which the information is being provided.
It is also well known that other forms of input modality exist such as handwriting recognition, eye movement to selected areas on a display, gesture and interpretation of other visual responses by a user, to name just a few. Allowing a computer user to use any one of these input modalities at any point in the application improves usability by providing flexibility.
Frameworks have been developed to allow application developers to use different input modalities in an application. Speech Application Language Tags (SALT) is a set of extensions to existing markup languages, particularly HTML and XHTML, that enable multi-modal and/or telephone based systems to access information, applications and/or web services from personal computers, telephones, tablet personal computers and wireless devices. When used in conjunction with a dialog managing mechanism such as Microsoft Speech Server by Microsoft Corporation of Redmond, Wash., an application developer can allow the user to freely select a method of input such as via speech or the use of DTMF generated tones.
Although allowing a computer user to easily select an input modality for any given response improves flexibility, problems still arise. In particular, since it is desirable to allow the user to select the input modality by merely providing speech or depressing a keypad to generate a DTMF tone, the dialog managing mechanism must be prepared to accept input using either modality. When embodied using SALT techniques, this is accomplished by activating “listen” objects simultaneously for both speech recognition and DTMF recognition. A significant problem arises when the user has begun depressing keys and a noisy event also occurs in the background. In this situation, the speech recognition mechanism may process what has been heard from the noisy environment and may return a “non-recognition” event, rather than process the input from the DTMF generated tones. Issuance of the non-recognition event coincides with canceling of both the speech recognition and DTMF listen objects.
Upon receiving the non-recognition event, the application may then prompt the user to speak louder or repeat their verbal instructions. Since the user was in fact trying to enter information using DTMF generated tones through a keypad, the user may be quite confused by these instructions. In addition to possibly confusing the user, the application and the user are now out of sync and must come back into agreement before further processing can proceed. Invariably this will take some time.
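The failure sequence described above can be illustrated with a minimal simulation. This is only a sketch under stated assumptions: the `Listener` and `Dialog` classes, their method names, and the prompt wording are hypothetical and do not reproduce the actual SALT object model or the Microsoft Speech Server API. The sketch merely models two recognizers activated together, a noise burst that produces a non-recognition event, and the resulting cancellation of both recognizers while DTMF entry is in progress.

```python
from dataclasses import dataclass, field


@dataclass
class Listener:
    """Hypothetical stand-in for a listen object; not the real SALT API."""
    name: str
    active: bool = False
    buffer: list = field(default_factory=list)

    def start(self):
        # Activating a listener begins a fresh collection of input.
        self.active = True
        self.buffer.clear()

    def cancel(self):
        # Canceling a listener discards whatever it had collected so far.
        self.active = False
        self.buffer.clear()


class Dialog:
    """Hypothetical dialog manager that accepts speech or DTMF for one prompt."""

    def __init__(self):
        self.speech = Listener("speech")
        self.dtmf = Listener("dtmf")
        self.log = []

    def prompt(self, text):
        # Both modalities are activated simultaneously for a single prompt.
        self.log.append(f"PROMPT: {text}")
        self.speech.start()
        self.dtmf.start()

    def on_dtmf_key(self, digit):
        if self.dtmf.active:
            self.dtmf.buffer.append(digit)

    def on_audio(self, recognized_text):
        # recognized_text is None when the audio could not be recognized,
        # for example because it was only background noise.
        if self.speech.active and recognized_text is None:
            self.on_no_reco()

    def on_no_reco(self):
        # The non-recognition event cancels BOTH listeners, so the digits
        # keyed in so far are lost and the user is asked to speak again.
        lost = "".join(self.dtmf.buffer)
        self.speech.cancel()
        self.dtmf.cancel()
        self.log.append(f"NO-RECO: discarded DTMF digits '{lost}'")
        self.prompt("Sorry, I did not understand. Please speak louder or repeat.")


def simulate():
    dialog = Dialog()
    dialog.prompt("Please say or enter your account number.")
    dialog.on_dtmf_key("4")   # the user starts keying digits
    dialog.on_dtmf_key("2")
    dialog.on_audio(None)     # a background noise burst reaches the recognizer
    return dialog


if __name__ == "__main__":
    for line in simulate().log:
        print(line)
```

In this sketch the user has keyed two digits when the noise arrives; the digits are discarded and the next prompt asks the user to speak, even though the user never spoke. This is the out-of-sync condition the passage describes.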
The present invention provides solutions to one or more of the above-described problems and/or provides other advantages over the prior art.