The present invention relates to access of information over a wide area network such as the Internet. More particularly, the present invention relates to web-enabled recognition that allows information and control on a client-side device to be entered using a variety of methods.
Small computing devices such as personal information manager (PIM) devices and portable phones are used with ever-increasing frequency by people in their day-to-day activities. With the increase in processing power now available for the microprocessors used to run these devices, the functionality of these devices is increasing and, in some cases, merging. For instance, many portable phones can now be used to access and browse the Internet as well as to store personal information such as addresses, phone numbers and the like.
Because these computing devices are being used to browse the Internet, or are used in other server/client architectures, it is necessary to enter information into them. Unfortunately, these devices are kept as small as possible so that they can be easily carried, and the limited surface area available on their housings usually makes it impossible to provide a conventional keyboard with every letter of the alphabet on a separate button.
To address this problem, there has been increased interest in, and adoption of, voice or speech as a means to provide and access such information, particularly over a wide area network such as the Internet. Published U.S. patent application U.S. 2003/0130854, entitled APPLICATION ABSTRACTION WITH DIALOG PURPOSE, and U.S. patent application Ser. No. 10/426,053, entitled APPLICATION ABSTRACTION WITH DIALOG PURPOSE and filed Apr. 28, 2003, describe a method and system that define controls with which a web server generates client-side markup that includes recognition and/or audible prompting.
Each of the controls performs a role in the dialog. For instance, the controls can include a prompt object used to generate corresponding markup for the client device to present information to the user, or to generate markup for the client device to ask a question. An answer control or object generates markup for the client device so that a grammar used for recognition is associated with an input field related to a question that has been asked. If it is unclear whether a recognized result is correct, a confirmation mechanism can be activated to generate markup that confirms the recognized result. A command control generates markup that allows the user to provide commands, which are inputs other than the expected answers to a specific question, and thus allows the user to navigate through the web server application, for example. A module, when executed, for example on a client, creates a dialog to solicit and provide information as a function of the controls.
The module can use a control mechanism that identifies an order for the dialog, for example, an order for asking questions. The controls include activation logic that may activate other controls based on the answer given by the user. In many cases, the controls allow the user to provide extra answers, which are commonly answers to questions not yet asked; because those answers have already been provided, the system skips the corresponding questions. This type of dialog is referred to as "mixed-initiative" because both the system and the user have some control over the dialog flow.
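The skipping behavior of a mixed-initiative dialog can be illustrated with a minimal sketch. It assumes a hypothetical ordered list of questions, each tied to a named slot, and a recognizer that returns a dictionary of whatever slots it heard in the user's utterance; the names `QUESTIONS`, `next_question` and `apply_turn` are illustrative only and are not taken from the referenced applications.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical dialog order: (slot name, prompt text), asked top to bottom.
QUESTIONS: List[Tuple[str, str]] = [
    ("area_code", "What is the area code?"),
    ("local_number", "What is the local number?"),
    ("extension", "What is the extension?"),
]


def next_question(answers: Dict[str, str]) -> Optional[str]:
    """Return the prompt for the first slot not yet answered, or None when done.

    Slots filled by extra answers are skipped because they already appear
    in `answers`, even though their question was never asked.
    """
    for slot, prompt in QUESTIONS:
        if slot not in answers:
            return prompt
    return None


def apply_turn(answers: Dict[str, str], recognized: Dict[str, str]) -> None:
    """Merge every slot the (assumed) recognizer returned, including extra answers."""
    answers.update(recognized)


if __name__ == "__main__":
    answers: Dict[str, str] = {}
    print(next_question(answers))  # asks for the area code first
    # The user answers the area code and volunteers the local number too.
    apply_turn(answers, {"area_code": "425", "local_number": "5550100"})
    print(next_question(answers))  # the local-number question is skipped
```

Because the order of the list, not the order of the user's answers, drives the next prompt, any extra answer simply removes its question from the remaining sequence.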
However, when users are allowed to provide many pieces of information in one sentence, it becomes difficult to ensure that the system will respond appropriately. For example, suppose a system asks a user for a phone number that includes an area code, a local number and an extension. In a mixed-initiative dialog, the user could provide the full number or just a portion of it. The system may need to confirm the portions of the number that have been given and ask for the remaining portions. If the user denies or corrects a portion that the system misunderstood, the system would need to ask for that portion again. Ideally, the system would always confirm, or ask a question about, the portions of the number that the user just provided. If the system instead confirmed or asked about a different portion of the number, the dialog would seem confusing and hard to follow. Because the number of possible dialog flows grows with the number of permutations of extra answers the user can provide, a logical dialog flow is difficult to achieve. In some cases, the system may follow a hard-coded path through the dialog and thus appear, from the user's point of view, to ignore the information it was given. That information is usually processed later, which can add further to the confusion.
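The "ideal" behavior described for the phone-number example, always addressing the portions the user just provided, can be sketched as a small focus-selection rule. This is an illustrative sketch under stated assumptions, not the method of the present invention or of the referenced applications: it assumes each portion is a slot with a hypothetical status of `"empty"`, `"unconfirmed"` or `"confirmed"`, and that each user turn is summarized by a hypothetical `Turn` record of slots just filled, accepted or rejected.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical portions of the phone number, in their default asking order.
SLOTS = ["area_code", "local_number", "extension"]


@dataclass
class Turn:
    filled: Dict[str, str] = field(default_factory=dict)  # values just provided
    confirmed: List[str] = field(default_factory=list)    # slots the user accepted
    denied: List[str] = field(default_factory=list)       # slots the user rejected


def next_action(values: Dict[str, str], status: Dict[str, str],
                turn: Turn) -> Tuple[str, List[str]]:
    """Update slot state from one turn, then pick the next (action, slots)."""
    # Apply denials first so that a correction in the same turn overrides them.
    for slot in turn.denied:
        values.pop(slot, None)
        status[slot] = "empty"
    for slot, value in turn.filled.items():
        values[slot] = value
        status[slot] = "unconfirmed"
    for slot in turn.confirmed:
        if slot in values:
            status[slot] = "confirmed"

    # Focus first on whatever this turn touched, as the ideal flow requires.
    just_filled = [s for s in SLOTS if s in turn.filled]
    if just_filled:
        return ("confirm", just_filled)
    just_denied = [s for s in SLOTS if s in turn.denied]
    if just_denied:
        return ("ask", just_denied)

    # Otherwise fall back to the default order.
    pending = [s for s in SLOTS if status.get(s, "empty") == "unconfirmed"]
    if pending:
        return ("confirm", pending)
    empty = [s for s in SLOTS if status.get(s, "empty") == "empty"]
    if empty:
        return ("ask", [empty[0]])
    return ("done", [])


if __name__ == "__main__":
    values: Dict[str, str] = {}
    status: Dict[str, str] = {}
    print(next_action(values, status,
                      Turn(filled={"area_code": "425", "local_number": "5550100"})))
    print(next_action(values, status,
                      Turn(confirmed=["area_code"], denied=["local_number"])))
```

In this sketch the user first gives two portions, so both are confirmed together; when the user then accepts the area code but rejects the local number, the next question re-asks the local number rather than jumping ahead to the extension.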
There is thus an ongoing need to improve upon the methods used to provide speech recognition in an application in a server/client architecture such as the Internet. In particular, what is needed is a method, system or authoring tool that addresses one, several or all of the foregoing disadvantages and thus provides for the generation of speech-enabled recognition and/or speech-enabled prompting in an application.