1. Statement of the Technical Field
The present invention relates to the field of computer software and speech recognition and more particularly to user-navigated dynamic voice portals that use speech recognition technology.
2. Description of the Related Art
Unlike visual applications, voice-based applications cannot rely on strict pattern matching for input recognition. The nature of speech recognition makes it very difficult to distinguish between terms having similar pronunciations. Therefore, during the design of speech applications, care should be taken to provide input choices that are pronounced as differently as possible, so as to avoid recognizing the wrong choice.
The problem of recognizing the wrong input choice in a speech recognition application occurs with voice portals, which are generally built by various parties that may not be aware of the terms used in the various applications disposed within the voice portal. Often, a voice portal will have, in addition to the current grammars (or commands) for the actual choice to be made, additional active grammars, such as certain “universal” grammars that allow a user to navigate through the portal, e.g., a command such as “go back.” Thus, at any given moment, a combined set of grammars is active, and the voice recognition engine has to search that combined set for a match.
A problem arises if the grammars used across the various applications on the portal are designed by different parties, as is the case for voice portals built on a general portal architecture, such as the IBM WebSphere™ Portal Server. A general portal architecture allows new applications to be added dynamically by an administrator. The choices added by each new application modify the available choices in a selection menu, and thereby affect the quality of recognition. Generally, the administrators are not voice technology specialists, and may further have to operate a voice portal in multiple languages. Because of this, there is always a risk that a new voice application may drastically reduce the recognition quality of the portal.
FIG. 1 depicts an example of the content and organization of a voice portal. The user is generally presented with a tree 10, in which, after logging into the portal, the user starts at a home directory 11. The tree then divides into sub-directories 12 and 14, for “Business” and “Entertainment,” respectively. At home directory 11, the user would be presented with two choices, “Business” or “Entertainment,” which would be the current grammars for the choice that the portal would need to recognize. In addition to those current grammars, there may be additional active grammars, such as “go back” or “quit.” As the user navigates deeper into the menu 10, the current grammars may change from one menu selection step to another. After the “Places” menu selection step 60, the user would proceed to the “Pages” step 65, and would be presented with a new set of menu options 16, 17, 18, and 19, labeled “Information,” “Notes,” “Directory,” and “Sports,” respectively. The new menu options would be added to the set of active grammars.
Below these menu options are the various portlets or voice applications in the applications phase 70 at the bottom of the menu. Applications 20, 22, 24 each branch off from menu item 16, while applications 40, 42, and 44 each branch off from menu item 18. The two sets of voice applications may have been written and arranged by different parties not knowing which terms the other party used for the title of each application. Within each branch of applications additional grammars would be added to the active set which the speech recognition engine of the portal must recognize.
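The way current and universal grammars combine at a given menu step can be sketched as follows. This is only an illustration under stated assumptions, not the portal's implementation: the names `UNIVERSAL`, `CURRENT`, and `active_grammars`, the dictionary layout, and the use of lower-cased plain strings in place of real grammar definitions are all hypothetical. The phrases themselves are the ones named for FIG. 1.

```python
# Universal grammars remain active at every step (phrases taken from the text).
UNIVERSAL = ("go back", "quit")

# Current grammars per menu step. The step keys and this layout are
# hypothetical; a real portal would load grammars from its applications.
CURRENT = {
    "home": ("Business", "Entertainment"),
    "pages": ("Information", "Notes", "Directory", "Sports"),
}


def active_grammars(step):
    """Return the combined phrase set the recognizer must search at a step."""
    phrases = CURRENT[step] + UNIVERSAL
    return {phrase.lower() for phrase in phrases}


if __name__ == "__main__":
    for step in CURRENT:
        print(step, sorted(active_grammars(step)))
```

Each application an administrator adds would contribute further phrases to this set, which is why the size and composition of the active set cannot be fixed at design time.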
In menu 10, it can be seen that application 34 is titled “Directory,” which is the same as menu option 18. If the grammar for selecting menu option 18 is active within the selection choice following menu option 17, then the system would have trouble distinguishing between the two identically pronounced terms. Similarly, if a universal grammar such as “store settings” were also active, this would present recognition problems if the user were to navigate through menu item 18, which has the application named “Stores.”
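The two conflicts just described can be illustrated with a rough check over the active phrases. The sketch below is an assumption-laden stand-in: it compares spellings with Python's standard `difflib.SequenceMatcher`, which is only a crude proxy for similarity of pronunciation, and the function names, the `(source, phrase)` layout, and the 0.5 threshold are hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Spelling similarity in [0, 1]; a crude proxy for how alike two phrases sound."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def confusable_pairs(entries, threshold=0.5):
    """Return (source_a, source_b, score) for phrase pairs scoring at or above threshold.

    entries is a list of (source, phrase) tuples; the threshold is arbitrary.
    """
    hits = []
    for (src_a, a), (src_b, b) in combinations(entries, 2):
        score = similarity(a, b)
        if score >= threshold:
            hits.append((src_a, src_b, score))
    return hits


# Phrases from the example in the text; the source labels are descriptive only.
ACTIVE = [
    ("menu option 18", "Directory"),
    ("application 34", "Directory"),
    ("universal grammar", "store settings"),
    ("application under menu item 18", "Stores"),
]

if __name__ == "__main__":
    for src_a, src_b, score in confusable_pairs(ACTIVE):
        print(f"{src_a} <-> {src_b}: {score:.2f}")
```

A spelling-based score flags the identical “Directory” titles and the overlap between “store settings” and “Stores,” but it would miss terms that are spelled differently and pronounced alike, so a practical check would need a phonetic comparison.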
Currently, the only way of testing a portal's recognition quality after setting up the portal or installing a new voice application (or portlet) is to call into the system and check, manually or through testing with human users, how well the system works. This can be time-consuming and expensive. It would therefore be desirable to provide a quality evaluation tool that assesses the ability of a voice portal to recognize different terms in the various applications attached to the portal, by analyzing and measuring the similarity of the terms.