Speech recognition systems have been in development for more than 25 years, resulting in a variety of hardware and software tools for personal computers. Products and services employing speech recognition are developing rapidly and are continuously being applied to new markets.
With the growing sophistication of speech recognition, networking, and telecommunication technologies, a multifunctional speech-activated communications system that incorporates TV program service, video on demand (VOD) service, Internet service, and the like becomes possible.
This trend of integration, however, creates new technical challenges, particularly in the field of navigating or browsing via a speech control interface.
For example, when the system is in the Internet browsing mode, the user may be frustrated if the system does not respond to a spoken command that does not closely match a button label displayed on a Web page. Therefore, a mechanism for extending speech-activated navigation to another available search engine in certain circumstances is desired.
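The desired fallback can be illustrated with a minimal sketch. It assumes the spoken command has already been converted to text. The names `handle_command`, `button_labels`, and `search` are hypothetical and are used only for illustration. They do not describe any particular browser or recognizer.

```python
def normalize(text):
    """Lowercase and collapse whitespace so comparison ignores case and spacing."""
    return " ".join(text.lower().split())


def handle_command(utterance, button_labels, search):
    """Activate a matching button, or hand the utterance to a search engine.

    button_labels: dict mapping a displayed label to a callable (assumed page model).
    search: callable that accepts the query text (assumed search-engine hook).
    """
    spoken = normalize(utterance)
    for label, action in button_labels.items():
        if normalize(label) == spoken:
            return action()
    # No button label matched: extend navigation to the search engine
    # instead of leaving the spoken command unanswered.
    return search(utterance)
```

With a page whose only button is labeled "Home", the utterance "home" activates the button, while "weather today" is passed to the search hook unchanged.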
Another technical challenge is that, when the system is in a video on demand (VOD) mode, the traditional method of navigating hierarchical menus no longer meets efficiency needs. Hierarchical menus are widely used in automated systems that permit users to pick a desired item from a large list. The list could be, for instance, a list of items for sale, a list of films that may be vended by a video on demand (VOD) system, or some other kind of list.
The use of a hierarchy allows a user to reach a final selection by making a small number of choices among alternatives, perhaps a sequence of three to five such choices, where each intermediate choice narrows the range of list items from which the final selection will be made. For instance, in a video on demand (VOD) system, the range of selections in principle consists of every movie ever filmed, which of course may be a very long list. But if the selection process advances by indicating first a genre, then an actor, and so on, the long list may be navigated quickly. For this reason, hierarchical menus are quite common in graphical user interfaces, touchtone-based interactive telephone systems, and other modes of list selection.
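The narrowing effect of a hierarchy can be sketched as follows. The catalog entries, the attribute names, and the fixed `MENU_ORDER` are placeholders assumed for illustration. They are not taken from any actual VOD system.

```python
# Placeholder catalog; titles, genres, and actors are hypothetical.
CATALOG = [
    {"title": "Film A", "genre": "comedy", "actor": "Actor X"},
    {"title": "Film B", "genre": "comedy", "actor": "Actor Y"},
    {"title": "Film C", "genre": "drama", "actor": "Actor X"},
    {"title": "Film D", "genre": "drama", "actor": "Actor Z"},
]

# The order of the levels is fixed in advance by the menu designer.
MENU_ORDER = ["genre", "actor"]


def options(items, attribute):
    """Return the distinct choices offered at one level of the menu."""
    return sorted({item[attribute] for item in items})


def navigate(items, choices):
    """Apply one choice per level, in MENU_ORDER, narrowing the list each time."""
    remaining = items
    for attribute, choice in zip(MENU_ORDER, choices):
        remaining = [item for item in remaining if item[attribute] == choice]
    return remaining
```

Calling `navigate(CATALOG, ["comedy", "Actor X"])` leaves only the entry titled "Film A" after two choices. Because `MENU_ORDER` is fixed, the user in this sketch must pick a genre before an actor.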
A key drawback of hierarchical menu systems, however, is that they can be tedious and cumbersome to use. In particular, the choices must be made in the order dictated by the designer of the hierarchical system.
What is further desired is a means of alleviating this tedium through the automatic creation of an automatic speech recognition system, with its associated grammar(s) and database(s), that embodies the same list of selections and selection criteria present in a given hierarchical menu system but is operated through the medium of the spoken word. Moreover, such a system should accept modes of statement that are natural and fluent, rather than simply mirroring in words the selections that might be made with a cursor and graphical display in the case of a graphical user interface, or with a telephone keypad in the case of an interactive telephone system.
Another technical challenge is that, when the system is in a video on demand (VOD) mode, if the user does not speak exactly the button label displayed by the speech control interface, or if the recognition confidence for the input utterance falls below a pre-set confidence level, the system may fail to recognize the correct command and thus be unable to provide the service that the user requested. For example, in a one-grammar-path-per-title approach, if the user spoke “American President” instead of “The American President”, the user's command would not be mapped to the correct movie “The American President”.
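The failure described above can be reproduced in a small sketch. It models the one-grammar-path-per-title approach as an exact lookup on text, which is a simplification of how a real recognizer applies a grammar. The threshold value and the function names are assumptions chosen for illustration.

```python
# Assumed value; the source only says the confidence level is pre-set.
CONFIDENCE_THRESHOLD = 0.6


def normalize(text):
    """Lowercase and collapse whitespace so comparison ignores case and spacing."""
    return " ".join(text.lower().split())


def build_grammar(titles):
    """Create exactly one accepted phrase (one grammar path) per title."""
    return {normalize(title): title for title in titles}


def recognize(utterance, confidence, grammar, threshold=CONFIDENCE_THRESHOLD):
    """Return the matched title, or None if the utterance is rejected."""
    if confidence < threshold:
        return None  # below the pre-set confidence level
    # Only the full title phrase is in the grammar, so a partial title fails.
    return grammar.get(normalize(utterance))
```

With a grammar built from the single title "The American President", the full title is accepted at high confidence, but "American President" returns `None`, as does the full title when the confidence is below the threshold.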
Therefore, a system that can recognize the user's input utterance more flexibly without sacrificing reliability is further desired.