1. Technical Field
This invention relates to the field of speech recognition, and more particularly, to a method of dynamically providing a user speech access to the data in a voice-enabled program.
2. Description of the Related Art
Speech recognition systems enable computers to understand and extract information from human spoken language. Such systems can function in a complementary manner with a variety of other computer programs where there exists a need to understand human language. Speech recognition systems can extract relevant information contained within human speech and then supply this information to another computer program or system for purposes such as booking flight reservations, finding documents, or summarizing text.
Currently within the art, many speech recognition systems are implemented as directed dialog systems. Directed dialog speech recognition systems typically prompt or instruct a user as to the proper form of an immediate user response. For example, a directed dialog system can instruct a user as follows: "Say 1 for choice A, say 2 for choice B". By instructing the user as to the proper format for an immediate user response, the speech recognition system can expect a particular type of speech response. Accordingly, the speech recognition system can process that user response more accurately and function more efficiently.
Directed dialog speech recognition systems commonly serve as interfaces for larger distributed voice applications. VoiceXML is a markup language for distributed voice applications based on Extensible Markup Language ("XML"), much as HTML is a markup language for distributed visual applications. VoiceXML is designed for creating audio dialogs that feature synthesized speech, digitized audio, recognition of spoken and Dual Tone Multifrequency ("DTMF") key input, recording of spoken input, telephony, and mixed-initiative conversations. Version 1.0 of the VoiceXML specification has been published by the VoiceXML Forum in the document by Linda Boyer, Peter Danielsen, Jim Ferrans, Gerald Karam, David Ladd, Bruce Lucas and Kenneth Rehor, Voice eXtensible Markup Language (VoiceXML™) version 1.0, (W3C May 2000), which is incorporated herein by reference. Additionally, Version 1.0 of the VoiceXML specification has been accepted by the World Wide Web Consortium as an industry standard.
Version 1.0 of the VoiceXML specification provides a high-level programming interface to speech and telephony resources for program developers, service providers and equipment manufacturers. As noted in the W3C submission, standardization of VoiceXML will simplify creation and delivery of Web-based, personalized interactive voice-response services; enable phone and voice access to integrated call center databases, information and services on Web sites, and company intranets; and help enable new voice-capable devices and appliances.
As defined in the VoiceXML specification, the "menu" tag provides developers with a standard mechanism for creating speech based menus. Developers can specify a static list of speech menu items which can be presented to a user. The "choice" tag within the "menu" construct allows the developer to specify a phrase that, when spoken by the user, will indicate to the VoiceXML program that a particular menu item has been selected.
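By way of illustration only, a static speech menu of the kind described above can be sketched as follows. The prompt text, the spoken phrases, and the target document names are hypothetical and do not come from the VoiceXML specification or from any particular application. The Python helper simply reads the hand-written markup back to show that every menu item is fixed at authoring time.

```python
import xml.etree.ElementTree as ET

# A hand-written menu in the style of VoiceXML 1.0. The prompt, phrases,
# and target documents are hypothetical examples.
STATIC_MENU = """<vxml version="1.0">
  <menu id="main">
    <prompt>Say sports, weather, or news.</prompt>
    <choice next="sports.vxml">sports</choice>
    <choice next="weather.vxml">weather</choice>
    <choice next="news.vxml">news</choice>
  </menu>
</vxml>
"""


def list_choices(vxml_text):
    """Return (spoken phrase, target document) pairs for each choice tag."""
    root = ET.fromstring(vxml_text)
    return [(c.text.strip(), c.get("next")) for c in root.iter("choice")]


if __name__ == "__main__":
    for phrase, target in list_choices(STATIC_MENU):
        print(f"{phrase!r} selects {target}")
```

Adding or removing a menu item in this arrangement requires editing the markup itself.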
Despite the advantages of using the menu facility of VoiceXML, the constructs can lead to an inflexible programming model which forces users to follow a rigid predetermined menu structure. Specifically, during the design phase of a voice-enabled system, developers must determine the overall speech menu structure for navigating that system. Moreover, the developer must determine the individual speech menu items to be included within each speech menu and speech submenu. Finally, once speech menu structures and corresponding speech menu items are determined, the developer must perform the cumbersome task of building the speech menus.
The invention disclosed herein concerns a method for dynamically generating speech menus in a voice-enabled program such that the user can select menu items by speaking the contents of the data. In particular, the invention can use VoiceXML in combination with one or more embedded server-side programs to dynamically generate speech menus within a voice-enabled program such as a voice-enabled Web application. A server-side program, which can be accessed via a computer communications network, can dynamically generate markup language, for example VoiceXML, which can specify speech-enabled menu items in a speech menu. More particularly, the server-side program can access a database having one or more speech menu items stored therein. According to predetermined logic, the server-side program can select one or more data items from the database. Selected data items can be formatted using a voice-enabled markup language in order to specify the speech menu. In this manner, speech menu items can be selected dynamically from a database rather than being statically hard coded into the markup itself. The server-side program can be implemented using any network-centric server-side programming technology, for example, Perl, Active Server Pages, Java Server Pages, and the like.
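The dynamic generation described above can be sketched, under stated assumptions, as follows. Python and an in-memory SQLite database stand in for the server-side technology and the database, neither of which is prescribed by this disclosure. The table name `items`, its columns, the sample rows, and the "first N rows" selection rule are all hypothetical; the selection rule merely stands in for the predetermined logic.

```python
import sqlite3
from xml.sax.saxutils import escape, quoteattr


def demo_connection():
    """Create a small in-memory database with hypothetical data items."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT, url TEXT)"
    )
    conn.executemany(
        "INSERT INTO items (title, url) VALUES (?, ?)",
        [
            ("Quarterly report", "doc?id=1"),
            ("Travel policy", "doc?id=2"),
            ("Research & development plan", "doc?id=3"),
        ],
    )
    return conn


def build_menu(conn, limit=5):
    """Select data items and format them as choices in a speech menu."""
    # Assumed selection logic: take the first `limit` rows by id.
    rows = conn.execute(
        "SELECT title, url FROM items ORDER BY id LIMIT ?", (limit,)
    ).fetchall()
    lines = ['<vxml version="1.0">', '  <menu id="dynamic">']
    lines.append("    <prompt>Say the name of an item.</prompt>")
    for title, url in rows:
        # Escape the data so that it cannot break the generated markup.
        lines.append(
            f"    <choice next={quoteattr(url)}>{escape(title)}</choice>"
        )
    lines += ["  </menu>", "</vxml>"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_menu(demo_connection(), limit=2))
```

Because the choices are read from the database on each request, changing the stored data items changes the speech menu without any edit to the markup.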
One aspect of the present invention can include a method of dynamically formatting a speech menu construct. The method can include providing a markup language document containing a reference to a server-side program. The server-side program can be programmed to dynamically format data using voice-enabled markup language such as VoiceXML. The method further can include accessing a database using the server-side program where the database can have a plurality of data items. Particular ones of the plurality of data items can be selected and formatted using the voice-enabled markup language thereby creating formatted speech menu items specifying a speech menu construct. Additionally, the method can include generating a speech grammar using the identified particular ones of the plurality of data items, wherein the speech grammar can be used to voice-process menu choices corresponding to the speech menu items in the speech menu construct.
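The grammar generating step can be sketched as follows. This is only one possible reading: the disclosure does not name a grammar format, so the Java Speech Grammar Format (JSGF) style output is an assumption, as are the helper names `to_phrase` and `build_grammar` and the rule that normalizes each data item to lowercase words.

```python
import re


def to_phrase(item):
    """Reduce a data item to lowercase words separated by single spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", item.lower()))


def build_grammar(items, name="menu"):
    """Build a JSGF-style grammar with one alternative per distinct item."""
    phrases = []
    for item in items:
        phrase = to_phrase(item)
        # Skip items with no speakable words and skip repeated phrases.
        if phrase and phrase not in phrases:
            phrases.append(phrase)
    if not phrases:
        raise ValueError("no speakable data items were supplied")
    alternatives = " | ".join(phrases)
    return (
        "#JSGF V1.0;\n"
        f"grammar {name};\n"
        f"public <{name}> = {alternatives};\n"
    )


if __name__ == "__main__":
    print(build_grammar(["Quarterly Report", "Travel-Policy"]))
```

Deriving the grammar from the same selected data items that populate the speech menu keeps the phrases the recognizer accepts aligned with the menu choices presented to the user.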
Another aspect of the invention can include a system for generating a speech menu construct. The system can include a voice-enabled markup language document and a server-side program accessible by a reference to the server-side program contained within the voice-enabled markup language document. The server-side program can be programmed to access a database of data items and format selected data items for inclusion within the speech menu construct using a voice-enabled markup language.
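The relationship between the markup language document and the server-side program in such a system can be sketched as follows. The sketch assumes a Python WSGI application; the `/menu` path, the in-memory `ITEMS` list standing in for the database, and the use of a `goto` element as the reference are illustrative assumptions, not requirements of the system.

```python
from xml.sax.saxutils import escape, quoteattr

# Static document whose only job is to refer to the server-side program.
# The /menu path is a hypothetical location for that program.
ENTRY_DOCUMENT = """<vxml version="1.0">
  <form>
    <block><goto next="/menu"/></block>
  </form>
</vxml>
"""

# Stand-in for the database of data items (hypothetical contents).
ITEMS = [("Quarterly report", "doc?id=1"), ("Travel policy", "doc?id=2")]


def render_menu(items):
    """Format the selected data items as a speech menu."""
    choices = "\n".join(
        f"    <choice next={quoteattr(url)}>{escape(title)}</choice>"
        for title, url in items
    )
    return f'<vxml version="1.0">\n  <menu>\n{choices}\n  </menu>\n</vxml>\n'


def application(environ, start_response):
    """Serve the static entry document and the generated speech menu."""
    path = environ.get("PATH_INFO", "/")
    if path == "/":
        body = ENTRY_DOCUMENT
    elif path == "/menu":
        body = render_menu(ITEMS)
    else:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    data = body.encode("utf-8")
    headers = [
        ("Content-Type", "text/xml; charset=utf-8"),
        ("Content-Length", str(len(data))),
    ]
    start_response("200 OK", headers)
    return [data]
```

For local experimentation, the application can be served with the standard library, for example `wsgiref.simple_server.make_server("", 8000, application).serve_forever()`.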
Another aspect of the invention can include a machine readable storage, which can be a VoiceXML formatted machine readable storage, having stored thereon a computer program having a plurality of code sections executable by a machine for causing the machine to perform a series of steps. The steps can include providing a markup language document, which can be a VoiceXML document, containing a reference to a server-side program. The server-side program can be programmed to dynamically format data using voice-enabled markup language such as VoiceXML. The steps further can include accessing a database using the server-side program where the database can have a plurality of data items. Particular ones of the plurality of data items can be selected and formatted using the voice-enabled markup language thereby creating formatted speech menu items specifying a speech menu construct. Additionally, the steps can include generating a speech grammar using the identified particular ones of the plurality of data items, wherein the speech grammar can be used to voice-process menu choices corresponding to the speech menu items in the speech menu construct.