1. Technical Field
This invention relates to the field of voice extensible markup language, and more particularly to a method and system for generating a grammar rule using a referenced recording.
2. Description of the Related Art
Visual browsers are complex application programs that can render graphic markup languages such as Hypertext Markup Language (HTML) or Extensible HTML (XHTML). Visual browsers, however, lack the ability to process audible input and/or output. Still, visual browsers enjoy a significant user base.
Voice browsers are the audio counterparts of visual browsers. More particularly, voice browsers can render voice markup languages such as Voice Extensible Markup Language (VXML or VoiceXML), thereby allowing users to interact with the voice browser using speech.
Recent developments in Web-based applications have led to the development of multimodal interfaces. Multimodal interfaces allow users to access multimodal content, or content having both graphical and audible cues. Through a multimodal interface, the user can choose to interact with or access content using graphic input such as a keyboard or pointer entry, using an audible cue such as a speech input, or using a combination of both. For example, one variety of multimodal interface is a multimodal browser that can render content written in XHTML+Voice markup language, also referred to as X+V markup language.
Voice-enabling content refers to permitting spoken utterances to be utilized as recognizable application input as well as generating spoken output for an application, such as presenting an audible rendition of content contained within an electronic document like a markup language document. Command and control pertains to graphical user interface (GUI) features such as commands that are accessible through menus and dialog boxes of an application. Content navigation pertains to the ability of a user to select hyperlinks presented within a rendered electronic document using voice, thereby causing a browser, for example, to load the document represented by the hyperlink. Thus, to speech-enable an application program, efforts must be directed not only to voice-enabling the content, but also to voice-enabling the command and control and content navigation functions of the application program.
VoiceXML uses grammars that can be generated from text data available to an application (through a database, Web services, or user input). However, there is a class of Automatic Speech Recognition (ASR) applications that record lists of user utterances and create grammars based on dynamically generated acoustic baseforms. An example is a phone dialer application with a phone book in which the user can save the names and numbers of people and dial them later with a command like “Dial Brian.” The VoiceXML language currently fails to support this application model, in which a grammar is built with baseforms generated dynamically from user utterances.
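By way of illustration only, the text-based approach described above can be sketched as follows. The sketch assumes that the phone book names are available as text strings and that the grammar is expressed in the W3C Speech Recognition Grammar Specification (SRGS) XML form commonly used with VoiceXML; the function name build_dial_grammar and the rule identifier "dial" are hypothetical and are chosen only for this example.

```python
import xml.etree.ElementTree as ET

# Namespace of the W3C SRGS XML grammar format.
SRGS_NS = "http://www.w3.org/2001/06/grammar"


def build_dial_grammar(names, lang="en-US"):
    """Return an SRGS XML grammar accepting "dial <name>" for each text name.

    Hypothetical helper: it illustrates grammar generation from text data
    only, and is not taken from any VoiceXML platform.
    """
    grammar = ET.Element("grammar", {
        "xmlns": SRGS_NS,
        "version": "1.0",
        "xml:lang": lang,
        "mode": "voice",
        "root": "dial",
    })
    rule = ET.SubElement(grammar, "rule", {"id": "dial", "scope": "public"})
    rule.text = "dial "  # fixed command word that precedes the name
    one_of = ET.SubElement(rule, "one-of")
    for name in names:
        # Each alternative is the written form of a phone book entry.
        item = ET.SubElement(one_of, "item")
        item.text = name
    return ET.tostring(grammar, encoding="unicode")


if __name__ == "__main__":
    print(build_dial_grammar(["Brian", "Maria"]))
```

Each alternative in such a grammar is the written form of a name. A name that exists only as a recorded utterance has no written form to place in an item, which is the limitation noted above.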