The evolution of the conventional public switched telephone network has resulted in a variety of voice applications and services that can be provided to individual subscribers and business subscribers. Such services include voice messaging systems that enable landline or wireless subscribers to record, play back, and forward voice mail messages. However, the ability to provide enhanced services to subscribers of the public switched telephone network is directly affected by the limitations of the public switched telephone network. In particular, the public switched telephone network operates according to a protocol that is specifically designed for the transport of voice signals; hence any modifications necessary to provide enhanced services can only be done by switch vendors that have sufficient knowledge of the existing public switched telephone network infrastructure.
An open standards-based Internet protocol (IP) network, such as the World Wide Web, the Internet, or a corporate intranet, provides client-server type application services for clients by enabling the clients to request application services from remote servers using standardized protocols, for example hypertext transfer protocol (HTTP). The web server application environment can include web server software, such as Apache, implemented on a computer system attached to the IP network. Web-based applications are composed of HTML (Hypertext Markup Language) pages, logic, and database functions. In addition, the web server may provide logging and monitoring capabilities.
In contrast to the public switched telephone network, the open standards-based IP network has enabled the proliferation of web-based applications written by web application developers using web development tools. Hence, the ever-increasing popularity of conventional web applications and web development tools provides substantial resources for application developers to develop robust web applications in a relatively short time and an economical manner. However, one important distinction between telephony-based applications and web-based applications is that telephony-based applications are state aware, whereas web-based applications are stateless.
In particular, conventional telephony applications are state aware to ensure that prescribed operations between the telephony application servers and the user telephony devices occur in a prescribed sequence. For example, operations such as call processing operations, voicemail operations, call forwarding, etc., require that specific actions occur in a specific sequence to enable the multiple components of the public switched telephone network to complete the prescribed operations.
The prior art web-based applications running in the IP network, however, are stateless and transient in nature, and do not maintain application state because application state requires an interactive communication between the browser and back-end database servers accessed by the browsers via an HTTP-based web server. An HTTP server instead provides asynchronous execution of HTML applications: in response to reception of a specific request in the form of a URL (Uniform Resource Locator) from a client, the web application instantiates a program configured for execution of the specific request, sends an HTML web page back to the client, and terminates the program instance that executed the specific request. Storage of application state information in the form of a “cookie” is not practical because some users prefer not to enable cookies on their browser, and because the passing of a large amount of state information as would normally be required for voice-type applications between the browser and the web application would substantially reduce the bandwidth available for the client.
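The request lifecycle described in the preceding paragraph can be illustrated with a minimal Python sketch. The names `RequestProgram` and `handle_request`, and the example URLs, are hypothetical and do not come from any particular web server; the sketch only assumes that a fresh program instance is created for each request and discarded afterward, so nothing recorded during one request is visible to the next.

```python
from dataclasses import dataclass, field


@dataclass
class RequestProgram:
    """Hypothetical per-request program instance; it lives for one request only."""
    url: str
    steps_seen: list = field(default_factory=list)

    def run(self) -> str:
        # Any state recorded here is local to this instance.
        self.steps_seen.append(self.url)
        count = len(self.steps_seen)
        return f"<html><body>Handled {self.url}; steps so far: {count}</body></html>"


def handle_request(url: str) -> str:
    """Instantiate a program for the request, build its HTML page, then discard it."""
    program = RequestProgram(url)  # instantiate a program for this request
    page = program.run()           # execute the request and build the page
    del program                    # terminate the instance; its state is lost
    return page


# Two requests from the same (hypothetical) caller: the second one has no
# memory of the first, so both pages report a single step.
first = handle_request("/voicemail/login")
second = handle_request("/voicemail/play")
```

Both pages report one step, which is the behavior a state-aware telephony application cannot tolerate when operations must occur in a prescribed sequence.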
While not considered prior art to the present invention, commonly-assigned, copending application Ser. No. 09/480,485, filed Jan. 11, 2000, entitled “Application Server Configured for Dynamically Generating Web Pages for Voice Enabled Web Applications”, the disclosure of which is incorporated in its entirety herein by reference, discloses an application server that executes a voice-enabled web application by runtime execution of extensible markup language (XML) documents that define the voice-enabled web application to be executed. The application server includes a runtime environment that establishes an efficient, high-speed connection to a web server. The application server, in response to receiving a user request from a user, accesses a selected XML page that defines at least a part of the voice application to be executed for the user. The XML page may describe a user interface, such as dynamic generation of a menu of options or a prompt for a password, an application logic operation, or a function capability such as generating a function call to an external resource. The application server then parses the XML page, and executes the operation described by the XML page, for example, by dynamically generating an HTML page having voice application control content, or fetching another XML page to continue application processing. In addition, the application server may access an XML page that stores application state information, enabling the application server to be state-aware relative to the user interaction. Hence, the XML page, which can be written using a conventional editor or word processor, defines the application to be executed by the application server within the runtime environment, enabling voice-enabled web applications to be generated and executed without the necessity of programming language environments.
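As an illustration of how an XML page that stores application state could make a server state-aware across otherwise independent requests, consider the following Python sketch. It is not the implementation of the incorporated application: the element names `state` and `var`, the in-memory `STATE_PAGES` store, and the function names are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory store mapping a session id to an XML state page.
# A real server might keep these pages on disk or in a database instead.
STATE_PAGES = {}


def save_state(session_id, values):
    """Serialize the session's state as a small XML page (element names are assumed)."""
    root = ET.Element("state", {"session": session_id})
    for name, value in values.items():
        var = ET.SubElement(root, "var", {"name": name})
        var.text = value
    STATE_PAGES[session_id] = ET.tostring(root, encoding="unicode")


def load_state(session_id):
    """Parse the stored XML page back into a dict; return an empty dict if none exists."""
    xml_text = STATE_PAGES.get(session_id)
    if xml_text is None:
        return {}
    root = ET.fromstring(xml_text)
    return {var.get("name"): (var.text or "") for var in root.findall("var")}


def handle_request(session_id, action):
    """Handle one request: reload prior state, record the action, store the state again."""
    state = load_state(session_id)
    history = state.get("history", "")
    history = f"{history},{action}" if history else action
    save_state(session_id, {"history": history})
    return history
```

Calling `handle_request("s1", "login")` and then `handle_request("s1", "play")` returns `"login"` and then `"login,play"`, while a different session id starts from an empty history.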
Hence, web programmers can write voice-enabled web applications, using the teachings of the above-incorporated application Ser. No. 09/480,485, by writing XML pages that specify respective voice application operations to be performed. The XML documents have a distinct feature of having tags that allow a web browser (or other software) to identify information as being a specific kind or type of information. While not considered prior art to the present invention, commonly assigned, copending application Ser. No. 09/501,516, filed Feb. 1, 2000, entitled “Arrangement for Defining and Processing Voice Enabled Web Applications Using Extensible Markup Language Documents”, the disclosure of which is incorporated in its entirety herein by reference, discloses an arrangement for defining a voice-enabled web application using extensible markup language (XML) documents that define the voice application operations to be performed within the voice application. Each voice application operation can be defined as any one of a user interface operation, a logic operation, or a function operation. Each XML document includes XML tags that specify the user interface operation, the logic operation and/or the function operation to be performed within a corresponding voice application operation, the XML tags being based on prescribed rule sets that specify the executable functions to be performed by the application runtime environment. Each XML document may also reference another XML document to be executed based on the relative position of the XML document within the sequence of voice application operations to be performed. The XML documents are stored for execution of the voice application by an application server in an application runtime environment.
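The three kinds of voice application operation named above (user interface, logic, and function) can be sketched as a tag-based dispatch. The XML vocabulary below (`operation`, `ui`, `logic`, `function`, and the `next`, `prompt`, `test`, and `call` attributes) is invented for illustration and is not the rule set defined in the incorporated application; the sketch only shows how tags let software recognize what kind of operation each element specifies.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document; the tag and attribute names are illustrative only.
MENU_PAGE = """
<operation next="check_password.xml">
  <ui prompt="Please enter your password"/>
  <logic test="attempts &lt; 3"/>
  <function call="lookup_subscriber"/>
</operation>
"""


def describe(element):
    """Map an assumed tag name to a description of the operation it specifies."""
    handlers = {
        "ui": lambda e: f"user interface: prompt '{e.get('prompt')}'",
        "logic": lambda e: f"logic: evaluate '{e.get('test')}'",
        "function": lambda e: f"function: call '{e.get('call')}'",
    }
    handler = handlers.get(element.tag)
    if handler is None:
        raise ValueError(f"unknown operation tag: {element.tag}")
    return handler(element)


def process(xml_text):
    """Parse one document; return its operations and the next document it references."""
    root = ET.fromstring(xml_text)
    actions = [describe(child) for child in root]
    return actions, root.get("next")


actions, next_page = process(MENU_PAGE)
```

Here `next_page` holds the reference to the following document, mirroring the way one XML document may point to another within the sequence of voice application operations.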
Hence, the XML document described in the above-incorporated application Ser. No. 09/501,516, which can be written using a conventional editor or word processor, defines the application to be executed by the application server within the runtime environment, enabling voice-enabled web applications to be generated and executed without the necessity of programming language environments.
In reference to a conventional telephony-based application (unlike those in the patent applications incorporated by reference above), a user can use the application to access prerecorded responses from a remote source by using a menu-based audio interface. This prior art interface may be based on simple predefined voice commands, like “yes” or “no,” or on reciting a number to indicate a choice in a menu. The interface may also be based on entering numbered or other responses on the touch-tone keypad of the telephone. For example, a user can use a touch-tone telephone to access a bank and obtain the balance or other information on a bank account over a telephone. A user can also use a touch-tone telephone to obtain information about some topic or organization they are interested in, such as the hours, exhibits, prices, and special events for a museum, based on a set of prerecorded menus and messages maintained by the museum.
In other conventional approaches, automatic speech recognition (ASR) techniques provide for the recognition of words or phrases in a user's speech. A user can provide speech input into a microphone attached to a computer, and the computer can translate words and phrases in the speech into commands or data that the computer receives as input, much as it would receive input typed on a keyboard. Text-to-speech (TTS) techniques provide for the output of a computer to be translated from text output to speech. Thus the user can hear the output of the computer that, otherwise, would typically be read by the user from a display screen attached to the computer.
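A menu-based touch-tone interface of the kind described above can be modeled as a small tree walked one keypad digit at a time. The sketch below is illustrative only: the `MUSEUM_MENU` layout, the placeholder message texts, and the `navigate` function are assumptions, and no audio or telephony hardware is involved.

```python
# Hypothetical menu tree: keys are keypad digits, and each value is either a
# prerecorded message (str) or a nested submenu (dict). All texts are placeholders.
MUSEUM_MENU = {
    "prompt": "Press 1 for hours, 2 for prices, 3 for special events.",
    "1": "Hours message.",
    "2": {
        "prompt": "Press 1 for adult prices, 2 for child prices.",
        "1": "Adult price message.",
        "2": "Child price message.",
    },
    "3": "Special events message.",
}


def navigate(menu, digits):
    """Walk the menu with a sequence of keypad digits; return what the caller hears."""
    heard = [menu["prompt"]]
    current = menu
    for digit in digits:
        choice = current.get(digit)
        if choice is None:
            # Unrecognized key: report the error and repeat the current prompt.
            heard.append("Invalid selection.")
            heard.append(current["prompt"])
        elif isinstance(choice, dict):
            # Descend into the submenu and play its prompt.
            current = choice
            heard.append(current["prompt"])
        else:
            # A leaf message ends the walk.
            heard.append(choice)
            break
    return heard
```

For instance, `navigate(MUSEUM_MENU, "22")` plays the top-level prompt, then the prices prompt, then the child price message. The caller's position in the tree is exactly the kind of state that must be tracked across inputs.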