This invention relates to interactive voice services, and more particularly to interactive voice services provided over a combination of a voice network and a computer network such as the Internet.
In traditional interactive voice response (IVR) systems, an end user at an audio terminal, such as a telephone set, interacts over the public switched telephone network (PSTN) with an IVR system, such as a CONVERSANT® system available from Lucent Technologies, Inc. During the progress of a call, the end user provides audio or touch-tone inputs in response to queries or prompts outputted by the IVR system over the PSTN, as for example when a user identifies himself by name and/or the input of an ID or PIN code through touch-tone or voice. The IVR system, using a combination of speech recognition techniques and standard techniques for detecting dual tone multi-frequency (DTMF) touch-tone inputs, is able to interpret the end user's responses. The queries and the expected audio or touch-tone inputs from the end user follow a "script" programmed into the IVR system in accordance with the service being provided by the proprietor of the system. The general population is familiar with interacting with such systems, which are used, for example, for banking transactions, telephone catalog sales, etc. With such systems, when the end user completes an interactive session through one IVR system and wishes to engage in a next interactive session with a different IVR system that may or may not be associated with the first system, he terminates the first call and then initiates a second telephone call from his telephone set over the PSTN to the second IVR system. When the second call is answered, the end user may need to again identify himself in some manner, and then proceed with the session with the second IVR system. Thus, the end user initiates each successive IVR session over the PSTN through separate independent telephone calls, at each of which he is likely to need to identify himself to the IVR system by means of an ID code and/or PIN, through speech recognition or other mechanism.
If during an interactive session with a first IVR system, transfer to a second separate, but associated, IVR system is required, such as from a customer service department to a sales department, the service provider must effect the transfer of the call with the concomitant expense of the second call.
In the last several years, the use of the Internet as a means of transporting information to and from users has grown by leaps and bounds. Typically, computers equipped with browser programs, such as the popular Netscape® Navigator or Microsoft® Explorer browsers, provide a graphical user interface which allows the computer user to interact with web servers connected on the Internet or other Internet Protocol (IP) computer network. With such browser programs, the computer user, by inputting a web server's Uniform Resource Locator (URL) code, establishes a virtual connection over the Internet to that web server. Via hypertext markup language (HTML)-formatted pages that are transmitted to the user and displayed on the computer's monitor, a user is able to interact with a provider of goods, services or information. By clicking on a hyperlink or by inputting a new URL code, the user's computer is quickly connected to retrieve another page from the same or a different web server.
Techniques for extending Internet access to the still large number of end users who do not have a computer and are equipped only with a telephone or other similar audio interface device have been developed and described in, for example, International Application Published Under the Patent Cooperation Treaty (PCT), Publication Number WO 97/40611 entitled "Method and Apparatus For Information Retrieval Using Audio Interface", published Oct. 20, 1997 and claiming a priority date of Apr. 22, 1996 based on a co-pending U.S. patent application Ser. No. 08/635,801 to M. A. Benedikt, D. A. Ladd, J. C. Ramming, K. G. Rehor (co-inventor herein), and C. D. Tuckey; D. L. Atkins, T. Ball (co-inventor herein), T. R. Baran, M. A. Benedikt, K. C. Cox, D. A. Ladd, P. A. Mataga (co-inventor herein), C. Puchol, J. C. Ramming, K. G. Rehor (co-inventor herein), and C. D. Tuckey, "Integrated Web and Telephone Service Creation", Bell Labs Technical Journal, pp. 19-35, Winter 1997; and U.S. patent application Ser. No. 09/168,405, filed Oct. 6, 1998 to M. K. Brown, K. G. Rehor (co-inventor herein), B. C. Schmidt and C. D. Tuckey entitled "Web-Based Platform for Interactive Voice Response (IVR)". A phone markup language (PML) that can be used for web-based voice interactive services is described by J. C. Ramming in "PML: A Language Interface to Networked Voice Response Units", Workshop on Internet Programming Languages, ICCL '98, Loyola University, Chicago, Ill., May, 1998. All four of these references are incorporated by reference herein. On Mar. 2, 1999, the Wall Street Journal reported joint cooperation by AT&T, Motorola and Lucent Technologies on a voice extensible markup language that allows end users to access the Internet by voice. That language is expected to become a standard for defining voice commands to the Internet and is likely to incorporate many aspects of the aforenoted PML.
As described in the aforenoted references, an end user at an audio terminal, such as a telephone, can access interactive services on an IP network through a system that acts as an adjunct that interfaces the PSTN voice network and the IP network such as the Internet or other wide area or local area computer network. In particular, this system, referred to hereinafter as a telephone/IP adjunct or server, functions to enable end users to engage in interactive services via their telephone set with web servers connected on such a wide area or local area network. The telephone/IP server, as described in the references, is embodied as hardware and software on a general purpose computer that together perform the functions of audio play and record, text-to-speech synthesis, DTMF (touch-tone) recognition, automatic speech recognition (ASR), and other call control functions necessary for interactive audio services. The telephone/IP server functions to accept inputs from the telephone end user as speech or DTMF signals, and act as a proxy browser for that end user in making requests over the Internet to the web servers that provide the IVR services with which the end user wishes to interact. Whereas the language format between a browser on an end user's client terminal and a web server is conventionally the hypertext markup language (HTML), the telephone/IP server and the web servers providing the IVR services communicate using a modification of HTML, the phone markup language (PML) described in the aforenoted article by J. C. Ramming. As noted, PML will be supplanted in the future with the expected-to-be standardized voice extensible markup language.
The telephone/IP server includes the necessary interpreter middleware that interacts with the services on the web server to interpret dialogs to be carried out with the end user. Such dialog interpretation involves coordination of the lower-level audio processing necessary to interact with the end user, and communication of the results of a dialog with the end user to the IVR service on the web server that specified it.
A dialog includes information to be presented to the end user, and may specify information to be collected from the end user. It is, in effect, an audio "form" that is filled out by the end user, using DTMF tones or audio input, and returned to an interactive voice service. A dialog may involve multiple prompts and multiple collections of user inputs. Moreover, the dialog may specify control flow information, if the sequencing of interactions is dependent on what the end user inputs. For example, only a subset of information might be audibly presented to the end user if the user makes choices from a hierarchical menu. Alternatively, it may be necessary to re-prompt the end user when he does not respond or makes an illegal choice or input.
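The re-prompting behavior described above can be illustrated with a minimal Python sketch. It is not taken from the referenced systems: the function name `collect_choice`, the retry limit, and the prompt wording are all hypothetical, and caller input is modeled as a simple sequence in which `None` stands for silence.

```python
from __future__ import annotations

from typing import Callable, Iterable, Optional


def collect_choice(prompt: str,
                   legal_choices: set[str],
                   inputs: Iterable[Optional[str]],
                   play: Callable[[str], None],
                   max_attempts: int = 3) -> Optional[str]:
    """Play a prompt and re-prompt until a legal choice arrives.

    Hypothetical helper: `inputs` stands in for DTMF or recognized speech,
    and `play` stands in for audio output to the caller.
    """
    source = iter(inputs)
    for _ in range(max_attempts):
        play(prompt)
        answer = next(source, None)  # None models "no response"
        if answer in legal_choices:
            return answer
        play("Sorry, that is not a valid choice.")
    return None  # attempts exhausted; the dialog would report a failure


# Usage: the caller is silent, then presses an illegal key, then presses 2.
spoken: list[str] = []
result = collect_choice("Press 1 for quotes or 2 for orders.",
                        {"1", "2"}, [None, "9", "2"], spoken.append)
```

In this run the prompt is played three times and the error message twice before the legal choice "2" is returned.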
The interpreter within the telephone/IP server thus performs a user interface role only, assisting the end user on the telephone set in navigating through information that is presented audibly, and in "filling out a form". It, in effect, functions as an audio browser for the service retrieved from the web server providing the IVR service. The interpreter has no access to data other than what is specified in the dialog, and little or no computation is performed on information collected from the end user. Rather, a service logic that runs on the web server processes the data and generates the dialogs.
At each web server that provides an interactive voice service, a service logic is executed that performs the functions of making decisions, data access and storage, computation, and transaction processing that need to be performed to offer the interactive voice service to the end user. The service logic, however, interacts with the end user only by generating dialogs for the interpreter in the telephone/IP server. The infrastructure used for CGI services on the web is used for communication between the interpreter in the telephone/IP server and the service logic resident on the web server. Thus, HTTP requests and CGI form submissions are used for the retrieval of dialogs and the notification of results.
Dialogs are specified as "pages" of PML, or its equivalent. The PML, or its equivalent, allows a service creator to specify output from audio files and text (via text-to-speech), input fields for digits and spoken information, choices from lists using DTMF and speech recognition grammars, and control flow for the dialog. Like HTML pages, pages of PML, or its equivalent, are textual, although they may contain references to non-textual data, such as audio files and compiled grammars, which must be retrieved or cached for dialog processing. Such pages can be static or created dynamically (by CGI execution).
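As an illustration of how the results of a dialog might be returned as a CGI form submission, the following Python sketch encodes collected inputs in the standard URL-encoded form format and decodes them again as a service logic might. The field names `user_name` and `pin` and both helper functions are assumptions made for this example; the references do not specify them.

```python
from __future__ import annotations

from urllib.parse import parse_qs, urlencode


def build_form_submission(fields: dict[str, str]) -> bytes:
    """Encode collected dialog inputs as an URL-encoded form body."""
    return urlencode(fields).encode("ascii")


def parse_form_submission(body: bytes) -> dict[str, str]:
    """Decode the form body on the web server side (one value per field)."""
    parsed = parse_qs(body.decode("ascii"), keep_blank_values=True)
    return {name: values[0] for name, values in parsed.items()}


# Hypothetical inputs collected from the caller by speech or DTMF.
collected = {"user_name": "jane doe", "pin": "1234"}
body = build_form_submission(collected)
print(body)  # b'user_name=jane+doe&pin=1234'
```

Because the body uses the same encoding a visual browser would produce, the service logic can be written with ordinary CGI tooling.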
FIG. 1 shows the telephone/IP service architecture that enables an end user of telephone set 101 connected to the PSTN 102 to engage in an interactive voice response session with a service provider who provides a service via a web server 103 connected to IP network 104, such as the Internet, rather than an IVR system connected directly to the PSTN. As an example, if the service provider is a brokerage house whose service provides personalized stock quotes based on an individual's portfolio, the end user at telephone 101 calls that brokerage house's 800 number associated with that service. That call is routed as a circuit switched voice call over PSTN 102 to the telephone/IP server 105, which is connected to the PSTN 102, but may be geographically located anywhere. Telephone/IP server 105 is also connected to IP network 104. Upon answering the incoming telephone call, telephone/IP server 105, running interpreter 106, uses the called number to access a URL from its database (not shown) that identifies the first dialog in the service associated with that called number. This URL is used in a TCP/IP HTTP request transmitted over IP network 104 to the particular web server 103 running the service logic 107 corresponding to the stock quoting service. Web server 103 responds to the request with a PML page. This PML page is transported over IP network 104 back to telephone/IP server 105, and is interpreted by interpreter 106, causing a welcoming message to be played and prompting the end user for input of an identifier such as a user name and PIN. That information, received from the end user at telephone set 101 over the PSTN by the server 105, is returned to the web server 103 as an HTTP request that is a CGI form submission. Verification of the PIN takes place on web server 103, and, if verified, the response is another PML page that contains a list of stock quotes that are customized for that end user.
That customized PML page is sent back over IP network 104 to telephone/IP server 105 which converts the received PML page to audio format for transmission over the PSTN 102 to the end user at telephone set 101. While listening to the list, the end user may be able to barge in to request a particular stock quote for another stock.
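The call flow just described can be sketched in Python as follows. This is a simplified model under stated assumptions: pages are represented as plain dictionaries rather than actual PML, the web server is an in-process stand-in that performs no network I/O, and the telephone number, URLs, credentials, and quote values are all invented for illustration.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical table: called number -> URL of the service's first dialog.
FIRST_DIALOG_BY_CALLED_NUMBER = {
    "8005550100": "http://broker.example.com/quotes/welcome",
}


def fake_web_server(url: str, form: Optional[dict] = None) -> dict:
    """Stand-in for web server 103 running service logic 107."""
    if url.endswith("/welcome"):
        return {"prompt": "Welcome. Enter your user name and PIN.",
                "fields": ["user", "pin"],
                "submit_to": "http://broker.example.com/quotes/login"}
    if url.endswith("/login"):
        # PIN verification happens on the web server, not in the interpreter.
        if form == {"user": "jdoe", "pin": "1234"}:
            return {"prompt": "ABC 10, XYZ 20.", "fields": [], "submit_to": None}
        return {"prompt": "Verification failed.", "fields": [], "submit_to": None}
    raise ValueError(f"unknown dialog URL: {url}")


def handle_call(called_number: str, caller_inputs: dict) -> list[str]:
    """Return the prompts that would be played to the caller, in order."""
    played = []
    page = fake_web_server(FIRST_DIALOG_BY_CALLED_NUMBER[called_number])
    played.append(page["prompt"])
    while page["submit_to"]:
        # Fill out the audio "form" and return it as a form submission.
        form = {name: caller_inputs[name] for name in page["fields"]}
        page = fake_web_server(page["submit_to"], form)
        played.append(page["prompt"])
    return played


prompts = handle_call("8005550100", {"user": "jdoe", "pin": "1234"})
```

Here `handle_call` plays the role of interpreter 106: it only fetches pages, plays prompts, and returns collected fields, while all decisions are made by the server stand-in.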
With the telephone/IP server-mediated interactive voice service, the end user may not and need not know that the service is being provided through a web server 103 connected to the Internet 104 rather than through a traditional IVR system connected to the PSTN. Thus, the dialogs presented to the end user through the telephone/IP server appear to the end user to have no different audible characteristics than the dialogs presented during a session with a traditional IVR system connected to the PSTN.
The telephone/IP server 105 is not specialized for the particular service provided by web server 103 but rather is a generic resource capable of interpreting dialog markup in the form of PML pages on behalf of any interactive voice service embodied on a web server. Disadvantageously, in accordance with the prior art telephone/IP architecture, in order to successively access a second separately defined interactive voice service (which may just be a separately configured service associated with the provider of the first service) either the end user must place a second telephone call for that second service, or the service provider must bear the expense of placing a separate voice call to the second IVR system. Thus, for example, continuing the illustrative stock quote service above, if the end user wants to place an order to buy or sell a stock, which service is not embodied in the dialogs created by the service logic 107 running on web server 103 but rather is embodied on a separate web server, he must hang up and place a new call to a different 800 number to initiate a separate interactive voice session with that brokerage house's IVR stock transaction service. Furthermore, the end user must again go through an identification procedure by identifying himself through name and PIN, or in some alternative manner.
In accordance with the present invention, a transfer capability is provided to enable an end user who is connected via his telephone set to a first web-based IVR service to transfer to a second separately configured web-based IVR service without placing an additional telephone call, and wherein information associated with the end user's transaction with the first service is transferred to the second service. Specifically, while interacting in an IVR session in a first service through a telephone/IP server, the end user may be audibly presented with the ability to transfer to a specific second service. That second service may be totally distinct from the first service, or may be related to the first service, such as a different department of that first service provider, but which second service is configured with a service logic on a web server separate from the service logic providing the first service. That transfer option is communicated to the end user during a dialog in the first web-based IVR service, which dialog is defined on a PML-formatted page having a hyperlink to the URL address associated with the second service. In response to an end user's input, which may be communicated by means of a verbal answer or a touch-tone input in response to a question posed during the dialog, the interpreter running on the telephone/IP server recognizes the user's input and, by means of a TCP/IP HTTP request, establishes a connection to the web server running the second IVR service at the URL indicated by the hyperlink. Further, and significantly, in establishing the connection to the web server providing the second IVR service, an information transference takes place that provides information to that second IVR service that is relevant to the end user's interactive session with the first IVR service. That information transference can take place by means of a cookie, URL encoding, or another information transference mechanism.
The information transferred can include the identity of the end user, his PIN, and other information associated with the user and/or the just completed session with the web server during the first IVR service or other past IVR sessions. In this manner, the end user, via a single telephone call that is terminated at the telephone/IP server, is able to effect a seamless transfer to a succession of IVR services which may be running on separate web servers without even realizing that such services are being provided from different sources. Further, each of these separately running services, which may be on different web servers running their own service logics, need not be coordinated with respect to their operating systems, server hardware, tool sets, etc., since the telephone/IP interpreter, interacting with each such service with standardized PML pages, provides seamless interoperability. Therefore, a service provider, providing a plurality of different services, can independently add to or modify the interactive services it provides without concern for the interoperability between each such service.
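One of the transference mechanisms named above, URL encoding, might look like the following Python sketch. The helper `transfer_url`, the hyperlink, and the parameter names are hypothetical; the sketch only shows how session information could be appended to the hyperlink's query string before the HTTP request to the second service is made, and how that service could read it back.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def transfer_url(hyperlink: str, session_info: dict) -> str:
    """Append session information to the second service's URL.

    Any query parameters already present in the hyperlink are preserved.
    """
    parts = urlsplit(hyperlink)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query += list(session_info.items())
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical hyperlink taken from a page of the first service.
link = "http://trades.example.com/order?lang=en"
url = transfer_url(link, {"user": "jdoe", "last_quote": "XYZ"})
print(url)  # http://trades.example.com/order?lang=en&user=jdoe&last_quote=XYZ

# The second service recovers the transferred information from the query.
received = dict(parse_qsl(urlsplit(url).query))
```

A cookie-based variant would carry the same name and value pairs in an HTTP header instead of in the URL.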