Field of the Art
The present invention relates to telecommunication and a networked computer telephony system, and more particularly to a system and method for providing a telephony-enabled service via a message-based API interface.
Discussion of the State of the Art
Two major telecommunication networks have evolved worldwide. The first is a network of telephone systems in the form of the Public Switched Telephone Network (PSTN). This network was initially designed to carry voice communication, but was later adapted to transport data as well. The second is a network of computer systems in the form of the Internet. The Internet was designed to carry data but is increasingly being used to transport voice and multimedia information. Computers implementing telephony applications have been integrated into both of these telecommunication networks to provide enhanced communication services. For example, on the PSTN, computer telephony integration has provided more functions and control to the POTS (Plain Old Telephone Service). On the Internet, computers are themselves terminal equipment for voice communication as well as serving as intelligent routers and controllers for a host of terminal equipment.
The Internet is a worldwide network of IP networks communicating under the TCP/IP (Transmission Control Protocol/Internet Protocol) suite. Specifically, voice and other multimedia information are transported on the Internet under the VoIP (Voice-over-IP) protocol.
The integration of the PSTN and the IP networks allows for greater facility in automation of voice applications by leveraging the inherent routing flexibility and computing accessibility in the IP networks.
An example platform for easy deployment of telephony applications is described in U.S. Pat. No. 6,922,411, the entire disclosure of which is incorporated herein by reference. Essentially, a networked telephony system allows users to deploy on the Internet computer telephony applications associated with designated telephone numbers. The telephony application is easily created by a user in XML (Extensible Markup Language) with predefined telephony XML tags (e.g. VoiceXML) and easily deployed on a website. The telephony XML tags include those for call control and media manipulation. A call to any one of these designated telephone numbers may originate from any one of the networked telephone systems such as the PSTN (Public Switched Telephone Network), a wireless network, or the Internet. The call is received by an application gateway center (AGC) installed on the Internet. Analogous to a web browser, the AGC provides facility for retrieving the associated XML application from its website and processing the call accordingly.
This type of telephony platform allows very powerful yet simple telephony applications to be built and deployed on the Internet. The following are some examples of the telephony applications deployed on this platform. A “Follow me, find me” application sequentially calls a series of telephone numbers as specified by a user until one of the numbers answers and then connects the call. If none of the numbers answers, it takes an alternative action such as taking a message, sending e-mail or sending the call to a call center, etc. In another example, a Telephonic Polling application looks up from a database the telephone numbers of a population to be polled. It then calls the numbers in parallel, limited only by the maximum number of concurrent sessions supported, plays a series of interactive voice prompts/messages in response to the called party's responses and records the result in a database, etc. In another example, a Help Desk application plays a series of interactive voice prompts/messages in response to the called party's responses and possibly connects the call to a live agent as one option, etc. In yet another example, a Stock or Bank Transactions application plays a series of interactive voice prompts/messages in response to the called party's responses and conducts appropriate transactions with a backend database or web application, etc.
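The sequential behavior of the “Follow me, find me” example can be illustrated with a minimal sketch. The function name follow_me, the place_call callback and the telephone numbers are hypothetical and chosen only for illustration; they are not part of the platform described in the referenced patent.

```python
from typing import Callable, Iterable, Optional


def follow_me(numbers: Iterable[str],
              place_call: Callable[[str], bool],
              fallback: Callable[[], None]) -> Optional[str]:
    """Try each number in order; return the first that answers, else run fallback."""
    for number in numbers:
        # Hypothetical dialer: returns True when the called party answers.
        if place_call(number):
            return number
    # No number answered: take the alternative action (message, e-mail, call center).
    fallback()
    return None


# Usage with a stubbed dialer in which only the second number answers.
answered = {"555-0102"}
attempts = []


def fake_dial(number: str) -> bool:
    attempts.append(number)
    return number in answered


result = follow_me(["555-0101", "555-0102", "555-0103"], fake_dial, lambda: None)
```

In this sketch the third number is never dialed, because the loop stops at the first number that answers.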
The latter examples are generally referred to as self-help applications. In the voice context, a self-help application is referred to as IVR. IVR refers to Interactive Voice Response and is a technology that automates interaction with telephone callers. Enterprises are increasingly turning to IVR to reduce the cost of common sales, service, collections, inquiry and support calls to and from their company.
IVR solutions enable users, using voice or other forms of input through a voice channel, to retrieve information including bank balances, flight schedules, product details, order status, movie show times, and more from any telephone. Additionally, IVR solutions are increasingly used to place outbound calls to deliver or gather information for appointments, past due bills, and other time critical events and activities.
FIG. 1 illustrates schematically a communication application environment. The communication application environment 10 includes one or more clients interacting with a communication application server 200 in an application platform 100. The application platform 100 hosts an application specified by an application script 210 coded in object-oriented software. The communication application server 200 includes a browser 220 for interpreting and executing the application script 210. The execution of the application script invokes one or more server-side components 310 in the application server 200. Among the clients and the communication server, these components 310 provide services for call control, media control with one or more media servers 230 and interactions with back-end systems 240 such as databases, and business logic and legacy systems such as CRM (Customer Relationship Management) and ERP (Enterprise Resource Planning). One example of the platform is to host an IVR application which interacts with voice, text messaging and other clients in a multi-channel environment.
The communication application platform provides third-party call control between any number of clients 20, 22, 30. The application script 210 defines the communication application 300 and directs how a call is to be handled. For example, when a user makes a call through a voice client such as a handset 20 or a VoIP phone 22 to the IVR, the voice application script 210 associated with the called number is retrieved. The browser 220 executes or renders the retrieved voice application script to allow the user to interact with the voice application 300.
Communication of multimedia information among endpoints and a third-party call controller generally requires call control and media control.
FIG. 2A to FIG. 2C illustrate various call scenarios among a number of clients such as VoIP phones 22 or endpoints and a communication application server 200.
FIG. 2A illustrates a client in the form of a VoIP phone calling the communication application server. For example, the communication application server 200 hosts an IVR and the VoIP phone 22 calls the IVR. Call signaling and media are exchanged between the VoIP phone 22 and the application server 200.
FIG. 2B illustrates a first VoIP phone calling a second VoIP phone. As a third-party call controller, the application server 200 controls the call between the first and second phones. A call connection is established between the first phone 22-1 and the application server 200. Another call connection is established between the second phone 22-2 and the application server 200. The two calls are then joined at the application server to allow the first phone to talk to the second phone. In this scenario, media can be handled in one of two modes. In the bridged mode, media exchanged between the two phones is routed through the application server. In the direct mode, the media is exchanged directly between the two phones.
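The difference between the two media modes can be illustrated with a minimal sketch that lists the media legs in each mode. The function name media_legs and the endpoint labels are hypothetical; the sketch models only which hops carry media, not the signaling, which passes through the application server in both modes.

```python
from typing import List, Tuple


def media_legs(mode: str, phone_a: str, phone_b: str,
               server: str = "application-server") -> List[Tuple[str, str]]:
    """Return the hops that carry media between two phones for the given mode."""
    if mode == "bridged":
        # Media is relayed by the application server.
        return [(phone_a, server), (server, phone_b)]
    if mode == "direct":
        # Media flows phone to phone.
        return [(phone_a, phone_b)]
    raise ValueError(f"unknown media mode: {mode}")


bridged = media_legs("bridged", "phone-22-1", "phone-22-2")
direct = media_legs("direct", "phone-22-1", "phone-22-2")
```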
FIG. 2C illustrates three phones in conference. In this scenario, each phone establishes a call to the application server. The three calls are then joined or mixed at the application server to provide a conference facility.
For call control, a number of protocol standards have been put forward for interoperability. For example, the H.323 standard is a protocol standard recommended by the ITU (International Telecommunication Union) for signaling and call control of IP telephony.
An increasingly popular alternative to the H.323 standard for call control is SIP (Session Initiation Protocol). SIP is an IETF (Internet Engineering Task Force) protocol for signaling and call control of IP telephony and multimedia communication between two or more endpoints. It is text-based and more web-centric and is a comparatively simpler and more light-weight alternative to H.323.
In the traditional web paradigm, a user agent in the form of a client machine running a web browser makes a request to a web server. The web server returns a response to the request. The communication takes place under HTTP (Hypertext Transfer Protocol). Specifically, the web browser requests a web resource such as a web page as specified by a URL from a web server. Typically the web server responds by returning the requested web page. The web page may contain text content with embedded instructions for the browser to render the text in the web page. In more sophisticated applications, a web page is often generated dynamically by employing server-side programs and may incorporate content as queried results from backend databases. Thus, some of the content is not hard-coded on the web page but is generated and rendered dynamically by the web server. The server-side programs may also serve to post data from the client to the backend databases.
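Because SIP is text-based, a request can be inspected with ordinary string handling. The following minimal sketch splits a SIP request into its start line and headers. The function name parse_sip_request is hypothetical, the addresses and identifiers in the example message are made up, and the sketch ignores details that a complete parser would need to handle, such as repeated headers and compact header forms.

```python
def parse_sip_request(raw: str):
    """Split a SIP request into (method, request_uri, version, headers)."""
    # The header section ends at the first empty line.
    head, _, _body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # Start line: METHOD SP Request-URI SP SIP-Version
    method, request_uri, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, request_uri, version, headers


# Illustrative message only.
example = (
    "BYE sip:bob@example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP client.example.com;branch=z9hG4bK776asdhds\r\n"
    "From: <sip:alice@example.com>;tag=1928301774\r\n"
    "To: <sip:bob@example.com>;tag=a6c85cf\r\n"
    "Call-ID: a84b4c76e66710@client.example.com\r\n"
    "CSeq: 231 BYE\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)
method, uri, version, headers = parse_sip_request(example)
```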
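The dynamic generation of a web page from backend data can be sketched as follows, using Python's standard WSGI calling convention as a stand-in for a server-side program. The BALANCES dictionary is a hypothetical substitute for a backend database, and the account numbers and amounts are invented for illustration.

```python
from urllib.parse import parse_qs

# Hypothetical stand-in for a backend database.
BALANCES = {"1001": "42.50", "1002": "7.00"}


def app(environ, start_response):
    """WSGI application that builds its page at request time from backend data."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    account = query.get("account", [""])[0]
    balance = BALANCES.get(account)
    if balance is None:
        status, text = "404 Not Found", "Unknown account"
    else:
        status, text = "200 OK", f"Balance for account {account}: {balance}"
    body = f"<html><body><p>{text}</p></body></html>".encode("utf-8")
    start_response(status, [("Content-Type", "text/html; charset=utf-8"),
                            ("Content-Length", str(len(body)))])
    return [body]
```

Such an application could be served for local experimentation with wsgiref.simple_server.make_server from the Python standard library.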
Traditionally, these server-side programs are implemented as scripts conforming to the CGI protocol (Common Gateway Interface). The CGIs are code modules that perform the task on the web server to generate and render dynamic content or perform other backend functions.
However, CGI has several disadvantages. First, it is not very portable, as different web serving machines with different processors and operating systems may require their own versions of scripts. Second, it does not use the server resources efficiently. The different CGIs are run in a different process context than the server which starts them. There is the overhead of creating a new process for each request and the different processes do not have access to a common set of server resources.
The JAVA™ servlet model addresses the disadvantages of CGI. Servlets are modules written in the highly portable JAVA™ programming language and run in the same Java virtual machine, which is independent of the processor hardware and the operating system. In the object-oriented Java programming language, the HTTP requests are parsed and made to interact with software objects modeled on the real objects that operate with the application. Similarly, the responses are made to conform with the HTTP protocol before being sent to the requester. Servlets run in a multi-thread environment in the Java server, which allows each request to be handled by a separate thread. Also, only one instance of the servlet code needs to be loaded into the processor memory, whereas with CGI contemporaneous requests require multiple copies of the CGI scripts to be loaded. The original servlets conform to the HTTP protocol and may be regarded as “HTTP servlets”. The servlet model provides a set of APIs (Application Programming Interfaces) that is implemented by loading a corresponding servlet container in the application server. The servlet model enables developers to rapidly develop applications, to port them to different servers and to run them efficiently. It is widely used in web applications and is based on open standards.
The API is an abstraction that describes an interface for the interaction with a set of functions used by the components. It is a list containing the description of a set of functions that is included in a library and that addresses a specific problem. In the present context of the object-oriented Java language, it comprises a description of a set of Java class definitions and extension class definitions with a set of behaviors associated with the classes. The API can be conceived as the totality of all the methods publicly exposed by the classes (the class interface). This means that the API prescribes the methods by which one handles the objects derived from the class definitions.
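The threading arrangement described above, in which a single loaded instance services many requests on separate threads, can be illustrated by analogy with a minimal sketch. This is not the Java servlet API; the class name EchoHandler and its service method are hypothetical stand-ins for a servlet and its request-handling method.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class EchoHandler:
    """Stand-in for a servlet: one shared instance services every request."""

    def __init__(self):
        self._lock = threading.Lock()
        self.handled = 0

    def service(self, request: str) -> str:
        # State shared between threads must be guarded.
        with self._lock:
            self.handled += 1
        return f"echo:{request}"


# The handler is loaded once, like a single servlet instance.
handler = EchoHandler()

# Each request is dispatched to a worker thread from a pool.
with ThreadPoolExecutor(max_workers=4) as pool:
    responses = list(pool.map(handler.service, ["a", "b", "c", "d", "e"]))
```

By contrast, a CGI-style arrangement would start a new process, with its own copy of the script, for each of the five requests.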
For call control, a SIP servlet has been developed and established as a standard to handle requests under the SIP protocol, just as the HTTP servlet handles requests under the HTTP protocol.
FIG. 3A illustrates an existing implementation of the call control objects of the server-side components of the communication application shown in FIG. 1, implemented as SIP servlets. The call control objects are in the form of SIP servlets 320. This is possible through the implementation of a SIP servlet container 340 and a SIP servlet call control API 350.
The SIP Servlet Specification (JSR 289) is a container-based approach (modeled on the HTTP servlet paradigm) to developing communication applications utilizing the Session Initiation Protocol (SIP). A SIP servlet is a Java programming language server-side component that performs SIP signaling. SIP servlets are managed by a SIP servlet container, which typically is part of a SIP-enabled application server. SIP servlets interact with clients by responding to incoming SIP requests and returning corresponding SIP responses. SIP servlets are built on the generic servlet API provided by the Java Servlet Specification, which is established as an open standard by the Java Community Process (SM) Program through the Java Specification Request (JSR) process.
Using a SIP servlet (JSR 289) for call control leverages the benefits of the servlet model. It also provides a Java API independent of underlying media server control protocols.
U.S. Pat. No. 7,865,607 B2 discloses a servlet model for media rich applications. The SIP servlet for call control is augmented by a media control API. However, the media control API is custom and does not conform to the servlet model.
For media control, media control objects are supported by a standards-based media control API, JSR 309, as shown in FIG. 3A. Media server specifics are handled by a JSR 309 Driver, allowing an application developer to program using the JSR 309 API, independent of the media server vendor. In this way, the applications can work with different media servers that are deployed by different operators and service providers.
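The role of a driver in isolating the application from the media server vendor can be illustrated with a minimal sketch. This is an analogy only and does not reproduce the JSR 309 API; the class names MediaDriver, VendorADriver and VendorBDriver, the play method and the command formats are all hypothetical.

```python
from abc import ABC, abstractmethod


class MediaDriver(ABC):
    """Vendor-neutral interface that the application programs against."""

    @abstractmethod
    def play(self, session_id: str, prompt_uri: str) -> str:
        """Return the vendor-specific command that plays a prompt."""


class VendorADriver(MediaDriver):
    def play(self, session_id: str, prompt_uri: str) -> str:
        # Hypothetical line-oriented control format.
        return f"A-PLAY {session_id} {prompt_uri}"


class VendorBDriver(MediaDriver):
    def play(self, session_id: str, prompt_uri: str) -> str:
        # Hypothetical key-value control format.
        return f"op=play;session={session_id};uri={prompt_uri}"


def greet(driver: MediaDriver, session_id: str) -> str:
    """Application code: written once against the interface, not the vendor."""
    return driver.play(session_id, "prompts/welcome.wav")


command_a = greet(VendorADriver(), "s1")
command_b = greet(VendorBDriver(), "s1")
```

The application function greet is unchanged when the driver is swapped; only the emitted command differs.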
Thus, an application developer can develop components of a communication application in terms of low level call control objects and API in the form of a SIP Servlet based on the open standards JSR 289 and in terms of low level media control objects and API in the form of the open standards JSR 309.
One disadvantage of working with low level and generic objects and their APIs is that the developer has to repeatedly deal with low level details even if many of these details are irrelevant when the object being modeled is in certain states.
FIG. 3B illustrates how the existing implementation of the application has to deal with every event under the standard call control and media control API shown in FIG. 3A. For example, the SIP servlet receives a BYE request to end a call. It examines what state it is in to act accordingly. In the case when it is still in a “CONNECTED” state, it will call the doBYE method to end the connection and perform related call teardown and cleanup tasks. However, a user may decide to hang up a call even before the call connection is established. In that case, the call is not yet in the “CONNECTED” state and therefore, given the state, there is no need for the servlet to receive the BYE request or to perform any call teardown tasks. Nevertheless, in the current implementation, every time the BYE request is received, the servlet will have to check against its state and act accordingly. Thus, the added burden of checking and dealing with irrelevant requests becomes part of the application code. The same is true for the media events, and the application has to furnish the logic and additional code to deal with events which may not be applicable to the current state.
It is desirable for an application to be developed without having to deal with details irrelevant to the object model being dealt with. Furthermore, it is desirable to have a systematic and uniform way of working with call control and media control events, without having to deal with their low level details in the application, so as to have succinct and efficient code.
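The burden described above can be illustrated with a minimal sketch in which the application itself must test the call state each time a BYE arrives. The class and method names are hypothetical and do not reproduce the SIP servlet API; only the “CONNECTED” state name is taken from the description.

```python
from enum import Enum, auto


class CallState(Enum):
    INITIAL = auto()
    CONNECTED = auto()
    TERMINATED = auto()


class CallHandler:
    """Application-level handler that receives every BYE, relevant or not."""

    def __init__(self):
        self.state = CallState.INITIAL
        self.log = []

    def on_answered(self):
        self.state = CallState.CONNECTED

    def on_bye(self):
        # The state check is application code and runs on every BYE.
        if self.state is CallState.CONNECTED:
            self.log.append("teardown")
            self.state = CallState.TERMINATED
        else:
            # Nothing to tear down, yet the event still had to be handled.
            self.log.append("ignored")


early = CallHandler()
early.on_bye()          # caller hangs up before the call is connected

normal = CallHandler()
normal.on_answered()
normal.on_bye()         # caller hangs up after the call is connected
```

Every additional event type handled this way adds a similar state check to the application.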
FIG. 1 shows a server architecture in which the script processing or scripting is performed by the same server that executes the resultant execution code. The scripting is language- and protocol-specific, for example, processing a script written in the Java or JavaScript language.
However, increasingly users and application developers are using other light-weight protocols and languages to code the application scripts. These include Ruby, Python, Groovy and PHP. With a growing range of languages and protocols, it is difficult for a hosting facility to provide compatible browsers for each of the possible programming languages and protocols.
Even if a large number of browsers is supported, the resultant execution code from these different browsers will all run in the same Java virtual machine of the application server. Without a standard protocol to the unified API, the different scripts running in the same virtual machine may contend with each other, resulting in poor performance, memory leaks and, worse still, object collisions. Also, having to support a wide set of possible scripts makes resource provisioning and budgeting in the communication platform difficult and indefinite.
Thus, there is a need to provide a more flexible arrangement for telephony services and communication application deployment to be driven by scripts coded with a variety of user preferred programming languages and protocols without the above-mentioned disadvantages.