A speech application is one of the most challenging applications to develop, deploy and maintain in a communications (typically telephony) environment. Expertise required for developing and deploying a viable application includes expertise in computer telephony integration (CTI) hardware and software, speech recognition software, text-to-speech software, and speech application logic.
With the relatively recent advent of voice extensible markup language (VXML), the expertise required to develop a speech solution has been reduced somewhat. VXML is a language that enables a software developer to focus on the application logic of the voice application without being required to configure underlying telephony components. Typically, the developed voice application is run on a VXML interpreter that resides on and executes on the associated telephony system to deliver the solution.
As is shown in FIG. 1A (prior art), a typical architecture of a VXML-compliant telephony system comprises a voice application server (110) and a VXML-compliant telephony server (130). Typical steps for development and deployment of a VXML-enabled IVR solution are briefly described below using the elements of FIG. 1A.
Firstly, a new application database (113) is created or an existing one is modified to support VXML. Application logic 112 is designed in terms of workflow and adapted to handle the routing operations of the IVR system. VXML pages, which are results of functioning application logic, are rendered by a VXML rendering engine (111) based on a specified generation sequence.
Secondly, an object facade to server 130 is created comprising the corresponding VXML pages and is sent to server 130 over a network (120), which can be the Internet, an Intranet, or an Ethernet network. The VXML pages are integrated into rendering engine 111 such that they can be displayed according to set workflow at server 110.
Thirdly, the VXML-telephony server 130 is configured to enable proper retrieval of specific VXML pages from rendering engine 111 within server 110. A triggering mechanism is provided to server 110 so that when a triggering event occurs, an appropriate outbound call is placed from server 110.
A VXML interpreter (131), a speech recognition text-to-speech engine (132), and the telephony hardware/software (133) are provided within server 130 and together provide the server function. In the prior art, the telephony hardware/software 133 along with the VXML interpreter 131 are packaged as an off-the-shelf IVR-enabling technology. Arguably the most important feature of the entire system, however, is the application server 110. The application logic (112) is typically written in a programming language such as Java and packaged as an Enterprise JavaBean archive. The presentation logic required is handled by rendering engine 111 and is written in JSP or Perl.
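As an illustration only, the role of a rendering engine such as engine 111 can be sketched as a function that turns the output of the application logic into a VXML page. The sketch below is written in Python rather than the JSP or Perl mentioned above, and the function name `render_vxml_menu`, the prompt text, and the page URLs are hypothetical. It also omits the XML namespace declaration that a complete VXML 2.0 page would carry.

```python
import xml.etree.ElementTree as ET


def render_vxml_menu(prompt_text, choices):
    """Render a minimal VXML menu page as a string.

    `choices` maps a spoken phrase to the URL of the next page.
    """
    vxml = ET.Element("vxml", {"version": "2.0"})
    menu = ET.SubElement(vxml, "menu", {"id": "main"})
    prompt = ET.SubElement(menu, "prompt")
    prompt.text = prompt_text
    for phrase, next_url in choices.items():
        # Each <choice> pairs a recognizable phrase with a transition target.
        choice = ET.SubElement(menu, "choice", {"next": next_url})
        choice.text = phrase
    return ET.tostring(vxml, encoding="unicode")


# Hypothetical output of the application logic for one workflow step.
page = render_vxml_menu(
    "Say sales or support.",
    {"sales": "sales.vxml", "support": "support.vxml"},
)
print(page)
```

In an arrangement like that of FIG. 1A, a page of this kind would be retrieved by the VXML interpreter 131 on server 130, which would play the prompt and listen for one of the listed choices.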
An enhanced voice application system is known to the inventor and disclosed in the U.S. patent application entitled “Method and Apparatus for Development and Deployment of a Voice Software Application for Distribution to one or more Application Consumers” to which this application claims priority. That system uses a voice application server that is connected to a data network for storing and serving voice applications. The voice application server has a data connection to a network communications server connected to a communications network such as the well-known public switched telephone network (PSTN). The communications server routes the created voice applications to their intended recipients.
A computer station is provided as part of the system and is connected to the data network and has access to the voice application server. A client software application is hosted on the computer station for the purpose of enabling users to create applications and manage their states. In this system, the user operates the client software hosted on the computer station in order to create voice applications through object modeling and linking. The applications, once created, are then stored in the application server for deployment. The user can control and manage deployment and state of deployed applications including scheduled deployment and repeat deployments in terms of intended recipients.
In one embodiment, the system is adapted for developing and deploying a voice application using Web-based data as source data over a communications network to one or more recipients. The enhanced system has a voice application server capable, through software and a network connection, of accessing a network server and a Web site hosted thereon and of pulling data from the site. The computer station, running voice application software, has control access to at least the voice application server and is also capable of accessing the network server and Web site. An operator of the computer station creates and provides templates for the voice application server to use in data-to-voice rendering. In this aspect, Web data can be harvested from a Web-based data source and converted to voice for delivery as dialog in a voice application.
In another embodiment, a method is available in the system described above for organizing, editing, and prioritizing the Web-based data before dialog creation is performed. The method includes harvesting the Web-based data source in the form of its original structure; generating an object tree representing the logical structure and content type of the harvested, Web-based data source; manipulating the object tree generated to a desired hierarchical structure and content; creating a voice application template in VXML and populating the template with the manipulated object tree; and creating a voice application capable of accessing the Web-based data source according to the constraints of the template. The method allows streamlining of voice application deployment and executed state, and simplifies the development process of the voice application.
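The steps of the method can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the referenced system: the harvested data is assumed to arrive as a nested dictionary, and the names `Node`, `build_tree`, `prune`, `prioritize`, and `populate_template`, as well as the sample account data, are hypothetical.

```python
from dataclasses import dataclass, field
from xml.sax.saxutils import escape, quoteattr


@dataclass
class Node:
    label: str
    content_type: str  # for example "section", "text", or "image"
    text: str = ""
    children: list = field(default_factory=list)


def build_tree(raw):
    """Generate an object tree from harvested data (assumed nested dicts)."""
    kids = [build_tree(c) for c in raw.get("children", [])]
    return Node(raw["label"], raw["type"], raw.get("text", ""), kids)


def prune(node, keep_types):
    """Edit the tree: drop children whose content type is not wanted."""
    node.children = [
        prune(c, keep_types) for c in node.children if c.content_type in keep_types
    ]
    return node


def prioritize(node, order):
    """Reorder children by label; unlisted labels keep their order at the end."""
    rank = {label: i for i, label in enumerate(order)}
    node.children.sort(key=lambda c: rank.get(c.label, len(rank)))
    return node


def populate_template(node):
    """Populate a simple VXML form template with the manipulated tree."""
    lines = ['<vxml version="2.0">', f"  <form id={quoteattr(node.label)}>", "    <block>"]
    for child in node.children:
        lines.append(f"      <prompt>{escape(child.label)}: {escape(child.text)}</prompt>")
    lines += ["    </block>", "  </form>", "</vxml>"]
    return "\n".join(lines)


# Hypothetical harvested structure of a Web page.
raw = {
    "label": "account",
    "type": "section",
    "children": [
        {"label": "banner", "type": "image"},
        {"label": "recent activity", "type": "text", "text": "Two deposits this week."},
        {"label": "balance", "type": "text", "text": "Your balance is 120 dollars."},
    ],
}
tree = prioritize(prune(build_tree(raw), {"text", "section"}), ["balance"])
page = populate_template(tree)
print(page)
```

Here the image node is removed because it cannot be voiced, and the balance is moved ahead of the recent activity before the template is populated, which mirrors the organizing, editing, and prioritizing described above.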
A security regimen is provided for the above-described system. The regimen provides transaction security between a Web server and data and a voice portal system accessible through a telephony network on the user end and through an XML gateway on the data source end. The regimen includes one of a private connection, a virtual private network, or a secure socket layer, set up between the Web server and the voice portal system through the XML gateway. Transactions carried on between the portal and the server or servers enjoy the same security that is available between secure nodes on the data network. In one embodiment, the regimen further includes a voice translation system distributed at the outlet of the portal and at the telephone of the end user, wherein the voice dialog is translated to an obscure language that is not the user's language and then retranslated to the user's language at the telephone of the user.
In such a system, where templates are used to enable voice application dialog transactions, voice application rules and speech recognition data are consulted for the appropriate content interpretation and response protocol so that the synthesized voice presented as response dialog through the voice portal to the user is both appropriate in content and hopefully error free in expression. The database, therefore, is optimized with vocabulary words that enable a very wide range of speech covering many different vocabulary words akin to many differing business scenarios.
It has occurred to the inventor that the speech recognition and voice rendering functions of the system can be further optimized, both in speed of performance and in accuracy of synthesized dialog instances, by providing some form of vocabulary management for the different use embodiments.
What is clearly needed is an enhanced voice management system and method that limits speech recognition to only the vocabulary and rules options provided in conjunction with Web harvesting. Such a management system would enable the speech recognition portion of the system to cooperate to improve speech recognition by dynamically adapting the managed directory for each interaction step by interacting with the application logic and/or the database resource adapter.
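The benefit of limiting recognition to a per-step vocabulary can be illustrated with a small sketch. It assumes a hypothetical recognizer that returns scored candidate transcriptions for an utterance; the function name `recognize`, the scores, and the vocabularies are invented for illustration and do not describe any particular speech engine.

```python
def recognize(hypotheses, active_vocabulary):
    """Return the best-scoring hypothesis that is in the active vocabulary.

    `hypotheses` is a list of (text, score) pairs from a hypothetical
    recognizer. Returns None when no hypothesis is allowed.
    """
    allowed = [(score, text) for text, score in hypotheses if text in active_vocabulary]
    return max(allowed)[1] if allowed else None


# Hypothetical scored candidates for a caller who said "savings".
hypotheses = [("sailing", 0.62), ("savings", 0.58), ("checking", 0.10)]

# A broad vocabulary covering many business scenarios.
full_vocabulary = {"sailing", "savings", "checking", "shipping", "billing"}

# Vocabulary limited to the options harvested for the current step.
step_vocabulary = {"savings", "checking"}

unconstrained = recognize(hypotheses, full_vocabulary)
constrained = recognize(hypotheses, step_vocabulary)
print(unconstrained, constrained)
```

With the broad vocabulary the acoustically similar but irrelevant word wins, whereas the step-specific vocabulary leaves only candidates that are valid at that point in the dialog, and fewer candidates need to be considered.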