Applications today generally fall into two types of services: voice applications and data applications. Voice applications are conventionally accessed using, for example, a telephone. Data applications are conventionally accessed using a graphical user interface, such as, for example, a web browser.
Referring to FIG. 1, a legacy telephony service system 100 for accessing voice applications is shown. As shown, system 100 includes a telephone 102, the public switched telephone network (PSTN) 104, and voice applications, such as, for example, an integrated voice response application 106 and a voice service 108. Generally, a user would dial the desired voice application, such as a weather service, using telephone 102. The voice application may provide an audio menu of choices from which the user selects using the touch-tone buttons on telephone 102. Based on the user's choices, the voice application may provide, for example, local weather for Boston, Mass. Some voice applications include access to an automatic speech recognition process, which allows the user to speak responses instead of using the touch-tone buttons.
Using PSTN 104, the user accesses the voice application over the circuit-switched network and a wired (or wireless) telephone connection. PSTN 104 likewise accesses the voice applications using a physical voice line. Input to and output from the voice applications occur over the voice line, and the systems do not provide any data interface to the service.
The legacy telephony service system 100 has five major drawbacks. First, proprietary hardware conventionally hosts the integrated voice response applications 106 and voice services 108. The proprietary hardware can map audio data between voice circuits and digital audio circuits, but it inhibits interaction between diverse systems. Second, switching between the PSTN and the network components is difficult and complex. This difficulty is due, in large part, to the difference between the binary circuit-switched protocols of the PSTN, such as, for example, ISDN, CAS, or SS7, and the packet based protocols of the network components. Third, access to the applications and services must be tailored to the application programming interfaces (APIs) provided by the hardware vendor, which are not standardized. Fourth, packet based devices cannot access the service because they cannot interface with the PSTN. Fifth, packet based devices cannot access resources, such as speech recognition or text-to-speech processors.
Referring now to FIG. 2, a conventional packet based data network 200 is shown. As will be explained, data network 200 resolves some of the drawbacks associated with legacy system 100. Data network 200 includes the telephone 102, PSTN 104, and a voice service 208, such as, for example, automatic speech recognition (ASR) or text-to-speech processors. In order to convert signals between PSTN 104 and packet based voice service 208, a media gateway 210 is interposed between PSTN 104 and packet based voice service 208. An exemplary media gateway is disclosed in the Related Applications, which are identified above. Similar to legacy system 100 above, data network 200 vertically integrates the voice services. Thus, access to, for example, speech recognition or text-to-speech services is limited to programming APIs through the voice service runtime.
Even with the mentioned drawbacks, the switch from legacy system 100 to Voice over IP (VoIP) data network 200 addressed some of the legacy system issues. In particular, media gateway 210 provides a generic or standard interface between voice circuits and digital audio circuits, allowing use of diverse off-the-shelf hardware. Also, media gateway 210 can provide signal conversion from PSTN protocols to packet system protocols, such as, for example, conversion from ISDN-PRI to SIP.
While initial data networks 200 solved two basic issues, the vertical integration of data network 200 still required access to voice services through a single vendor API, which required programming specific to the vendor's API protocols. Standardized API programming tools, such as VoiceXML and Speech Application Language Tags ("SALT"), have improved interoperability between diverse vendors. But because the runtime vendor controls access to the media resources, interoperability with media resources between vendors is generally not available.
A component server architecture 300 shown in FIG. 3 addresses the vertical integration issue. In particular, architecture 300 includes access to media gateway 210, typically via a telephone, although, as described in the Related Applications, access can be via a number of devices. Media gateway 210 converts the request into a packet based request, such as a SIP packet, and directs the request to a voice service runtime interface 312. Voice service runtime interface 312 is connected to voice services, such as, for example, speech recognition resource 314, text-to-speech resource 316, and streaming media resource 318.
In this case, because the media path is not along a physical voice line (wireless or wire based), the user input audio can be streamed directly to speech recognition resource 314. Moreover, the audio output of text-to-speech resource 316 can be streamed directly to the user (in this case through media gateway 210 to provide the packet based to switched circuit conversions).
Locating voice service runtime interface 312 as a separate component allows seamless interoperation with diverse media resources. The seamless operation is possible because the vendor specific APIs are abstracted by the component framework of architecture 300. The following example demonstrates the abstracting of vendor specific APIs. Assume both vendor X and vendor Y provide text-to-speech resources. Vendor X uses a C-based API and vendor Y uses a Java-based API. Front loading vendor X's and vendor Y's resources with, for example, a SIP/RTP agent allows vendor X's resource and vendor Y's resource to be used without regard to the specific API, because the request is sent in a SIP/RTP protocol that is converted to the appropriate API by the front loaded SIP/RTP agent. Thus, each resource and service in the network can be provided by different vendors without regard for the vendor specific API, which allows a best of breed approach to deploying services.
In other words, each vendor specific component is provided with an access agent of sorts that interfaces between the network and the component. The access agent converts a packet based standard protocol to the vendor specific protocol for the component.
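By way of illustration only, the access agent arrangement described above may be sketched as a simple adapter layer. The sketch below is a minimal example under stated assumptions and is not the implementation of any particular vendor or of the Related Applications: the names SynthesisRequest, VendorXEngine, VendorYEngine, VendorXAgent, VendorYAgent, and synthesize are hypothetical, the vendor engines are stubs, and the SIP/RTP transport itself is omitted so that only the conversion from a common request to a vendor specific call is shown.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class SynthesisRequest:
    """Hypothetical stand-in for a standard packet based request.

    In the architecture described, this would arrive over SIP/RTP;
    the transport is omitted here.
    """
    text: str
    voice: str = "default"


class VendorXEngine:
    """Stub for a hypothetical vendor X API (byte-oriented, C-like)."""

    def tts_render(self, utf8_text: bytes, voice_id: int) -> bytes:
        return b"X:" + utf8_text


class VendorYEngine:
    """Stub for a hypothetical vendor Y API (option-map style)."""

    def speak(self, options: dict) -> bytes:
        return b"Y:" + options["text"].encode("utf-8")


class AccessAgent(ABC):
    """Front-loaded agent: accepts the common request, calls the vendor API."""

    @abstractmethod
    def handle(self, request: SynthesisRequest) -> bytes:
        ...


class VendorXAgent(AccessAgent):
    # Assumed mapping from common voice names to vendor X numeric ids.
    VOICE_IDS = {"default": 0}

    def __init__(self, engine: VendorXEngine) -> None:
        self._engine = engine

    def handle(self, request: SynthesisRequest) -> bytes:
        voice_id = self.VOICE_IDS.get(request.voice, 0)
        return self._engine.tts_render(request.text.encode("utf-8"), voice_id)


class VendorYAgent(AccessAgent):
    def __init__(self, engine: VendorYEngine) -> None:
        self._engine = engine

    def handle(self, request: SynthesisRequest) -> bytes:
        return self._engine.speak({"text": request.text, "voice": request.voice})


def synthesize(agent: AccessAgent, text: str) -> bytes:
    """Caller side: the same request works with any vendor's agent."""
    return agent.handle(SynthesisRequest(text=text))
```

In this sketch, the caller builds the same SynthesisRequest regardless of which agent is supplied, so a text-to-speech resource from one vendor could be exchanged for another without changing the calling code.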
Even though the component based architecture solves many drawbacks, conventional systems do not adequately provide solutions for the last two major drawbacks of legacy system 100. Thus, it would be desirable to develop a data access to speech service bridge that would allow packet based devices to access the services, and packet based devices to access resources, such as speech recognition and text-to-speech resources.