Service providers of public switched telephone networks (PSTN) have offered 411-type directory assistance (DA) service via live operators for more than a century. An early step toward automated directory assistance (ADA) was the use of store-and-forward technology to assist live operators. Early ADA systems used various speech compression and silence removal techniques, known as “Store and Forward”, to shorten the time a live DA operator spent handling a call. The caller was asked for a locality by a pre-recorded prompt. The store-and-forward system stored a compressed version of the caller's response to the prompt and brought a live operator onto the line. The operator heard the compressed response and then completed the remaining dialog with the caller to provide a unique telephone number.
More recently another form of automated directory assistance has been developed, which uses automated speech recognition (ASR) technology to recognize a locality from the caller's response to a prompt. In a typical system, if the speech recognition is successful, the system asks for the listing, puts an operator on the line, populates the operator's workstation display with the recognized locality, and plays a recorded compressed version of the caller's response to the listing question. The operator then conducts the remaining dialog.
Systems have been developed that attempt to carry the speech recognition through the entire dialog of locality, database listing, clarification, and disambiguation. Recognition success rates have increased but are not 100%. The conventional approach to improving the success rate is to “tune” the system by recording callers' responses and using them to expand the speech recognition capability.
In the event of failed speech recognition, conventional systems default to a live operator: the call is automatically handed off after a failed attempt at speech recognition. Automatic hand-off is described in U.S. Pat. Nos. 4,979,206, 5,479,488, and 5,987,414.
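The hand-off behavior described above can be sketched in a few lines. This is an illustrative sketch only, not the method of any cited patent: the names `RecognitionResult`, `CallOutcome`, `handle_locality_prompt`, and the `CONFIDENCE_THRESHOLD` value are assumptions introduced here, and the recognizer is passed in as a plain callable rather than a real ASR engine.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical confidence threshold; a real system would tune this value.
CONFIDENCE_THRESHOLD = 0.6


@dataclass
class RecognitionResult:
    text: Optional[str]   # recognized locality, or None if nothing was recognized
    confidence: float     # recognizer score in [0.0, 1.0]


@dataclass
class CallOutcome:
    handled_by: str           # "automation" or "operator"
    locality: Optional[str]   # recognized locality, if any


def handle_locality_prompt(
    audio: bytes,
    recognize: Callable[[bytes], RecognitionResult],
) -> CallOutcome:
    """Try ASR on the caller's response; hand off to an operator on failure."""
    result = recognize(audio)
    if result.text is not None and result.confidence >= CONFIDENCE_THRESHOLD:
        return CallOutcome(handled_by="automation", locality=result.text)
    # Failed recognition: default to a live operator with no locality pre-filled.
    return CallOutcome(handled_by="operator", locality=None)
```

Passing the recognizer as a callable keeps the hand-off decision separate from any particular speech engine, so the same logic can be exercised with a stub.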
The foregoing ADA technologies and related systems were developed to process part or all of a 411 call from a traditional Time Division Multiplex (TDM)-based circuit-switched telephone network. In addition to TDM circuit-switched networks, however, there are packet-switched networks such as the Internet. Packet switching has historically been used for data transmission, but it has recently been enhanced to provide voice transmission. “Voice over IP” (VoIP) is a set of technologies that enables voice to be sent over a packet network. Its usage for messaging is expected to grow rapidly in the coming years.
Users communicate using VoIP as easily as they do with today's PBXes and public phone networks. By leveraging the existing data network, companies can save significant amounts of money by using VoIP for toll-bypass. It is thought that VoIP will speed the adoption of unified messaging by transmitting voice, fax and e-mail messages. VoIP is also known as IP telephony.
Over the next several years, companies will deploy VoIP in conjunction with 802.11 wireless LANs, enabling workers to have WLAN-based mobile phones in the office.
Session Initiation Protocol (SIP) is a real-time signaling protocol used for VoIP. It also supports video and instant-messaging applications. SIP performs basic call-control tasks, such as session setup and teardown, and signaling for features such as hold, caller ID, and call transfer. Its functions are similar to those of Signaling System 7 (SS7) in standard telephony and of H.323 or Media Gateway Control Protocol in IP telephony.
With SIP, most of the intelligence for call setup and features resides on the SIP device or user agent, such as an IP phone or a PC with voice or instant-messaging software. In contrast, traditional telephony or H.323-based telephony uses a model of intelligent, centralized phone switches with dumb phones.
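To make the session-setup step concrete, the following sketch builds the text of a minimal SIP INVITE request, the message a user agent sends to start a call. It is a simplified illustration under stated assumptions: the function name `build_invite` and the example addresses are hypothetical, the request carries no SDP body, and nothing is sent over a network.

```python
import uuid


def build_invite(caller: str, callee: str, local_host: str, cseq: int = 1) -> str:
    """Return a minimal SIP INVITE request as text (headers only, no SDP body)."""
    # Branch values begin with the fixed prefix "z9hG4bK" defined by RFC 3261.
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]
    tag = uuid.uuid4().hex[:8]
    call_id = f"{uuid.uuid4().hex}@{local_host}"
    user = caller.split("@")[0]
    lines = [
        f"INVITE sip:{callee} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_host};branch={branch}",
        "Max-Forwards: 70",
        f"From: <sip:{caller}>;tag={tag}",
        f"To: <sip:{callee}>",
        f"Call-ID: {call_id}",
        f"CSeq: {cseq} INVITE",
        f"Contact: <sip:{user}@{local_host}>",
        "Content-Length: 0",
    ]
    # SIP uses CRLF line endings; an empty line ends the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


if __name__ == "__main__":
    # Hypothetical addresses, for illustration only.
    print(build_invite("caller@example.com", "411@da.example.net", "pc1.example.com"))
```

Because the user agent itself assembles the request, including the call identifier and contact address, the sketch also reflects the point that call-setup intelligence sits at the endpoint rather than in a central switch.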
TDM and VoIP networks are expected to co-exist for some time. An important challenge for providing improved ADA, therefore, is to provide a platform having a telephony layer that is capable of interfacing with both TDM (circuit-switched) networks and VoIP (packet-switched) networks.
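One way to picture such a telephony layer is as a common interface with one adapter per network type, so that the directory assistance dialog is written once. The sketch below is only an illustration of that idea; the class names `CallLeg`, `TdmCallLeg`, and `VoipCallLeg`, and the function `start_directory_dialog`, are hypothetical and do not describe any particular platform.

```python
from abc import ABC, abstractmethod


class CallLeg(ABC):
    """Hypothetical common interface seen by the ADA application layer."""

    @abstractmethod
    def network_type(self) -> str:
        """Return a short label for the underlying network."""

    @abstractmethod
    def answer(self) -> str:
        """Answer the call and return a description of what was done."""


class TdmCallLeg(CallLeg):
    """Call arriving on a circuit-switched trunk, identified by trunk and channel."""

    def __init__(self, trunk: int, channel: int) -> None:
        self.trunk = trunk
        self.channel = channel

    def network_type(self) -> str:
        return "TDM"

    def answer(self) -> str:
        return f"answered on trunk {self.trunk}, channel {self.channel}"


class VoipCallLeg(CallLeg):
    """Call arriving over a packet network, identified by its SIP Call-ID."""

    def __init__(self, call_id: str) -> None:
        self.call_id = call_id

    def network_type(self) -> str:
        return "VoIP"

    def answer(self) -> str:
        return f"answered SIP call {self.call_id}"


def start_directory_dialog(leg: CallLeg) -> str:
    """Application logic that does not depend on which network delivered the call."""
    return f"[{leg.network_type()}] {leg.answer()}; prompting for locality"
```

For example, `start_directory_dialog(TdmCallLeg(3, 17))` and `start_directory_dialog(VoipCallLeg("abc123"))` run the same dialog entry point over either kind of call.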
Another relevant technology for ADA is VoiceXML. VoiceXML is a markup language for creating voice-user interfaces. It uses speech recognition and/or touchtone (DTMF keypad) for input, and pre-recorded audio and text-to-speech synthesis (TTS) for output. It is based on the World Wide Web Consortium's (W3C's) Extensible Markup Language (XML), and leverages the web paradigm for application development and deployment. By having a common language, application developers, platform vendors, and tool providers can all benefit from code portability and reuse.
With VoiceXML, speech recognition application development is greatly simplified by using familiar web infrastructure, including tools and Web servers. Instead of using a PC with a Web browser, any telephone can access VoiceXML applications via a VoiceXML “interpreter” (also known as a “browser”) running on a telephony server. Whereas HTML is commonly used for creating graphical Web applications, VoiceXML can be used for voice-enabled Web applications.
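As an illustration of how a Web server might produce such an application, the sketch below generates a small VoiceXML document that prompts the caller for a locality and collects the answer in a field. It is a minimal sketch under stated assumptions: the function name `build_locality_form`, the form id, the grammar file `localities.grxml`, and the target page `listing.vxml` are all hypothetical, and a deployed application would add error handling such as no-input and no-match behavior.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"


def build_locality_form(prompt_text: str, grammar_src: str) -> str:
    """Return a minimal VoiceXML 2.0 document that asks the caller for a locality."""
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": "locality_form"})
    # A <field> collects one piece of input (speech or DTMF) from the caller.
    field = ET.SubElement(form, "field", {"name": "locality"})
    ET.SubElement(field, "prompt").text = prompt_text
    # The grammar constrains what the recognizer will accept for this field.
    ET.SubElement(field, "grammar", {"src": grammar_src, "type": "application/srgs+xml"})
    # <filled> runs once the field has a value; here it submits the result.
    filled = ET.SubElement(field, "filled")
    ET.SubElement(filled, "submit", {"next": "listing.vxml", "namelist": "locality"})
    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_locality_form("What city and state, please?", "localities.grxml"))
```

The generated document would be fetched and executed by a VoiceXML interpreter on the telephony server, in the same way that a Web browser fetches and renders an HTML page.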
VoiceXML is growing in popularity and effectiveness. VoiceXML-based applications are increasing both in number and in features. For example, one carrier's toll-free directory assistance service handles 200,000,000 calls per year. Another carrier's VoiceXML system lets customers speak a name or phone number to make a phone call, as well as use voice commands to access information services such as stock quotes and sports. The General Motors OnStar™ system includes the Virtual Advisor™, a personalized voice portal complete with financial services, traffic, weather, news, sports, entertainment, and e-mail.
In call centers, VoiceXML is providing an attractive alternative to proprietary IVR solutions to automate the more routine transactions. E-trade's customer service and stock trading automated telephone applications, for example, are both written in VoiceXML. VoiceXML-based utilities for standards-based voice solutions include Customer Relationship Management, Human Resources, and Supply Chain Management.
Developers report that application development in VoiceXML is at least three times faster than with traditional IVR. VoiceXML offers reusable and off-the-shelf applications because it is a W3C standard markup language. Traditional or proprietary IVR requires a second, siloed infrastructure separate from the existing Web infrastructure, whereas VoiceXML does not. VoiceXML integrates easily with existing application server infrastructure. That is, VoiceXML applications run on the same servers as Web services, providing a flexible, distributed architecture, rather than on a “big iron” legacy IVR platform.
The present invention provides an ADA platform architecture to accommodate TDM and VoIP networks at the telephony level, to exploit VoiceXML at the database level, and to use VoIP at the call routing level, to provide improved ADA services.