1. Technical Field
This invention relates to the field of computer audio user interfaces and more particularly to a system and method for performing secured communications between a voice browser and a server.
2. Description of the Related Art
Privacy in data communications has become a significant issue with the exponential increase in e-commerce transactions on the Internet. Typically, a client computer and a server computer require that exchanged information remain private to both parties. For instance, in an online banking transaction, the client requires that the client's account number and password be shared only with the intended bank and with no other party. Presently, privacy in data transactions can be secured only in selected application protocols through the use of security technologies which can incorporate asymmetric encryption algorithms, symmetric encryption algorithms, or a combination of both. The Secure Sockets Layer (“SSL”) protocol represents one such security technology which incorporates both asymmetric and symmetric encryption algorithms.
SSL is a transport-layer protocol that can be established between a client and a server. SSL is typically integrated directly with selected underlying application protocols. For example, the Hypertext Transfer Protocol (“HTTP”) has been successfully integrated with SSL. Specifically, HTTP packets are encapsulated in SSL packets and are transported over TCP/IP. HTTP integrated with SSL is commonly referred to as “HTTPS” and can be used to securely view and exchange Web-based content encoded in hypertext markup language (“HTML”). Other protocols integrated with SSL include Telnet, the File Transfer Protocol (“FTP”), the Lightweight Directory Access Protocol (“LDAP”), the Internet Message Access Protocol (“IMAP”), and the Network News Transfer Protocol (“NNTP”).
SSL is intended to provide a secure pipe between a client and a server. SSL is session-oriented and can maintain state, despite the execution of SSL over such protocols as HTTP which, in and of itself, is stateless. Finally, as part of the transport protocol, SSL provides privacy through both asymmetric and symmetric encryption, authentication based upon certificates, a vehicle for authorization through SSL's support for certificates, integrity through the incorporation of hash functions, and digital signing.
Briefly, in an SSL-compliant visual Web browser executing the “HTTPS” protocol, an SSL session can be established when a client selects a uniform resource locator (“URL”) referencing a server compliant with the HTTPS protocol. The server can respond by delivering to the client an X.509 certificate containing a public key and a distinguished name referencing a Certificate Authority (“CA”). The client can examine the server certificate by referencing the issuing CA and can verify the integrity of the server certificate if the issuing CA is configured in the visual Web browser as trustworthy. Subsequently, the server can perform optional client authentication by requesting a certificate from the client. The server, too, can examine the client certificate by referencing the issuing CA and can verify the integrity of the client certificate if both the client and the issuing CA are configured in the server as trustworthy. If the server successfully authenticates the client certificate, the SSL session can continue. Otherwise, the session can be terminated.
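By way of illustration only, the certificate checks described above can be sketched using the `ssl` module of the Python standard library, which implements TLS, the successor to SSL. The helper name `make_client_context` and its parameters are hypothetical and are not part of any protocol described herein; the sketch merely assumes that the client trusts the CAs configured in the host system.

```python
import ssl


def make_client_context(client_cert=None, client_key=None):
    """Hypothetical helper: build a client-side context that verifies the server."""
    # The default context requires a server certificate issued by a trusted CA
    # and checks that the certificate matches the requested host name.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    # Optional client authentication: present a client certificate only if
    # one is supplied (file paths are the caller's responsibility).
    if client_cert is not None:
        ctx.load_cert_chain(certfile=client_cert, keyfile=client_key)
    return ctx


ctx = make_client_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```

A context configured in this manner would refuse to complete a handshake with a server whose certificate cannot be traced to a trusted CA, which corresponds to termination of the session as described above.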
Subsequently, the client can “challenge” the server using asymmetric encryption technology in order to verify that the server indeed possesses the private key associated with the public key contained in the server certificate. In challenging the server, the client can generate a random string of data and can encrypt the random string of data using the server's public key contained in the server certificate. The client can transmit the encrypted data to the server and can request that the server return the original, unencrypted data to the client. In order to return the data to the client, however, the server first must decrypt the data using the server's private key which corresponds to the server's public key contained in the server certificate. Optionally, the server, too, can challenge the client using a similar exchange of encrypted data.
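The challenge described above can be sketched, under stated assumptions, with textbook RSA arithmetic. The key values below are a well-known toy example chosen for readability and are far too small for actual use; the function names are hypothetical, and a practical implementation would rely on an established cryptographic library and padding scheme rather than on raw modular exponentiation.

```python
import secrets

# Toy RSA key pair (illustrative assumption): modulus, public and private exponents.
N, E, D = 3233, 17, 2753


def client_make_challenge():
    """Pick random data and encrypt it with the server's public key."""
    nonce = secrets.randbelow(N - 2) + 2   # random value in the range 2..N-1
    return nonce, pow(nonce, E, N)


def server_answer(ciphertext, private_exponent=D):
    """Only the holder of the private exponent can recover the random data."""
    return pow(ciphertext, private_exponent, N)


def client_verify(nonce, answer):
    """The server passes the challenge if it returns the original data."""
    return nonce == answer


nonce, challenge = client_make_challenge()
print(client_verify(nonce, server_answer(challenge)))
```

A party lacking the private exponent cannot, in general, recover the random data from the challenge, and the client would therefore reject the answer of such a party.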
Once the client and server have been mutually authenticated, the client and the server can agree upon a shared secret for use in future symmetric encryption and decryption operations. Typically, the client can select the secret and encrypt the selected secret using the server's public key. The client can transmit the asymmetrically encrypted secret to the server so that only the client and the server share the common secret. When both the client and the server have agreed upon the shared secret, symmetrically encrypted data transfer can begin between the client and the server using the shared secret as the key to the symmetric encryption and corresponding decryption operations. Notably, a more thorough treatment of the SSL protocol has been published by Netscape Communications Corporation of Mountain View, Calif. in Freier, Karlton & Kocher, The SSL Protocol Version 3.0 (Netscape Communications Corp. March 1996), incorporated herein by reference. Additionally, an SSL 3.0 compatible standard has been approved by the Internet Engineering Task Force (“IETF”) and has been published by the IETF as Dierks & Allen, RFC 2246, The TLS Protocol Version 1.0 (IETF January 1999), incorporated herein by reference.
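The exchange of the shared secret and the subsequent symmetric data transfer can be illustrated by the following minimal sketch. The toy RSA values, the SHA-256 based keystream and the names `keystream` and `xor_cipher` are assumptions introduced solely for illustration; SSL itself derives its session keys and selects its ciphers in a more elaborate manner set forth in the specifications cited above.

```python
import hashlib
import secrets

# Toy RSA key pair for the server (illustrative assumption, not secure).
N, E, D = 3233, 17, 2753


def keystream(secret, length):
    """Expand the shared secret into `length` pseudo-random bytes."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(f"{secret}:{counter}".encode()).digest()
        counter += 1
    return out[:length]


def xor_cipher(secret, data):
    """Symmetric operation: the same call both encrypts and decrypts."""
    return bytes(a ^ b for a, b in zip(data, keystream(secret, len(data))))


# The client selects the secret and wraps it with the server's public key.
client_secret = secrets.randbelow(N - 2) + 2
wrapped = pow(client_secret, E, N)

# The server unwraps the secret with its private key.
server_secret = pow(wrapped, D, N)

# Both parties now use the same secret for symmetric encryption.
message = b"account=12345"
ciphertext = xor_cipher(client_secret, message)
plaintext = xor_cipher(server_secret, ciphertext)
print(plaintext == message)
```

The asymmetric operation is performed only once, on the short secret, whereas the bulk data is protected by the less computationally expensive symmetric operation.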
Although SSL has been integrated with visual Web browsers executing the HTTPS protocol, SSL has not been integrated with Voice Browsers. Generally, a Voice Browser, unlike a visual Web browser, does not permit a user to interact with Web-based content visually. Rather, a Voice Browser, which can operate in conjunction with a Speech Recognition Engine and a Speech Synthesis Engine, can permit the user to interact with Web-based content audibly. That is, the user can provide voice commands to navigate from one Web-based document to another. Likewise, Web-based content can be presented to the user audibly, typically in the form of speech-synthesized text. Thus, Voice Browsers can provide voice access and interactive voice response to Web-based content and applications, for instance by telephone, personal digital assistant, or desktop computer.
Significantly, Voice Browsers can be configured to interact with Web-based content encoded in VoiceXML. VoiceXML is a markup language for distributed voice applications based on Extensible Markup Language (“XML”), much as HTML is a markup language for distributed visual applications. VoiceXML is designed for creating audio dialogs that feature synthesized speech, digitized audio, recognition of spoken and Dual Tone Multifrequency (“DTMF”) key input, recording of spoken input, telephony, and mixed-initiative conversations. Version 1.0 of the VoiceXML specification has been published by the VoiceXML Forum in the document Linda Boyer, Peter Danielsen, Jim Ferrans, Gerald Karam, David Ladd, Bruce Lucas and Kenneth Rehor, Voice extensible Markup Language (VoiceXML™) version 1.0, (W3C May 2000), incorporated herein by reference. Additionally, Version 1.0 of the VoiceXML specification has been submitted to the World Wide Web Consortium (“W3C”) by the VoiceXML Forum as a proposed industry standard.
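For purposes of illustration, a minimal VoiceXML document and one way in which a Voice Browser might extract the text to be synthesized can be sketched as follows. The document is a simple greeting dialog, and the parsing logic, which relies upon the XML parser of the Python standard library, is an assumption introduced for illustration and does not represent the operation of any particular Voice Browser.

```python
import xml.etree.ElementTree as ET

# A minimal VoiceXML 1.0 document containing a single spoken prompt.
VXML_DOCUMENT = """<vxml version="1.0">
  <form>
    <block>Hello World!</block>
  </form>
</vxml>"""

root = ET.fromstring(VXML_DOCUMENT)

# Collect the text of every <block> element; a Voice Browser would pass
# such text to a Speech Synthesis Engine rather than print it.
prompts = [block.text.strip() for block in root.iter("block") if block.text]
print(root.tag, root.get("version"), prompts)
```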
Version 1.0 of the VoiceXML specification provides a high-level programming interface to speech and telephony resources for application developers, service providers and equipment manufacturers. As noted in the W3C submission, standardization of VoiceXML will simplify creation and delivery of Web-based, personalized interactive voice-response services; enable phone and voice access to integrated call center databases, information and services on Web sites, and company intranets; and help enable new voice-capable devices and appliances. Still, the VoiceXML specification lacks a mechanism for secure communications through encrypted network transmissions via the SSL protocol over the TCP/IP protocol. Accordingly, what is needed is a Voice Browser incorporating SSL support for performing secure communications in a data communications network.