Advances in Internet technologies have spawned new mechanisms for data, voice, and video communication, including Internet Protocol (IP) telephony, which is a quickly developing field of telecommunications. However, the Internet faces two significant obstacles to fast, yet secure, communications. The first obstacle is usable bandwidth, which affects the rate at which data can be transferred. The second obstacle pertains to security. The Internet is not a direct point-to-point connection between computers. Rather, it is a network to which computers (or other devices) can connect for the purpose of communicating with one another. As such, there is increased opportunity for eavesdropping on data, voice, or video transmissions over the Internet. One method of enhancing the security of Internet-based communications is to encrypt the data before sending it out over the network and to decrypt the data once it is received by the far-end device. Voice security is therefore desirable for Voice over IP (VoIP) connections over an IP network.
The present invention addresses security issues with respect to VoIP telephone calls. Currently, a call signalling channel is secured by using Transport Layer Security (TLS), Secure Sockets Layer (SSL), or the IP Security Protocol (IPSec) on a secure well-known port. These approaches, however, suffer from delays in call setup time, complex handshaking procedures, and significant protocol overhead. Moreover, some VoIP implementations do not prevent signalling information from being viewed by unscrupulous computer hackers on the IP network used for VoIP calls. In some instances, when a SETUP message is sent over the IP network, the calling name and calling number are visible to sniffers or other such tools used on the Internet. To overcome this, voice packets are encrypted at the source and decrypted at the destination so that a third party cannot eavesdrop on the conversation.
In order to properly advise both endpoints as to how to encrypt the voice packets, media signalling must carry the appropriate security information for negotiation requirements. This signalling must also be passed over a secure channel so that third parties are not aware of what encryption procedures are being negotiated. Unfortunately, the delay of the signalling path relative to the established voice path can result in some undesirable side effects. In FIG. 1, a typical VoIP system including an Internet Protocol Network 100 is shown with a signalling path 15 shown relative to an established voice path 14 between two IP telephony devices 10, 13. A switch 11 is represented in the signalling path 15. Clearly, the shorter path exists in-band. The main concerns in such a VoIP system include noise and voice clipping. Noise occurs when the receiver attempts to decipher Real-time Transport Protocol (RTP) packets based on a “best guess” of the cipher, but the packets, which arrive before the signalling has been delivered to the receiver, were in fact enciphered with a different cipher or with no cipher at all. Voice clipping occurs because the receiver may not play any RTP packets until final negotiation, in which case the initial packets would be missed. Typically, the receiver must wait for the final confirmation of the negotiated capabilities of the endpoints before accepting the voice stream packets. On the other hand, if the receiver does not wait for the confirmation, loud “noise” may be played out when the capabilities of the transmitter and receiver do not match.
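The trade-off between noise and voice clipping described above can be illustrated with a minimal sketch. It is a toy model offered under stated assumptions, not an implementation of any particular VoIP stack: the names Packet and receive, the parameter confirm_after (the number of voice packets that arrive before the signalling confirmation does), and the cipher labels "aes" and "none" are all hypothetical and do not appear in the system of FIG. 1.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    seq: int
    cipher: str  # label of the cipher the sender actually used ("none" = unencrypted)


def receive(packets, guess, confirmed_cipher, confirm_after, wait_for_confirmation):
    """Classify each voice packet as 'played', 'clipped', or 'noise'.

    confirm_after is a hypothetical parameter: the number of voice packets
    that arrive over the faster voice path before the signalling
    confirmation reaches the receiver.
    """
    outcome = []
    for index, pkt in enumerate(packets):
        if index >= confirm_after:
            # Signalling has arrived, so the negotiated cipher is known.
            cipher_in_use = confirmed_cipher
        elif wait_for_confirmation:
            # Receiver refuses to play anything before confirmation: clipping.
            outcome.append("clipped")
            continue
        else:
            # Receiver deciphers using its best guess.
            cipher_in_use = guess
        # A mismatch between the cipher used to decode and the sender's
        # cipher is modelled as audible noise.
        outcome.append("played" if cipher_in_use == pkt.cipher else "noise")
    return outcome


if __name__ == "__main__":
    stream = [Packet(seq=i, cipher="aes") for i in range(5)]
    print(receive(stream, "none", "aes", 2, wait_for_confirmation=True))
    print(receive(stream, "none", "aes", 2, wait_for_confirmation=False))
```

In this model, a receiver that waits for confirmation loses the first confirm_after packets, whereas a receiver that does not wait plays them correctly only when its guess happens to match the cipher used by the transmitter.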
What is needed is a method that increases security, simplifies VoIP handshaking procedures, and reduces call setup time without adding significant protocol overhead. Further, what is needed is a method that addresses both noise and voice clipping concerns.