1. Field of the Invention
The present invention relates to voice over IP applications and more specifically to a system and method of modifying the speed of playout for received speech to compensate for delay jitter.
2. Introduction
The present invention relates to Voice over Internet Protocol (VoIP). As the name suggests, VoIP uses the Internet Protocol (IP) to send and receive voice as data packets over an IP network, as shown by the arrangement 100 in FIG. 1. Using a VoIP protocol, voice communications can be carried over any IP network 104, whether it is the Internet, an intranet, a local area network (LAN), or another IP network. In a VoIP-enabled network, the digitized voice signal is encapsulated in IP packets by a compute device 102 and then sent over the IP network 104. A VoIP signaling protocol is used to set up and tear down calls, carry the information required to locate users, and negotiate capabilities (such as bandwidth). At the receiving end, a compute device 106 receives the packets and processes them, for example by separating the voice information from the signaling information, decoding the voice data, and presenting the transmitted speech through a speaker. A known advantage of VoIP is the relatively low cost of a phone call. Other advantages are also important, such as the integration of voice, data and video on one network, the new services available on the converged network, and simplified management of end-user terminals.
Several VoIP protocol stacks have been developed by various standards bodies and vendors, namely H.323, SIP, MEGACO and MGCP. These standards are known to those of skill in the art and are well documented. The present invention is independent of any specific protocol associated with VoIP.
VoIP is one benefit of the ongoing convergence between data and voice telecommunications networks: it allows its users to send voice transmissions over the Internet. However, the design of the Internet poses problems that may slow the growth of VoIP. Because the Internet was created to carry data, it was not originally intended to transmit delay-sensitive voice signals.
As is common on the Internet, the individual packets that make up the transmitted data may take different amounts of time to reach the end point, and some packets may arrive out of order. In a live conversation between two people using VoIP, these characteristics of packet delivery can delay the speech, introduce jitter into the received speech information, or cause other problems that reduce the clarity and naturalness of the conversation.
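To make the variable delay and reordering concrete, the following Python sketch computes a running jitter estimate from per-packet send and arrival timestamps and flags packets that arrive after a higher-numbered packet. It borrows the interarrival-jitter formula from RFC 3550 (the RTP specification) as one common way to quantify delay jitter; the text itself does not prescribe a measure. The `Packet` type, its fields, and the millisecond timestamps are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    seq: int          # sequence number assigned by the sender (assumed field)
    send_ms: float    # sender timestamp in milliseconds (assumed field)
    arrive_ms: float  # receiver clock at arrival in milliseconds (assumed field)


def interarrival_jitter(packets):
    """Running jitter estimate in the style of RFC 3550, section 6.4.1.

    Packets are processed in arrival order. Only differences between
    successive transit times are used, so a constant offset between the
    sender and receiver clocks cancels out.
    """
    jitter = 0.0
    prev_transit = None
    for p in sorted(packets, key=lambda p: p.arrive_ms):
        transit = p.arrive_ms - p.send_ms
        if prev_transit is not None:
            d = abs(transit - prev_transit)
            jitter += (d - jitter) / 16.0  # exponential smoothing, gain 1/16
        prev_transit = transit
    return jitter


def out_of_order(packets):
    """Return sequence numbers that arrive after a higher-numbered packet."""
    late = []
    highest = None
    for p in sorted(packets, key=lambda p: p.arrive_ms):
        if highest is not None and p.seq < highest:
            late.append(p.seq)
        else:
            highest = p.seq
    return late
```

For five packets sent every 20 ms and received at 50, 75, 118, 105 and 131 ms (sequence numbers 0 to 4), the estimate comes out to roughly 4.4 ms and packet 2 is reported as out of order.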
This problem with VoIP technology can be characterized by a transmission variable called delay jitter. Delay jitter is incompatible with the requirements of standard speech decoders, which operate at a constant rate and expect each frame at a fixed time interval. The conventional solution to this problem is to implement a jitter buffer that smooths out the delay variations among received packets. For example, a built-in delay of one-tenth of a second (100 ms) at the receiving end point lets the buffer hold packets long enough for delayed and out-of-order packets to be assembled in the proper sequence and delivered to the speech decoder at a constant rate.
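A fixed jitter buffer of this kind can be sketched in Python as follows. Each packet is assumed to carry a sequence number and 20 ms of speech; the playout clock is anchored to the first packet received, and packet n is handed to the decoder at a fixed built-in delay plus n frame durations. The function name `fixed_playout`, the 20 ms frame size, and the rule of discarding packets that miss their slot are illustrative assumptions rather than details from the text.

```python
def fixed_playout(arrivals, playout_delay_ms=100, frame_ms=20):
    """Hypothetical fixed-delay jitter buffer.

    arrivals: list of (seq, arrive_ms) pairs in the order received.
    Returns a list of (play_ms, seq) in sequence order, where seq is None
    if that packet missed its playout slot or never arrived.
    """
    if not arrivals:
        return []
    # Anchor the playout clock to the first packet to arrive.
    first_seq, first_arrival = min(arrivals, key=lambda a: a[1])
    base = first_arrival - first_seq * frame_ms

    buffered = {}
    for seq, arrive_ms in arrivals:
        deadline = base + playout_delay_ms + seq * frame_ms
        if arrive_ms <= deadline:
            buffered[seq] = arrive_ms  # in time: hold until its slot
        # otherwise the packet is too late to play and is dropped

    lo = min(seq for seq, _ in arrivals)
    hi = max(seq for seq, _ in arrivals)
    schedule = []
    for seq in range(lo, hi + 1):
        play_ms = base + playout_delay_ms + seq * frame_ms  # constant spacing
        schedule.append((play_ms, seq if seq in buffered else None))
    return schedule
```

For arrivals `[(0, 50), (1, 75), (3, 105), (2, 118), (4, 131)]`, a 100 ms delay plays all five packets in order at 150, 170, 190, 210 and 230 ms. A 20 ms delay plays them 80 ms sooner but leaves packet 2 without data in its slot (`None`, where a decoder would typically apply loss concealment). The buffer removes jitter only by adding delay.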
While the buffering strategy works, it comes at the expense of the delay inherent in the jitter buffer. This increased connection delay exacerbates echo-related problems, and excessive delay can break down the natural cadence of conversation. Furthermore, conversations often take place between people who live far apart, and such conversations may be carried at least in part over a satellite link. The delay introduced by distance plus the delay introduced by a jitter buffer imposes a performance penalty that can prevent further acceptance of VoIP technology.
One attempt to reduce the delay caused by the jitter buffer is to provide a dynamically modifiable buffer. In this approach, the buffer is allowed to shrink and grow based on how quickly, how slowly, or how far out of order packets are received. Tailoring the buffer dynamically to the current flow of packets can reduce some of the delay, but it requires measuring packet delivery behavior, which itself introduces further delay, and the act of growing and shrinking the buffer can itself introduce voice quality problems.
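A dynamically modifiable buffer of this kind is often driven by running statistics of packet delay. The Python sketch below is one illustrative controller, not the method of any particular product: it keeps an exponentially weighted mean of the observed per-packet network delay and of its mean deviation, and proposes a playout delay of the mean plus k deviations, a widely used rule of thumb. The class name `AdaptivePlayoutDelay`, the smoothing weight `alpha`, the factor `k`, and the clamping bounds are hypothetical choices.

```python
class AdaptivePlayoutDelay:
    """Hypothetical adaptive jitter-buffer controller (illustrative only)."""

    def __init__(self, alpha=0.875, k=4.0, min_ms=20.0, max_ms=200.0):
        self.alpha = alpha    # weight given to history (assumed value)
        self.k = k            # safety factor on the deviation (assumed value)
        self.min_ms = min_ms  # smallest buffer allowed (assumed bound)
        self.max_ms = max_ms  # largest buffer allowed (assumed bound)
        self.mean = None      # smoothed network delay, ms
        self.dev = 0.0        # smoothed mean deviation of the delay, ms

    def observe(self, delay_ms):
        """Fold one packet's measured network delay into the estimates."""
        if self.mean is None:
            self.mean = delay_ms
            return
        self.mean = self.alpha * self.mean + (1 - self.alpha) * delay_ms
        self.dev = (self.alpha * self.dev
                    + (1 - self.alpha) * abs(delay_ms - self.mean))

    def target_ms(self):
        """Proposed playout delay: mean + k * deviation, clamped to bounds."""
        if self.mean is None:
            return self.max_ms  # no data yet: start conservatively
        target = self.mean + self.k * self.dev
        return min(self.max_ms, max(self.min_ms, target))
```

With a steady 50 ms delay the target settles at 50 ms, while a jittery sequence pushes it higher. Because the estimate needs a number of packets before it settles, and because changing the delay in the middle of an utterance stretches or truncates audio, such controllers commonly apply a new target only at the start of a talkspurt. These are the measurement and resizing costs noted for the dynamically modifiable buffer.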
Therefore, even when the buffer size is modified in an attempt to reduce delay, jitter buffers, whether fixed or modifiable, often still introduce delay that is unacceptable for natural voice conversation. What is needed in the art is a system and method of reducing this delay in VoIP applications.