1. Field of the Invention
The present invention relates to latency in voice over Internet Protocol (VoIP) telephony, and more particularly to a management method for reducing latency in VoIP telephony.
2. Description of the Related Art
The science of translating sound into electrical signals, transmitting them, and then converting them back to sound is called telephony (i.e., the science of phones).
The term is frequently used to refer to computer hardware and software that perform functions traditionally performed by telephone equipment.
Internet telephony generally refers to communications services (voice, facsimile, and/or voice-messaging applications) that are transported via the Internet rather than the public switched telephone network (PSTN). The basic steps involved in originating an Internet telephone call are conversion of the analog voice signal to digital format and compression/translation of the signal into Internet Protocol (IP) packets for transmission over the Internet; the process is reversed at the receiving end, as shown in prior art FIG. 1.
Real Time Transport Protocol (RTP) has gained widespread acceptance as the transport protocol for voice and video on the Internet. It provides services such as timestamping, sequence numbering, and payload identification. It also contains a control component, the Real Time Control Protocol (RTCP), which is used for loose session control, QoS reporting and media synchronization, among other functions.
RTP itself does not guarantee real-time delivery of data, but it does provide mechanisms for the sending and receiving applications to support streaming data. Typically, RTP runs on top of the User Datagram Protocol (UDP), although the specification is general enough to support other transport protocols. RTP labels all information transferred by a sender with a timestamp. By examining the timestamps, the receiver is able to sort the packets into their original order, synchronize real-time streams, and/or compensate for jitter in audio or video data.
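As an illustration of the reordering described above, the following is a minimal sketch of a receiver restoring sending order from timestamps. The `Packet` type and the `reorder` function are hypothetical names introduced only for this example, and the sketch assumes timestamps that do not wrap around during the interval considered.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    """Hypothetical packet record; the field names are assumptions."""
    sequence: int    # per-packet sequence number
    timestamp: int   # sender timestamp, in sample units
    payload: bytes


def reorder(packets):
    """Return the packets sorted into sending order.

    Sorts by timestamp and uses the sequence number as a tie-breaker.
    Wraparound of either field is deliberately ignored in this sketch.
    """
    return sorted(packets, key=lambda p: (p.timestamp, p.sequence))


# Packets as they might arrive after being reordered in the network.
arrived = [Packet(2, 320, b"c"), Packet(0, 0, b"a"), Packet(1, 160, b"b")]
in_order = reorder(arrived)
```

A real receiver would typically reorder incrementally as packets arrive rather than sorting a complete list.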
RTCP was devised to give applications a status report on the quality of a network. With this information, parameters affecting the transmission of data, e.g., a jitter buffer size, can be optimized. The RTP header adds 16 bytes to the total overhead and is preceded by the UDP header's additional 8 bytes. The IP header, which is 20 bytes in size, is then prefixed to form a datagram; thus, to transmit 20 bytes of audio or video, a 64-byte datagram is required. A datagram is defined to be a data block, segment, chunk, data packet or packet of audio, video or audio/video.
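The overhead arithmetic above can be checked with a short sketch. The header sizes are the figures given in the text; the function names are hypothetical and introduced only for illustration.

```python
# Header sizes as stated in the text above.
RTP_HEADER_BYTES = 16
UDP_HEADER_BYTES = 8
IP_HEADER_BYTES = 20


def datagram_size(payload_bytes):
    """Total datagram size for a given payload size."""
    return (payload_bytes + RTP_HEADER_BYTES
            + UDP_HEADER_BYTES + IP_HEADER_BYTES)


def overhead_fraction(payload_bytes):
    """Fraction of the datagram that is header rather than payload."""
    total = datagram_size(payload_bytes)
    return (total - payload_bytes) / total


print(datagram_size(20))      # 64
print(overhead_fraction(20))  # 0.6875
```

For a 20-byte payload, 44 of the 64 bytes sent are header, so more than two thirds of the datagram is overhead.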
Among the unneeded bits of the RTP header are the 32 bits of the timestamp. This timestamp in particular is not needed because sound packets are sequenced, and since each packet also translates to a specific number of sound samples, it is possible to calculate precisely what the interarrival time should have been (and would have been without jitter). In some implementations the timestamp is allowed to correlate directly with sample counts, and if it is used that way the value may be entirely redundant, as it is directly calculable from the sequence number of the packet if the packets carry a fixed payload.
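The redundancy argument can be made concrete with a minimal sketch. It assumes a fixed payload per packet and a timestamp that counts samples; the figures of 160 samples per packet and an 8000 Hz sample rate are illustrative assumptions, not values taken from the text, and the function names are hypothetical.

```python
def expected_timestamp(sequence, samples_per_packet, first_timestamp=0):
    """Timestamp implied by a sequence number when every packet
    carries the same number of samples (assumed fixed payload)."""
    return first_timestamp + sequence * samples_per_packet


def expected_interarrival_ms(samples_per_packet, sample_rate_hz):
    """Interarrival time, in milliseconds, that packets would show
    in the absence of jitter."""
    return 1000.0 * samples_per_packet / sample_rate_hz


# Assumed example values: 160 samples per packet at 8000 samples/second.
print(expected_timestamp(3, 160))           # 480
print(expected_interarrival_ms(160, 8000))  # 20.0
```

Under these assumptions the receiver can recover both the timestamp and the nominal spacing of packets from the sequence number alone.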
Latency, the delay in shipping a datagram from sender to receiver, affects the pace of the conversation. Humans can tolerate about 250 milliseconds (ms) of latency before it has a noticeable effect.
To support voice in its native analog form over a digital network, the analog signal has to be coded (i.e., converted) into a digital format at some point after being generated in order to enter the WAN, LAN, Internet or other communication network. On the receiving end, the digital signal has to be decoded (i.e., reconverted) back into an analog format in order to be intelligible to the human ear. Timing is therefore critical. The network must be in a position to accept, switch, transport, and deliver every voice byte precisely every 125 microseconds (µs). That means that latency (i.e., delay) must be minimal and jitter (i.e., variability in delay) must be virtually zero.
Real-time voice conversations are delay sensitive. Once the one-way delay exceeds a quarter of a second, or 250 milliseconds (ms), it becomes relatively difficult for the parties in a conversation to tell when one person is finished speaking. This increases the probability that the parties will talk at the same time.
A voice call is routed from the PBX at its origination, via the gateway, LAN, and router at that location, through the IP network to a telephone connected to the PBX at its destination. There are several areas where datagrams transporting voice could be delayed. As an analog voice conversation is routed through the PBX to the voice gateway, the voice-coding algorithm used by the gateway adds a degree of latency. The actual amount of delay depends on the type of voice coder used. Once a small sample of voice is coded, it must be encapsulated within a datagram for transmission to a distant gateway. The encapsulation process includes adding the applicable UDP and IP headers to form the datagram, followed by the flow of the datagram from the gateway to the router via the LAN.
The total delay from those activities represents an interprocess delay at the origin and a corresponding interprocess delay at the destination.
Once the datagram reaches the IP network, it will be routed through one or more routers to a network egress point. This routing also adds variable delay. The causes of the variable delay include the number of routers in the path from the point of entry to the point of exit, the processing power of each router, and the traffic load offered to each router. These delays occur as the voice-transporting datagram flows through the local network and contribute to the delay encountered by the datagram as it flows through the wide-area IP network.
However, the RTCP (Real Time Control Protocol), which sends comparatively few packets in a stream (most are RTP packets), provides for periodic correlation between the timestamps of the RTP and a real-time clock timestamp. This allows calculation of actual latencies and the like, provided that the real-time clocks on the sender and receiver are synchronized. This synchronization of real-time clocks is contemplated in the RTP specification via a separate protocol called Network Time Protocol (NTP). Alternatively, one-way latency is estimated as half the round-trip time of some information. Nevertheless, neither of these approaches establishes the one-way latency effectively, since latency can be extremely asymmetrical between two network nodes.
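The weakness of the half-round-trip estimate on an asymmetrical path can be seen in a small sketch. The forward and reverse delays below are hypothetical values chosen only to illustrate the point, and the function name is an assumption of this example.

```python
def half_rtt_estimate(forward_ms, reverse_ms):
    """One-way latency as it would be estimated from the round-trip time,
    which is all a sender without synchronized clocks can measure."""
    return (forward_ms + reverse_ms) / 2.0


# Hypothetical asymmetrical path: slow in one direction, fast in the other.
forward_ms, reverse_ms = 180.0, 20.0

estimate = half_rtt_estimate(forward_ms, reverse_ms)
error = estimate - forward_ms

print(estimate)  # 100.0
print(error)     # -80.0
```

In this assumed case the estimate understates the true forward latency by 80 ms, even though the round-trip measurement itself is exact.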
None of the RTP information is needed to reduce latency in the system effectively, and to optimize the latency most efficiently it is unnecessary to know what the latency actually is. The length of the path to an endpoint, whether in milliseconds, in hops or in miles, is irrelevant for the purpose of minimizing the latency. What is necessary is to minimize the length of the jitter buffer on the receiving end based upon statistics obtained entirely by observing the length of the jitter buffer. The length of the jitter buffer is observed at the time that each data block of sound is ready and prepared for insertion into the jitter buffer, which takes into account all contributors to the latency, including the decompression and formatting of the sound on the receiving end.
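One way to picture statistics gathered purely from the jitter buffer length is sketched below. This is an illustrative assumption, not the method of the invention: the class name, the window size, the safety margin and the use of the windowed minimum are all hypothetical choices made for the example.

```python
from collections import deque


class JitterBufferObserver:
    """Hypothetical observer that records how much audio is already
    queued each time a decoded block is ready for insertion."""

    def __init__(self, window=50, safety_margin_ms=5.0):
        self.samples = deque(maxlen=window)   # most recent observations
        self.safety_margin_ms = safety_margin_ms

    def record(self, buffered_ms):
        """Note the buffer length (in ms) seen at insertion time."""
        self.samples.append(buffered_ms)

    def removable_ms(self):
        """Audio that could be trimmed from the buffer.

        If the buffer never dropped below some level over a full window,
        that level (less a safety margin) was never needed to absorb
        jitter. Returns 0.0 until a full window has been observed.
        """
        if len(self.samples) < self.samples.maxlen:
            return 0.0
        return max(0.0, min(self.samples) - self.safety_margin_ms)


observer = JitterBufferObserver(window=4, safety_margin_ms=5.0)
for buffered_ms in [60, 45, 52, 48]:
    observer.record(buffered_ms)
print(observer.removable_ms())  # 40.0
```

Note that the sketch uses no timestamps and no clock comparison between sender and receiver; it relies only on buffer lengths observed at the receiver.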
What is needed is a method for reducing latency that is not dependent on the Real Time Transport Protocol (RTP).
What is also needed is a method for reducing the overhead cost of transmitting data packets.
What is further needed is a method to correct for bias in consuming devices of the data blocks.
What is additionally needed is a method to eliminate effects of time clock inaccuracies and differences in minimizing the actual latency of the system.