As the Internet expands rapidly, voice over IP (VoIP) service has been widely adopted. However, network traffic conditions remain the most important factor in VoIP voice quality, regardless of the compression technique used. When network latency varies, a packet containing compressed voice data may be delayed or even lost before it reaches the receiving end. For a VoIP application, voice packet loss or out-of-order arrival greatly affects voice quality.
In a VoIP system, the arrival times of the voice packets are jittered by variation in network delay. The jitter buffer is currently the most widely used technique for solving this problem. By storing received voice packets in the jitter buffer and delaying their playout, the impact of the network on playout voice quality is reduced.
In the jitter buffer management mechanism, the length of the delay applied to the voice packets plays a key role in voice quality. Current delayed playout designs fall into two categories. The first uses a fixed-length (constant) playout delay, and the second uses an adjustable playout delay. FIG. 1 shows a schematic view of fixed playout delay. The small dots in the figure indicate the voice packets arriving at the receiving end. The x-axis is the arrival time in milliseconds (ms), and the y-axis is the voice packet delay, that is, the transmission time of the voice packet in the network. The two horizontal lines in FIG. 1 mark fixed playout delays of 200 ms and 90 ms, respectively.
As shown in FIG. 1, the drawback of the fixed playout delay is that when the delay is too small, such as 90 ms, some voice packets arrive too late to be played back. This can be solved by using a longer fixed playout delay. However, a longer fixed playout delay, such as 200 ms, degrades voice communication quality, because every packet is held longer before it is played out.
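The trade-off described above can be illustrated with a minimal Python sketch. The delay values below are hypothetical and are not taken from FIG. 1; the function name `late_packets` is likewise an assumption introduced only for illustration. A packet is treated as late when its network delay exceeds the fixed playout delay.

```python
from typing import Sequence


def late_packets(network_delays_ms: Sequence[float], playout_delay_ms: float) -> int:
    """Count packets whose network delay exceeds the fixed playout delay.

    Such packets arrive after their scheduled playout time and cannot be played.
    """
    return sum(1 for delay in network_delays_ms if delay > playout_delay_ms)


# Hypothetical per-packet network delays in milliseconds (illustrative only).
delays_ms = [60, 75, 85, 95, 120, 150, 80, 210]

late_at_90 = late_packets(delays_ms, 90)    # short delay: more packets are late
late_at_200 = late_packets(delays_ms, 200)  # long delay: fewer late packets, more latency
```

With this sample data, the 90 ms setting discards more packets than the 200 ms setting, while the 200 ms setting adds 110 ms of latency to every packet that is played.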
The advantage of the fixed playout delay is its low computational complexity in implementation, while the drawback is that it does not reflect the actual network conditions. Once the network is congested and the jitter buffer overflows, the communication will be cut off.
To solve the aforementioned drawback, research has been conducted to develop adjustable playout delay techniques, in which the delay is adjusted in accordance with the network conditions by changing the jitter buffer size. A plurality of techniques are disclosed in related patents, including U.S. Pat. No. 6,360,271, U.S. Pat. No. 6,600,759, U.S. Pat. No. 6,693,921, U.S. Pat. No. 6,452,950, U.S. Pat. No. 6,700,895, U.S. Pat. No. 6,684,273, U.S. Pat. No. 6,683,889 and U.S. Pat. No. 6,747,999.
U.S. Pat. No. 6,360,271 disclosed a “system for dynamic jitter buffer management based on synchronized clocks” that uses a global positioning system (GPS) to synchronize the clocks. By arranging the playout delay for each voice packet, the patent provides a dynamic jitter buffer management mechanism.
U.S. Pat. No. 6,600,759 disclosed an apparatus using a hardware element for estimating jitter in the voice packets over a network. The network follows the TCP/IP protocol.
U.S. Pat. No. 6,700,895 disclosed a method for determining the optimal jitter buffer size based on the data packet loss in a real-time communication system.
U.S. Pat. No. 6,683,889 disclosed a method for automatically adjusting the jitter buffer size. The method determines the jitter buffer size by comparing the packet delay with a default value.
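The general idea of sizing a jitter buffer by comparing a packet delay with a reference value can be sketched as follows. This is a generic illustration only and is not the method claimed in any of the patents cited above; the function name `adjust_buffer_ms` and all of the threshold, step, and limit values are hypothetical assumptions.

```python
def adjust_buffer_ms(
    current_ms: float,
    packet_delay_ms: float,
    threshold_ms: float = 100.0,  # hypothetical reference ("default") delay
    step_ms: float = 10.0,        # hypothetical adjustment step
    min_ms: float = 20.0,         # hypothetical lower bound on buffer depth
    max_ms: float = 200.0,        # hypothetical upper bound on buffer depth
) -> float:
    """Return a new jitter buffer depth after observing one packet delay.

    The buffer grows by one step when the observed delay exceeds the
    threshold and shrinks by one step otherwise, clamped to [min_ms, max_ms].
    """
    if packet_delay_ms > threshold_ms:
        target = current_ms + step_ms
    else:
        target = current_ms - step_ms
    return max(min_ms, min(max_ms, target))


# Example: a 60 ms buffer grows after a 120 ms packet and shrinks after an 80 ms one.
grown = adjust_buffer_ms(60.0, 120.0)
shrunk = adjust_buffer_ms(60.0, 80.0)
```

A rule of this kind depends entirely on how accurately the packet delay is measured.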
However, estimating the network delay remains difficult. The conventional techniques use the time stamp on the voice packet to compute the network delay, and this computation may be affected by a clock rate discrepancy between the transmitting and receiving ends. As a result, the sampling rate and the communication may not be synchronized. The sampling rate discrepancy may originate in the hardware at the transmitting and receiving ends. For example, suppose the voice sampling rate is configured to be 8 kHz, and the software encodes and decodes the voice signals on the basis of 8 kHz. If the hardware devices at both ends are not running at exactly 8 kHz, errors will occur.
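The effect of such a discrepancy on a time-stamp-based delay estimate can be shown with a small Python sketch. The figures used here are assumptions for illustration: a sender whose hardware actually samples at 8001 Hz instead of the nominal 8000 Hz, 160 samples per packet, a constant true network delay of 50 ms, and a receiver clock that is exact and shares its time origin with the sender. The function name `apparent_delay_ms` is hypothetical.

```python
def apparent_delay_ms(
    packet_index: int,
    true_delay_ms: float,
    sender_rate_hz: float,
    nominal_rate_hz: float = 8000.0,
    samples_per_packet: int = 160,
) -> float:
    """Delay the receiver would compute from the packet time stamp.

    The time stamp counts samples, so the receiver converts it to time using
    the nominal rate, while the sender actually emitted the samples at
    sender_rate_hz. The mismatch appears as a drift in the measured delay.
    """
    samples = packet_index * samples_per_packet
    send_time_ms = 1000.0 * samples / sender_rate_hz        # when the packet really left
    timestamp_time_ms = 1000.0 * samples / nominal_rate_hz  # what the time stamp implies
    arrival_time_ms = send_time_ms + true_delay_ms
    return arrival_time_ms - timestamp_time_ms


# Compare the first packet with the packet carrying the 480,000th sample
# (about one minute of audio), under a constant true delay of 50 ms.
first = apparent_delay_ms(0, 50.0, sender_rate_hz=8001.0)
later = apparent_delay_ms(3000, 50.0, sender_rate_hz=8001.0)
drift_ms = later - first  # negative: the measured delay appears to shrink
```

Under these assumptions the measured delay drifts by several milliseconds per minute even though the true network delay never changes, so a playout algorithm that trusts the time stamp alone would gradually misjudge the network.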
The aforementioned techniques fail to effectively solve the problem of estimating the voice packet playout delay. Some techniques require an extra hardware element for implementation, while others do not support silence adjustment to adjust the playout time. Yet the voice packet playout delay is the key to voice quality.