1. Field of the Invention
The present invention relates to a packet receiving method and device, and in particular to a packet receiving method and device which convert a voice packet received into a voice.
Together with a recent rapid spread of the Internet, a VoIP communication for transmitting IP-packetized voice data over an IP network has been receiving attention for its inexpensive communication cost. The IP network is of a best-effort type, and the bandwidth of a transmission line between a transmitting device and a receiving device is not guaranteed. The resultant communication sound quality deteriorates due to a transmission delay fluctuation (hereinafter, occasionally referred to as jitter) or the like caused by congestion or the like. Also, since the transmitting device and the receiving device operate with mutually independent clocks, a clock shift therebetween makes the communication sound quality deteriorate. Technologies for preventing such a deterioration of the communication sound quality are becoming more and more important.
2. Description of the Related Art
FIG. 15 shows an absorbing principle of a transmission delay fluctuation of a voice packet 50 in a general voice packet receiving device 100. The packet receiving device 100 is provided with a receiving packet buffer 10 and a voice reproducer 40. The packet receiving device 100 receives the voice packet 50 arriving irregularly due to a jitter (at step S10) to be temporarily accumulated (at step S11). The receiving packet buffer 10 regularly transfers the voice packet 50 temporarily accumulated to the voice reproducer 40 (at step S12). Thus, the transmission (transfer) delay fluctuation of the voice packet 50 on its receiving side is absorbed, so that a stable sound quality without a sound interruption and a sound skip can be maintained.
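The absorbing principle of steps S10 to S12 can be illustrated by the following minimal sketch. It is not taken from any actual device; the class name `ReceivingPacketBuffer`, the function `play_out`, and the millisecond time values are assumptions introduced only for illustration.

```python
from collections import deque


class ReceivingPacketBuffer:
    """Hypothetical model of the receiving packet buffer 10."""

    def __init__(self):
        self._queue = deque()

    def receive(self, packet):
        # Steps S10 and S11: accept an irregularly arriving packet and hold it.
        self._queue.append(packet)

    def transfer(self):
        # Step S12: hand the oldest packet to the voice reproducer,
        # or None when nothing is accumulated (a sound interruption).
        return self._queue.popleft() if self._queue else None


def play_out(arrivals, period, start, count):
    """Transfer one packet every `period` time units, beginning at `start`.

    `arrivals` is a list of (arrival_time, packet) pairs sorted by time.
    """
    buf = ReceivingPacketBuffer()
    output = []
    next_arrival = 0
    for k in range(count):
        now = start + k * period
        # Accumulate everything that has arrived up to this playout instant.
        while next_arrival < len(arrivals) and arrivals[next_arrival][0] <= now:
            buf.receive(arrivals[next_arrival][1])
            next_arrival += 1
        output.append(buf.transfer())
    return output


# Packets nominally 20 ms apart, but arriving with jitter.
arrivals = [(0, "p0"), (25, "p1"), (30, "p2"), (70, "p3")]
smooth = play_out(arrivals, period=20, start=40, count=4)
gappy = play_out(arrivals, period=20, start=0, count=4)
```

In this sketch, delaying the start of reproduction to 40 ms lets every packet be transferred on schedule, whereas starting at 0 ms leaves a gap at the second playout instant because p1 has not yet arrived.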
FIGS. 16A and 16B show a variation of the “number of packets temporarily accumulated (hereinafter, occasionally referred to as the number of accumulated packets or buffering amount)” by the receiving packet buffer 10 resulting from the jitter. FIG. 16A specifically shows the variation of the buffering amount in the absence of clock shifts. The receiving packet buffer 10 is controlled to temporarily accumulate the voice packet 50 so that the number of accumulated packets may become e.g. the half (hereinafter, occasionally referred to as initial value (reference value)) of the maximum accumulable capacity (buffer size). The receiving packet buffer 10 thus controlled can absorb the jitter in the positive direction or negative direction equal to or less than the initial value, in the former half (left half) of FIG. 16A. In the latter half (right half), it can not absorb all of the jitters in the positive direction or negative direction equal to or more than the initial value. Therefore, a buffer overflow c4 or a buffer underflow c5 respectively occurs, so that the received voice packet 50 is discarded or an interrupted transmission state of the voice packet 50 occurs.
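The behavior of FIG. 16A can be reproduced by a rough simulation such as the following. The function name `simulate_fixed_buffer`, the per-tick arrival counts, and the capacity of four packets are assumptions chosen only to make the overflow and underflow visible; one packet is reproduced per tick.

```python
def simulate_fixed_buffer(arrivals_per_tick, capacity):
    """Track the buffering amount of a fixed-size buffer.

    The initial (reference) value is half of the buffer size.
    Returns (levels, overflowed_packets, underflow_events).
    """
    level = capacity // 2
    overflowed = 0
    underflows = 0
    levels = []
    for arrived in arrivals_per_tick:
        level += arrived
        if level > capacity:
            # Overflow: packets beyond the capacity are discarded.
            overflowed += level - capacity
            level = capacity
        if level > 0:
            level -= 1  # one packet is transferred to the reproducer
        else:
            underflows += 1  # underflow: nothing to reproduce
        levels.append(level)
    return levels, overflowed, underflows


# Small jitter: the buffering amount stays within the buffer size.
small = simulate_fixed_buffer([1, 1, 0, 2, 1], capacity=4)

# Large jitter: three empty ticks followed by two bursts of four packets.
large = simulate_fixed_buffer([0, 0, 0, 4, 4, 1], capacity=4)
```

With the small jitter neither fault occurs, while the large jitter first empties the buffer (underflow) and then overfills it (overflow).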
In order to eliminate the discard or interrupted state, the maximum capacity and the initial value of the receiving packet buffer 10 may simply be set large. However, the transmission delay of the received packet is increased by the number of accumulated packets temporarily accumulated in the receiving packet buffer 10. This increase in the transmission delay interferes with conversations in an interactive communication. For example, unnatural conversations resulting from a large transmission delay in a satellite relay or the like can be mentioned.
Thus, if the number of accumulated packets (buffering amount) of the receiving packet buffer 10 is too small, not all of the jitter can be absorbed and a sound quality deterioration such as a sound interruption is caused, while if it is too large, the transmission delay increases; that is, the two are in a trade-off relationship. Accordingly, it is necessary that the number of accumulated packets of the receiving packet buffer 10 is optimized to a requisite minimum value which can secure the sound quality according to the jitter resulting from a network used, and that the delay resulting from the temporary accumulation in the receiving packet buffer 10 is made as small as possible.
FIG. 16B shows a case where the number of accumulated packets (buffering amount) of the receiving packet buffer 10 is adjusted (controlled). Namely, in the receiving packet buffer 10, the buffering amount is controlled corresponding to the jitter amount, which is different from the receiving packet buffer 10 of FIG. 16A. FIG. 16B is the same as FIG. 16A up to a point t1 of FIG. 16B. However, after the point t1 when the jitter becomes abruptly large, the maximum value (buffer size) and the initial value (reference value) of the receiving packet buffer 10 are adjusted or controlled to the capacities which can accumulate the maximum value of the jitter in the positive/negative direction (see F2 and E2). Thus, the buffer overflow c4 and the buffer underflow c5 shown in FIG. 16A do not occur in the receiving packet buffer 10, so that the sound skip and the sound interruption resulting from the jitter can be prevented.
FIG. 16B shows a case where the maximum value of the jitter abruptly varies at the point t1. However, the maximum value of the jitter differs even on the same network according to time zones at which usages concentrate or usages are fewer such as at midnight. Furthermore, the maximum value momentarily varies even within the same time zone. Accordingly, in order to maintain the number of accumulated packets (buffering amount) at an optimum value, a real-time adjustment according to a network state is required.
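One conceivable way of following the jitter in real time is sketched below. It is only an illustration under stated assumptions: the class `JitterTracker`, the sliding window of recent observations, and the rule that the buffer size is twice the reference value are all hypothetical and are not described in the figures.

```python
import math
from collections import deque


class JitterTracker:
    """Hypothetical tracker that sizes the buffer from recently observed jitter."""

    def __init__(self, packet_interval_ms, window=50):
        self.packet_interval_ms = packet_interval_ms
        # Only the most recent `window` observations are kept, so the
        # estimate can shrink again after a busy period has passed.
        self._recent = deque(maxlen=window)

    def observe(self, jitter_ms):
        self._recent.append(abs(jitter_ms))

    def reference_value(self):
        # Packets needed to cover the largest recent jitter (at least one).
        if not self._recent:
            return 1
        return max(1, math.ceil(max(self._recent) / self.packet_interval_ms))

    def buffer_size(self):
        # Assumption: equal room for positive and negative jitter.
        return 2 * self.reference_value()

    def added_delay_ms(self):
        # Delay contributed by the packets held at the reference value.
        return self.reference_value() * self.packet_interval_ms


tracker = JitterTracker(packet_interval_ms=20, window=3)
for jitter in (5, -15, 10):
    tracker.observe(jitter)
quiet = (tracker.reference_value(), tracker.buffer_size(), tracker.added_delay_ms())

tracker.observe(65)  # the jitter abruptly becomes large, as at the point t1
busy = (tracker.reference_value(), tracker.buffer_size(), tracker.added_delay_ms())
```

The sketch also makes the trade-off explicit: the larger reference value that absorbs the 65 ms jitter raises the added delay from 20 ms to 80 ms.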
In the VoIP communication, there is a problem of a “clock shift” between the transmitting and receiving devices in addition to the above-mentioned jitter. Namely, when there is a shift (deviation) between a clock of a recorder on a transmitting device side and a clock of the voice reproducer 40 on a receiving device side, an excess or a deficiency of the steady number of accumulated packets occurs in the receiving packet buffer 10.
FIG. 17 shows an ideal state of the receiving packet buffer 10, as well as the number of accumulated packets (buffering amount) of the receiving packet buffer 10 when no clock shift or jitter occurs. The number of accumulated packets always maintains the initial value.
FIGS. 18A and 18B show an example of the variation of the number of accumulated packets (buffering amount) due to the clock shift. FIG. 18A specifically shows the number of accumulated packets of the receiving packet buffer 10 when a reproduced clock of the voice reproducer 40 is faster than the clock of the recorder of a voice packet transmitting device. As for the number of accumulated packets, since the voice packet 50 is not received in time for the reproduction rate, a state of a buffer underflow c6 occurs in which the voice packet 50 has not arrived at the packet transmission time. Conversely, when the reproduced clock is slower, the reproduction does not keep pace with the packet reception and an overflow (not shown) occurs in the receiving packet buffer 10.
As a result, a loss or discard of the voice packet 50 occurs, the sound interruption and the sound skip occur, and the sound quality is significantly reduced. Accordingly, in order to prevent the occurrence of such a steady underflow or overflow in the receiving buffer, the number of accumulated packets of the receiving packet buffer 10 is required to be adjusted according to the clock shift.
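The steady nature of this variation can be made concrete with a short calculation. The following sketch assumes that the clock shift is expressed in parts per million (ppm) of the reproduced clock relative to the recorder clock, and that no jitter occurs; the function name and the parameter names are hypothetical.

```python
def seconds_until_buffer_fault(initial_packets, headroom_packets,
                               packet_interval_s, clock_shift_ppm):
    """Estimate when a steady clock shift alone empties or fills the buffer.

    A positive `clock_shift_ppm` means the reproduced clock is faster than
    the recorder clock, so the buffer drains; a negative value means it is
    slower, so the buffer fills. Returns (kind, seconds), or None when the
    clocks agree.
    """
    # Net change of the buffering amount, in packets per second.
    drain_rate = clock_shift_ppm * 1e-6 / packet_interval_s
    if drain_rate > 0:
        return ("underflow", initial_packets / drain_rate)
    if drain_rate < 0:
        return ("overflow", headroom_packets / -drain_rate)
    return None


# 20 ms packets, three packets accumulated, room for three more.
fast = seconds_until_buffer_fault(3, 3, 0.02, 100)
slow = seconds_until_buffer_fault(3, 3, 0.02, -50)
same = seconds_until_buffer_fault(3, 3, 0.02, 0)
```

Under these assumed figures a shift of 100 ppm drains the three accumulated packets in roughly ten minutes, so the fault recurs steadily for as long as the call continues unless the buffering amount is adjusted.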
FIG. 18B shows an example of the buffering amount (the number of accumulated packets) adjustment (control) of the receiving packet buffer 10 accommodating to the buffer underflow resulting from the clock shift. At the points t1, t2, . . . , the buffering amount is adjusted as shown by E3 and the occurrence of the buffer underflow c6 shown in FIG. 18A is avoided.
Various technologies for resolving the above-mentioned jitter and clock shift have been proposed. One example for (1) a technology accommodating to the jitter and (2) a technology accommodating to the clock shift will now be described.
(1) Technology Accommodating to Jitter (Transmission Delay Fluctuation)
FIG. 19 shows a prior art example (1) of a packet receiving device. A packet receiving device 100a accommodates to a jitter, and is provided with a receiving packet buffer 20, a buffer controller 21, and a jitter measurer 22. The receiving packet buffer 20 temporarily accumulates the voice packet 50 received, and the jitter measurer 22 calculates a jitter value from reception time information of the voice packet 50. The jitter value is compared with the number of accumulated packets (amount) of the receiving packet buffer 20, and a receiving buffer adjustment value 64a for increasing/decreasing the buffer accumulation amount as required is provided to the buffer controller 21. The buffer controller 21 provides a packet output request 52a to the receiving packet buffer 20, extracts the voice packet 50 temporarily accumulated in the buffer 20 to be provided to the voice reproducer 40, thereby adjusting the number of accumulated packets (buffering amount) of the receiving packet buffer 20. Namely, the packet receiving device 100a monitors the jitter value, adaptively controls the buffer accumulation amount, and absorbs the delay fluctuation (jitter).
An example of a delay fluctuation absorbing device (packet receiving device) using a similar transmission delay fluctuation absorbing method can be mentioned (see e.g. patent document 1). In this device, a buffer temporarily accumulates a voice packet transmitted from a packet communication network, and a delay fluctuation calculation means measures a delay fluctuation amount of the voice packet having arrived. A delay amount control means compares the measured delay fluctuation amount with a set delay setting value, instructs to increase a delay amount when the measured delay fluctuation amount exceeds the delay setting value by a predetermined value or more, and instructs to decrease the delay amount when the measured delay fluctuation amount falls short of the delay setting value by a predetermined value or more. A delay amount adjustment means repeatedly transmits a soundless voice packet upon reception of the instruction to increase the delay amount, and discards the soundless voice packet upon reception of the instruction to decrease the delay amount, thereby adjusting the delay amount.
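The comparison performed by such a jitter-based control can be sketched as follows. The way the jitter value is derived here (the largest deviation of the inter-arrival interval from the nominal packet interval) is an assumption made for illustration, since the exact calculation is not specified above; the function names and return strings are likewise hypothetical.

```python
def measure_jitter_ms(reception_times_ms, nominal_interval_ms):
    """Largest absolute deviation of an inter-arrival interval from nominal.

    Assumed measure; uses only the reception time information.
    """
    deviations = [
        abs((later - earlier) - nominal_interval_ms)
        for earlier, later in zip(reception_times_ms, reception_times_ms[1:])
    ]
    return max(deviations, default=0)


def delay_instruction(measured_ms, setting_ms, margin_ms):
    """Compare the measured fluctuation with the delay setting value."""
    if measured_ms - setting_ms >= margin_ms:
        return "increase"  # e.g. repeatedly transmit a soundless packet
    if setting_ms - measured_ms >= margin_ms:
        return "decrease"  # e.g. discard a soundless packet
    return "hold"


jitter = measure_jitter_ms([0, 20, 55, 60, 80], nominal_interval_ms=20)
```

For the reception times shown, the measured jitter is 15 ms, so a delay setting value of 5 ms with a margin of 10 ms yields an instruction to increase the delay amount.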
(2) Technology Accommodating to Clock Shift
FIG. 20 shows a prior art example (2) of a packet receiving device. A packet receiving device 100b accommodates to a clock shift, and is provided with a receiving packet buffer 30, a buffer controller 31, and a number of accumulated packets monitor 32. The receiving packet buffer 30 temporarily accumulates the voice packet 50 received. The number of accumulated packets monitor 32 monitors the accumulation amount of the receiving packet buffer 30 as the number of accumulated packets (the number of packets accumulated by the receiving packet buffer 30). When the number of accumulated packets becomes equal to or more than a threshold value, the monitor 32 provides to the buffer controller 31 a receiving buffer adjustment value 64b indicating instructions of decreasing the number of accumulated packets by discarding the packet. When the number of accumulated packets becomes equal to or less than a threshold value, the monitor 32 provides to the buffer controller 31 the receiving buffer adjustment value 64b indicating instructions of increasing the number of accumulated packets by repeatedly reproducing the packet (or inserting an interpolation packet).
The buffer controller 31 provides a packet output request 52b to the receiving packet buffer 30 based on the instructions of the receiving buffer adjustment value 64b, replicates or discards the voice packet 50 accumulated in the receiving packet buffer 30, and suppresses the occurrence of the underflow and the overflow of the buffer 30.
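The threshold control of the prior art example (2) can be outlined by the following sketch. The function name `adjust_by_count`, the use of a `deque` as the receiving packet buffer 30, and the choice of which packet is discarded or replicated are assumptions made for illustration only.

```python
from collections import deque


def adjust_by_count(buffer, lower_threshold, upper_threshold):
    """Adjust the buffer using only the instantaneous number of packets.

    Returns the action taken: "discard", "replicate", or "none".
    """
    count = len(buffer)
    if count >= upper_threshold:
        buffer.popleft()  # assumption: the oldest packet is discarded
        return "discard"
    if count <= lower_threshold and buffer:
        # Assumption: the oldest packet is replicated so it is reproduced twice.
        buffer.appendleft(buffer[0])
        return "replicate"
    # An empty buffer has nothing to replicate in this simplified sketch.
    return "none"


full = deque(["a", "b", "c", "d"])
full_action = adjust_by_count(full, lower_threshold=1, upper_threshold=4)

low = deque(["a"])
low_action = adjust_by_count(low, lower_threshold=1, upper_threshold=4)

steady = deque(["a", "b"])
steady_action = adjust_by_count(steady, lower_threshold=1, upper_threshold=4)
```

Note that the decision depends solely on the count at the moment of the check, which is the property discussed later in connection with FIG. 23.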
An example of a packet receiving device of a similar method accommodating to a clock shift can be mentioned in which a buffer accumulates a voice signal, a voice detector detects voiced/voiceless information indicating a voiced/voiceless section of the voice signal, and a buffer monitor monitors an accumulation amount of the voice signal accumulated in the buffer. The buffer controller inserts a new voice signal into the voice signal accumulated in the buffer or discards the voice signal accumulated based on the accumulation amount and the voiced/voiceless information (see e.g. the patent document 2).
[Patent document 1] Japanese Patent Application Laid-open No. 2001-160826 (page 2, FIG. 1)
[Patent document 2] Japanese Patent Application Laid-open No. 2003-46490 (page 2, FIG. 1)
FIG. 21 shows a variation of the number of accumulated packets of the receiving packet buffer resulting from a jitter and a clock shift. In an actual system environment, it is general that the jitter and the clock shift occur at the same time, and a steady buffer variation (see D1 and D2 of FIG. 21) due to the clock shift and a momentary buffer variation (see point t1) due to the jitter are combined, so that the buffering amount varies. Namely, the buffering amount (average) is gradually reduced due to the clock shift, and the buffering amount momentarily varies due to the jitter (see point t1), whereby a buffer underflow c7 frequently occurs.
Hereinafter, problems of the packet receiving device 100a and the packet receiving device 100b respectively shown in the prior art examples (1) and (2) of FIGS. 19 and 20 will be described.
FIG. 22 shows the number of accumulated packets in the packet receiving device 100a of the prior art example (1). While FIG. 16B shows the number of accumulated packets in case where only the jitter (transmission delay fluctuation) occurs, in FIG. 22 the clock shift further occurs and the number of accumulated packets is gradually reduced (see D3 and D4). Although the packet receiving device 100a adjusts the initial value (reference value) at the point t1 when the jitter varies to accommodate to the jitter variation (see E4), it does not accommodate to the clock shift. Therefore, buffer underflows c8 and c9 occur. If the packets in which the underflows have occurred are in the voice section, the sound quality significantly deteriorates.
Namely, by the method of calculating the jitter from the reception time information of the received packet, a clock shift having a steady fixed shift component can not be detected. Therefore, the adjustment amount of the jitter becomes inaccurate, and an excess and a deficiency of the jitter adjustment occur.
FIG. 23 shows the number of accumulated packets in the packet receiving device 100b of the prior art example (2). While FIG. 18B shows the number of accumulated packets in case where only the clock shift occurs, FIG. 23 shows the number of accumulated packets in case where the jitter further occurs. Although the packet receiving device 100b adjusts the buffering amount at points t1-t4, and t6-t9 to accommodate to the clock shift, it does not accommodate to the jitter variation. Therefore, a buffer overflow c10 and a buffer underflow c11 occur after the point t5 when the jitter largely varies.
Namely, the number of accumulated packets (buffering amount) of the receiving packet buffer 10 varies including “momentary variation due to jitter” and “steady variation due to clock shift”. Therefore, when the number of accumulated packets (momentary value) of the receiving packet buffer 10 is used as the control parameter as shown in the prior art example (2), it is not possible to distinguish whether the buffer variation at the point exceeding a control threshold value is resulting from the jitter or the clock shift. Steady buffer adjustment processing for maintaining the buffering amount fixed is required for the clock shift causing the steady buffer variation, while buffering amount adjustment processing for increasing/decreasing the buffering amount is required for the jitter causing the momentary buffer variation, and both buffer adjustment processings are different from each other.
Accordingly, when the variation of the number of accumulated packets by both processings is applied in a unified way, a stable buffer control can not be performed, so that there is a possibility of conversely causing the sound quality deterioration by the excessive buffer control. For example, when the buffering amount is momentarily reduced below a certain threshold value by a negative (delay) jitter, the processing of increasing the initial value (reference value) of the buffering amount is performed in the prior art example (2). However, when the voice packets delayed by the jitter thereafter arrive all at once, the voice packets having arrived will be further accumulated to the buffering amount increased by the buffer control, whereby there is a possibility that the overflow c10 of the buffer is induced.
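The induced overflow described above can be reproduced with a small simulation. The arrival pattern, the capacity of four packets, and the function `run` are hypothetical; the control is modeled, as an assumption, by repeating the previous packet instead of consuming one whenever the instantaneous count is at or below the lower threshold.

```python
def run(arrivals_per_tick, capacity, initial, lower_threshold=None):
    """Simulate one playout per tick, optionally with count-based control.

    Returns (discarded_packets, underflow_events, repeated_playouts).
    """
    level = initial
    discarded = underflows = repeats = 0
    for arrived in arrivals_per_tick:
        level += arrived
        if level > capacity:
            discarded += level - capacity  # overflow: excess packets are lost
            level = capacity
        if lower_threshold is not None and level <= lower_threshold:
            # Control reacts to the momentary dip: the previous packet is
            # repeated, so nothing is consumed and the buffering amount is
            # effectively raised by one.
            repeats += 1
        elif level > 0:
            level -= 1
        else:
            underflows += 1
    return discarded, underflows, repeats


# Two packets are delayed by jitter, then arrive together with later ones.
pattern = [1, 0, 0, 4, 1]

without_control = run(pattern, capacity=4, initial=2)
with_control = run(pattern, capacity=4, initial=2, lower_threshold=1)
```

In this example the uncontrolled buffer rides out the dip without any loss, whereas the count-based control raises the buffering amount during the dip and then loses one packet to overflow when the delayed packets arrive all at once.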