1. Field of Invention
This invention relates to a delay jitter reducing device for sequentially receiving a series of chronological data segments through a transmission path such as the Internet and delaying an individual data segment for an appropriate amount of time, thereby reducing delay jitter that has occurred in the propagation process of an individual data segment and obtaining chronological data segments from which effects of the delay jitter have been eliminated; and a delay jitter reducing method thereof.
2. Description of the Related Art
One form of data transmission is real-time transmission, in which chronological samples of a continuous signal, such as a voice signal, are loaded into a plurality of consecutive packets and transmitted. In such real-time transmission, if every packet experiences the same transmission delay, a voice signal with the same waveform as at the source node can be obtained by reproducing the chronological samples in each packet at the moment that packet is received.
In a network such as the Internet, however, even when a plurality of packets are transmitted from the same source node to the same destination node, the propagation delay times of the individual packets are not necessarily equal; the propagation delay time varies from packet to packet. This variation of the propagation delay time among packets is generally called delay jitter.
When such delay jitter occurs and the chronological samples are reproduced at the destination node at the moment each packet is received, there is no guarantee that a signal with the same waveform as the original transmitted signal will be reproduced.
In such cases, destination nodes usually reduce delay jitter by means of a buffer, so as to obtain chronological data from which the effects of delay jitter have been eliminated.
This technique for reducing delay jitter will be described in detail with reference to FIG. 12 to FIG. 17.
FIG. 12 is a block diagram showing a configuration example of a real-time voice transmission system. In this system, at a source terminal 10, a voice signal to be transmitted is encoded by a voice encoder 11, which generates chronological voice packets carrying the coded data of the voice signal. A transmission unit 12 transmits these voice packets to a destination terminal 30, and each voice packet arrives at the destination terminal 30 after passing through a network 20. At the destination terminal 30, the voice packets from the source terminal 10 are received by a receiving unit 31 and stored in a buffer 32. The voice packets stored in the buffer 32 are then read out in the order in which they were generated at the source node and passed to a voice decoder 33, which decodes the voice signal from the coded data contained in the voice packets.
In the real-time voice transmission system, the voice packets generated at the source terminal 10 are sent out to the network 20 at the same time intervals at which they are generated. However, as already described, the propagation delay time required for each packet to reach the destination terminal 30 is not the same for every voice packet. The destination terminal 30 therefore adjusts the timing at which each voice packet is sent to the voice decoder 33. FIG. 17 shows an example of this timing adjustment. In the example shown in FIG. 17, voice packets P0, P1, and P2 arrive at the destination terminal 30 after propagation delay times of d0, d1, and d2, respectively. As shown, if the voice packets P0, P1, and P2 can each be delayed by an appropriate amount of time D0, D1, and D2, the total delay time T, that is, the time required for each voice packet to travel from the source terminal 10 to the voice decoder 33, can be made fixed. The buffer 32 shown in FIG. 12 is the device used to adjust the delays so that the total delay time of every voice packet is fixed in this way. If the minimum and maximum delay times of a voice packet in the network 20 are denoted dmin and dmax, their difference, D=dmax−dmin, is referred to for convenience as the delay jitter width. The buffer 32 in FIG. 12 must absorb delay variations within this delay jitter width; in other words, the buffer 32 must be capable of reducing the delay jitter.
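The relation behind FIG. 17 can be sketched in a few lines of Python. Each packet's buffer delay is the fixed total delay T minus its propagation delay, so every packet ends up with the same total d_i + D_i = T. The function name `buffer_delays` and the sample delays are hypothetical, chosen only for illustration, and T is assumed to be no smaller than the largest propagation delay (otherwise a negative buffer delay would be required).

```python
def buffer_delays(propagation_delays, total_delay):
    """Return the buffer delay D_i = T - d_i for each packet so that
    every packet's total delay d_i + D_i equals total_delay (T)."""
    delays = []
    for d in propagation_delays:
        if d > total_delay:
            # A packet that arrives later than T cannot be "un-delayed".
            raise ValueError(f"propagation delay {d} exceeds target total {total_delay}")
        delays.append(total_delay - d)
    return delays


# Hypothetical propagation delays d0, d1, d2 in units of s (not read from FIG. 17).
d = [2, 0, 3]
D = buffer_delays(d, total_delay=3)         # choose T = dmax = 3
print(D)                                    # [1, 3, 0]
print([di + Di for di, Di in zip(d, D)])    # [3, 3, 3]
```

Choosing T = dmax is the smallest value that keeps every buffer delay non-negative.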
Delay adjustment of voice packets by the buffer 32 will now be described with reference to FIGS. 13A and 13B.
FIG. 13B shows four queues stacked one above another, each consisting of a row of nine boxes. The first queue indicates the state of the buffer 32 at a certain time t1. The second queue indicates the state of the buffer 32 at time t2, 1 s after time t1. Likewise, the third and fourth queues indicate the states of the buffer 32 at time t3, 1 s after time t2, and at time t4, 1 s after time t3.
In the example shown in FIG. 13B, the buffer 32 has a capacity of storing nine voice packets. Each of the nine boxes in each queue is an area for storing a voice packet, and the notation, #1 to #9, in each box indicates the address of each area.
At the destination terminal 30, one voice packet is read from the buffer 32 every 1 s and sent to the voice decoder 33. Here, “s” denotes a time unit chosen to suit the attribute of the data, for example several milliseconds or several tens of milliseconds. The read address likewise advances by one address every fixed interval of 1 s. In FIG. 13B, the area currently being read is shown at the right end of each queue, the area immediately to its left is read 1 s later, and the area second to its left is read 2 s later. The remaining areas follow in the same way, so the leftmost area of the queue is read 8 s later.
In the example shown in FIG. 13B, a voice packet is read from the area of address #1 at time t1. At time t2, another voice packet is read from the area of address #2, another voice packet is read from the area of address #3 at time t3, and another packet from the area of address #4 at time t4. Therefore, if a voice packet received at time t1 is written into the area of address #4, the voice packet is output from the buffer 32 to the voice decoder 33 at time t4 which is 3 s later. Also, if a voice packet received at time t1 is written into the area of address #9, the voice packet is output from the buffer 32 to the voice decoder 33 8 s later. In this way, controlling a write address into which a received voice packet is written enables delaying the voice packet for an arbitrary amount in the range of 0 s to 8 s.
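The ring-buffer behavior of FIG. 13B can be modeled as follows. This is a minimal sketch: the class name `JitterBuffer`, its methods, and the integer tick standing in for the unit "s" are all hypothetical, and it assumes that within one tick a received packet is written before the current area is read, which is consistent with a packet written into the area being read leaving the buffer immediately.

```python
class JitterBuffer:
    """Hypothetical model of the buffer 32: `size` areas read one per tick, in a ring."""

    def __init__(self, size=9):
        self.size = size
        self.areas = [None] * size  # index 0 holds address #1
        self.read_index = 0         # area being read at the current tick

    def write(self, packet, offset):
        """Write `packet` `offset` areas ahead of the read area; it is output `offset` ticks later."""
        if not 0 <= offset < self.size:
            raise ValueError("offset must be in the range 0 .. size - 1")
        index = (self.read_index + offset) % self.size
        if self.areas[index] is not None:
            raise RuntimeError("target area is already occupied")
        self.areas[index] = packet
        return index + 1  # 1-based address, as labeled in FIG. 13B

    def read(self):
        """Output the packet (or None) in the current area, clear it, and advance one area."""
        packet = self.areas[self.read_index]
        self.areas[self.read_index] = None
        self.read_index = (self.read_index + 1) % self.size
        return packet


buf = JitterBuffer(size=9)
address = buf.write("P", offset=3)        # received at t1, while address #1 is being read
outputs = [buf.read() for _ in range(4)]  # reads at t1, t2, t3, t4
print(address, outputs)  # 4 [None, None, None, 'P']
```

Writing with offset 0 places the packet in the area being read, so it leaves in the same tick; offset 8 places it in the leftmost area and delays it by 8 s.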
Therefore, if the absolute delay time taken by each voice packet to travel from the source terminal 10 to the destination terminal 30 could be obtained, each voice packet could be delayed by the maximum delay time to be absorbed (dmax in FIG. 17) minus its own absolute delay time. This would make the total delay time of every voice packet from the source terminal 10 to the voice decoder 33 both fixed and as short as possible.
However, the destination terminal 30 cannot determine how much propagation delay each voice packet has experienced on its way. Conventional per-packet delay control is therefore performed as follows. For simplicity, it is assumed here that a series of voice packets transmitted from the source terminal 10 at a fixed time interval reaches the destination terminal 30 in the same order in which they were transmitted.
First, upon receiving the first voice packet through the network 20, the destination terminal 30 writes it into an initial input location of the buffer 32 (S1, S2 of FIG. 13A). In the example shown in FIG. 13B, the initial input location is the area whose address number is one greater than that of the area being read when the first voice packet is received.
Then, each voice packet from the second onward is written into the area that will be read at the earliest timing among the areas that are vacant when that voice packet is received (S3 of FIG. 13A).
In the example shown in FIG. 13B, the first voice packet P1, received at time t1, is written into the area of address #2, which is the initial input location. No voice packet is received at time t2, and the voice packet P1 is read from the area of address #2 and sent to the voice decoder 33. At time t3, the second voice packet P2 is received; it has evidently taken 1 s longer to propagate than the voice packet P1. The voice packet P2 is written into the area that will be read at the earliest timing among the areas vacant at the receiving time t3, that is, the area of address #3. At time t3, the voice packet P2 is therefore read immediately after being written and supplied to the voice decoder 33.
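The conventional write rule (S1 to S3 of FIG. 13A) can be traced with a small simulation. It is a sketch under stated assumptions: times are integer ticks of 1 s, tick 1 reads address #1 as in FIG. 13B, packets arrive in transmission order, and "the earliest vacant area" is interpreted as the area after the one holding the previously written packet, or the area currently being read if that comes later. This interpretation preserves packet order and reproduces the output timings described for FIGS. 13B, 15 and 16. The function name `conventional_schedule` is hypothetical.

```python
def conventional_schedule(arrivals, initial_offset, size=9):
    """Return {packet: (address, output_tick)} under the conventional write rule.

    arrivals: list of (arrival_tick, packet) in transmission order.
    """
    schedule = {}
    last_out = None  # output tick of the previously written packet
    for tick, packet in arrivals:
        if last_out is None:
            # S1, S2: the first packet goes to the initial input location.
            out = tick + initial_offset
        else:
            # S3 (interpreted): the area after the previous packet, never an area already read.
            out = max(last_out + 1, tick)
        if out - tick >= size:
            raise OverflowError("required delay exceeds the buffer capacity")
        address = (out - 1) % size + 1  # assumes tick 1 reads address #1
        schedule[packet] = (address, out)
        last_out = out
    return schedule


# FIG. 13B: P1 arrives at t1, P2 at t3; the initial input location is one area ahead.
print(conventional_schedule([(1, "P1"), (3, "P2")], initial_offset=1))
# {'P1': (2, 2), 'P2': (3, 3)}
```

P2 is written into address #3 and output at t3, the tick in which it arrives, so P1 and P2 leave the buffer 1 s apart.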
Thus, although the voice packets P1 and P2 are transmitted 1 s apart from the source terminal 10, the 1 s difference in their propagation delay times causes them to arrive at the destination terminal 30 2 s apart. Even in such a case, setting the initial input location of the buffer 32 and applying delay in the buffer 32 as described above allows the voice packets P1 and P2 to be supplied to the voice decoder 33 at the same interval at which they were transmitted from the source terminal 10. In other words, delay jitter of up to 1 s can be reduced by assigning the first voice packet an initial input location that is read one area (1 s) later than the area being read when the packet is received.
Looking at a series of voice packets transmitted from the source terminal 10 to the destination terminal 30, their propagation delay times vary from the minimum value dmin to the maximum value dmax, as shown for example in FIG. 17. In the conventional art, when the first voice packet P1 is received at the destination terminal 30, its initial input location is set to the area that will be read later than the area being read at the time of receipt by a number of areas equivalent to the delay jitter width D=dmax−dmin, and the voice packet P1 is written there. Determining the initial input location in this way completely eliminates delay jitter within the assumed range.
A more detailed description follows with reference to FIGS. 14A, 14B, 14C, 15 and 16. In the following description, the delay jitter width is assumed to be 4 s. For simplicity, it is also assumed that the minimum delay time dmin is 0 s, so that the delay jitter width of the network 20 equals the maximum delay time dmax.
In FIG. 14A, the voice packets P11 and P12 are packets output consecutively from the voice encoder 11 of the source terminal 10. Likewise, the voice packets P21 and P22 are packets output consecutively from the voice encoder 11. FIG. 14B illustrates each voice packet as it reaches the receiving unit 31 of the destination terminal 30. In the example shown, the voice packets P11 and P12 both reach the receiving unit 31 delayed by the maximum delay time dmax=4 s. The voice packets P21 and P22, on the other hand, reach the receiving unit 31 with the former delayed by the minimum delay time dmin=0 s and the latter by the maximum delay time dmax=4 s. FIG. 14C illustrates each voice packet as supplied to the voice decoder 33 after the delay has been applied.
FIG. 15 shows how the buffer 32 delays the packets P11 and P12, and FIG. 16 shows how the buffer 32 delays the packets P21 and P22.
As shown in FIG. 15, the voice packet P11, which reaches the receiving unit 31 at time t5, is written into the area of address #5, the initial input location; it is therefore delayed by 4 s and output from the buffer 32 to the voice decoder 33 at time t9. The voice packet P12, which reaches the receiving unit 31 at time t6, is written into the area of address #6, the area that will be read at the earliest timing among the areas vacant at the time of receipt, and is therefore output from the buffer 32 at time t10, the output timing immediately following that of the voice packet P11.
The voice packets P21 and P22, on the other hand, are delayed as follows. First, as shown in FIG. 16, the voice packet P21, which reaches the receiving unit 31 at time t1, is written into the area of address #5, the initial input location; it is therefore delayed by 4 s and output from the buffer 32 at time t5. The voice packet P22, which reaches the receiving unit 31 at time t6, is written into the area that will be read at the earliest timing among the areas vacant at the time of receipt, and is therefore output from the buffer 32 immediately.
As described so far, if the initial input location is set to the area that will be read later than the area being read at the time of receipt by a number of areas equivalent to the delay jitter width D=dmax−dmin, any delay jitter within the range from the minimum value dmin to the maximum value dmax can be eliminated.
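To check the claim that an initial offset equal to the delay jitter width D absorbs all jitter between dmin and dmax, the sketch below enumerates every combination of propagation delays for a few packets sent 1 s apart and tests whether they leave the buffer exactly 1 s apart. It assumes the same order-preserving reading of the write rule as above and ignores buffer capacity; the names `output_ticks` and `removes_jitter` are hypothetical.

```python
from itertools import product


def output_ticks(send_ticks, delays, initial_offset):
    """Output tick of each packet under the conventional rule (order-preserving reading)."""
    outs, last = [], None
    for send, d in zip(send_ticks, delays):
        arrival = send + d
        # First packet: initial input location; later packets: next area or on arrival.
        out = arrival + initial_offset if last is None else max(last + 1, arrival)
        outs.append(out)
        last = out
    return outs


def removes_jitter(dmin, dmax, initial_offset, n_packets=3):
    """True if every delay pattern in [dmin, dmax] yields outputs exactly 1 tick apart."""
    sends = list(range(n_packets))  # packets sent 1 tick apart
    for delays in product(range(dmin, dmax + 1), repeat=n_packets):
        outs = output_ticks(sends, delays, initial_offset)
        if any(b - a != 1 for a, b in zip(outs, outs[1:])):
            return False
    return True


print(removes_jitter(0, 4, initial_offset=4))  # True: offset equals D = 4
print(removes_jitter(0, 4, initial_offset=3))  # False: offset smaller than D
```

With offset D, the first packet leaves at s0 + d0 + D, and every later packet i has arrived by s0 + d0 + D + i because its delay never exceeds d0 + D, so the spacing stays at 1 s.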
However, in the conventional art described above, delaying the first voice packet received by the destination terminal 30 by the delay jitter width D means that every succeeding voice packet experiences the same total delay as the first. If the delay time required for the first voice packet to pass through the network is d0, the total delay time T, that is, the time from when each voice packet is output from the voice encoder 11 of the source terminal 10 until it reaches the voice decoder 33 of the destination terminal 30, is D+d0. Because the delay time of the first voice packet can be anywhere from the minimum value dmin to the maximum value dmax, the total delay time T depends on the delay time d0 of the first voice packet. When d0 equals the minimum delay time dmin, the total delay time T is short. When d0 is as long as the maximum delay time dmax, however, the total delay time T becomes 2dmax−dmin, which is twice the maximum delay time dmax when dmin is 0 as assumed above. In recent years, the spread of Internet telephony using VoIP (Voice over IP) technology has created demand for high-quality communication, which requires a shorter total delay time. It is therefore undesirable for the total delay time T to become long as the price of reducing delay jitter.
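The dependence of the total delay on the first packet can be tabulated directly from T = D + d0. This short sketch uses the values dmin = 0 and dmax = 4 assumed in the FIG. 14 example; the function name `total_delay` is hypothetical.

```python
def total_delay(d0, dmin, dmax):
    """Total delay T under the conventional scheme: the first packet's
    propagation delay d0 plus the jitter width D = dmax - dmin held in the buffer."""
    jitter_width = dmax - dmin
    return jitter_width + d0


dmin, dmax = 0, 4  # values assumed in the FIG. 14 example
print([total_delay(d0, dmin, dmax) for d0 in range(dmin, dmax + 1)])  # [4, 5, 6, 7, 8]
print(total_delay(dmax, dmin, dmax) == 2 * dmax - dmin)               # True
```

Only when the first packet happens to arrive with the minimum delay does T equal dmax, the fixed total delay that would be achievable if each packet's absolute delay were known.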