1. Field of the Invention
The present invention relates to a network telephone set, such as an Internet telephone set, and an audio decoding device that use VoIP.
2. Description of the Prior Art
Internet telephone sets, which carry audio telephone conversations over the Internet, have already been developed. The Internet telephone set uses a technique called “VoIP”. VoIP (Voice over Internet Protocol) is a technique for carrying out audio telephone conversations, that is, for transmitting and receiving audio data, on a TCP/IP (Transmission Control Protocol/Internet Protocol) network such as the Internet or an intranet.
Unlike a conventional telephone set, the Internet telephone set compresses audio, packetizes the compressed audio, and carries out telephone conversations via an IP network. In this type of telephone conversation device, the times at which packets arrive often vary (jitter), depending on the conditions of the IP network. That is, the intervals between packets arriving via the IP network are often not constant. In order to output decoded audio continuously on the receiving side, however, coded data must be delivered to a decoder at predetermined intervals. Therefore, a jitter buffer 101 for absorbing the jitter is provided in the stage preceding a decoder 102, as shown in FIG. 1.
The jitter buffer 101 comprises a plurality of buffer portions (packet storage portions), each of which stores one packet. Packets which have arrived are stored in the buffer portions of the jitter buffer 101 in the order of their packet numbers, starting from the left. At every predetermined time period, the packet stored in the leftmost buffer portion is read out and delivered to the decoder 102. When a packet is delivered to the decoder 102, each of the other packets in the jitter buffer 101 is shifted leftward by one buffer portion. The decoder 102 decodes the packet (coded data) delivered from the jitter buffer 101, and outputs the decoded audio.
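The store, read-out, and shift behavior described above can be sketched as follows. This is a minimal illustration, not the structure of FIG. 1 itself: the class name `JitterBuffer`, the method names, and the rule that a packet's slot equals its packet number minus the number expected at the left end are all assumptions made for the example.

```python
from typing import List, Optional


class JitterBuffer:
    """Hypothetical model: slot 0 stands for the leftmost buffer portion."""

    def __init__(self, num_slots: int, next_seq: int = 0) -> None:
        self.slots: List[Optional[bytes]] = [None] * num_slots
        self.next_seq = next_seq  # packet number expected in slot 0

    def store(self, seq: int, payload: bytes) -> bool:
        """Store a packet in the slot that matches its packet number."""
        index = seq - self.next_seq
        if index < 0 or index >= len(self.slots):
            return False  # no buffer portion exists for this packet
        self.slots[index] = payload
        return True

    def read(self) -> Optional[bytes]:
        """Called once per fixed period: deliver slot 0, shift the rest left."""
        payload = self.slots.pop(0)  # leftmost packet goes to the decoder
        self.slots.append(None)      # remaining packets each move one slot left
        self.next_seq += 1
        return payload               # None means nothing arrived in time


buf = JitterBuffer(num_slots=4)
buf.store(1, b"B")   # arrives first, but belongs in the second slot
buf.store(0, b"A")
first, second, third = buf.read(), buf.read(), buf.read()
```

In this run the packets are read out in packet-number order (`b"A"`, then `b"B"`) even though they arrived out of order, and the third read returns `None` because no packet was available for that period.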
As shown in FIG. 2a, the distribution of the positions of the buffer portions in which arriving packets are stored, taken at the time when the packet stored at the leftmost end of the jitter buffer 101 is delivered to the decoder 102, shall be called the distribution of the times when the packets arrive. It is so called because, when the left end of the jitter buffer 101 is taken as the origin, time is taken in the rightward direction, and probability is taken in the upward direction, the distribution represents the times at which the arriving packets are stored. When the distribution of the times when the packets arrive is S0, as shown in FIG. 2a, the jitter buffer 101 functions efficiently. In the distribution S0, the probability that an arriving packet is stored in the fifth buffer portion from the left is the highest.
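As a rough illustration of how such a distribution could be estimated, the sketch below builds an empirical histogram from the buffer positions at which packets were stored. The function name `arrival_distribution`, the 0-based slot indices, and the sample data are assumptions for the example and are not taken from FIG. 2a.

```python
from collections import Counter
from typing import List


def arrival_distribution(slot_positions: List[int], num_slots: int) -> List[float]:
    """Empirical probability that an arriving packet lands in each slot.

    slot_positions holds 0-based slot indices (0 = leftmost buffer portion).
    """
    total = len(slot_positions)
    if total == 0:
        return [0.0] * num_slots
    counts = Counter(slot_positions)
    return [counts.get(i, 0) / total for i in range(num_slots)]


# Hypothetical observations: most packets land in the fifth slot (index 4).
observed = [3, 4, 4, 4, 5]
dist = arrival_distribution(observed, num_slots=8)
peak_slot = dist.index(max(dist)) + 1  # convert to 1-based "n-th from the left"
```

With this sample data the peak is at the fifth buffer portion from the left, which mirrors the shape described for S0.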
When the fixed delay in the IP network is reduced during telephone conversations, the distribution of the times when the packets arrive at the jitter buffer 101 shifts from S0 to S1, as shown in FIG. 2b. In this case, although the fixed delay in the IP network has been reduced, a fixed delay of the time T remains in the jitter buffer 101, which interferes with smooth telephone conversations.
When the fixed delay in the IP network is increased during telephone conversations, the distribution of the times when the packets arrive at the jitter buffer 101 shifts from S0 to S2, as shown in FIG. 2c. In this case, a packet which arrives at a position falling outside the jitter buffer 101 cannot be outputted to the decoder 102, so that the audio quality is degraded, as in the case of packet loss.
When the amount of jitter in the IP network is increased during telephone conversations, the distribution of the times when the packets arrive at the jitter buffer 101 changes from S0 to S3, as shown in FIG. 2d. In this case as well, a packet which arrives at a position falling outside the jitter buffer 101 cannot be outputted to the decoder 102, so that the audio quality is degraded, as in the case of packet loss.
When the amount of jitter in the IP network is reduced during telephone conversations, the distribution of the times when the packets arrive at the jitter buffer 101 changes from S0 to S4, as shown in FIG. 2e. In this case, although the buffer amount required to absorb the jitter in the IP network has been reduced, a fixed delay of the time T remains in the jitter buffer 101, so that the utilization efficiency of the jitter buffer 101 is low.
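The two loss cases above (FIG. 2c and FIG. 2d) can be illustrated by counting how many packets land outside the buffer. The sketch below is only an assumption-laden example: the function name `unusable_fraction`, the 0-based positions, the sample values, and the direction and size of the shift are hypothetical and are not read from the figures.

```python
from typing import List


def unusable_fraction(slot_positions: List[int], num_slots: int) -> float:
    """Fraction of packets whose position falls outside the jitter buffer."""
    if not slot_positions:
        return 0.0
    outside = sum(1 for p in slot_positions if p < 0 or p >= num_slots)
    return outside / len(slot_positions)


NUM_SLOTS = 8
baseline = [3, 4, 4, 5]                # all packets land inside the buffer
shifted = [p + 4 for p in baseline]    # whole distribution moved by 4 positions
widened = [-1, 2, 6, 9]                # same center, larger spread

loss_baseline = unusable_fraction(baseline, NUM_SLOTS)
loss_shifted = unusable_fraction(shifted, NUM_SLOTS)
loss_widened = unusable_fraction(widened, NUM_SLOTS)
```

In this toy data no packet is lost at baseline, while both the shifted and the widened distributions push some packets beyond the ends of the buffer, where they are treated like lost packets.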
One conceivable way to make the distribution of the times when the packets arrive most suitable is to adjust the number of packets stored in the jitter buffer 101. For example, when the distribution of the times when the packets arrive is as shown in FIG. 2b or 2e, packets stored in the jitter buffer 101 are discarded (thinned). Conversely, when the distribution of the times when the packets arrive is as shown in FIG. 2c or 2d, packets stored in the jitter buffer 101 are duplicated. In either case, the distribution of the times when the packets arrive is thereby made most suitable.
In a method of adjusting the number of packets stored in the jitter buffer 101 (the amount of storage of packets), however, the quality of the output audio is degraded by the discarding or duplication of the packets.
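A minimal sketch of this discard-or-duplicate adjustment is given below. The source does not say which packet is discarded or duplicated, so the choices here (drop the oldest packet, repeat the newest one) and the names `adjust_depth` and `target_depth` are assumptions made purely for illustration.

```python
from typing import List


def adjust_depth(packets: List[bytes], target_depth: int) -> List[bytes]:
    """Return a copy of the stored packets moved one step toward target_depth."""
    if not packets:
        return []                        # nothing to discard or duplicate
    if len(packets) > target_depth:
        return packets[1:]               # discard (thin): assumed to drop the oldest
    if len(packets) < target_depth:
        return packets + [packets[-1]]   # duplicate: assumed to repeat the newest
    return list(packets)                 # already at the target depth


thinned = adjust_depth([b"a", b"b", b"c"], target_depth=2)
padded = adjust_depth([b"a"], target_depth=2)
```

Both branches change the audio that is eventually decoded: thinning removes a stretch of speech and duplication repeats one, which is the source of the quality degradation noted above.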
Conventionally, whether the packets stored in the jitter buffer 101 should be discarded (thinned) or duplicated has been judged on the basis of an arrival delay deviation calculated among a plurality of packets. In this judging method, however, a sufficient amount of data is required to calculate a highly reliable arrival delay deviation (a statistic), so that the control of the number of packets stored in the jitter buffer 101 is delayed.
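The source does not define how the arrival delay deviation is calculated. As one possible reading, the sketch below takes it to be the population standard deviation of each packet's delay relative to an ideal periodic schedule. The function name, the 20 ms period, and the `min_samples` threshold are all hypothetical.

```python
from statistics import pstdev
from typing import List, Optional


def arrival_delay_deviation(
    arrival_ms: List[float], period_ms: float, min_samples: int = 50
) -> Optional[float]:
    """Standard deviation of packet delays relative to an ideal periodic schedule.

    Returns None until min_samples packets have been observed, because a
    statistic computed from too few packets is not considered reliable.
    """
    if len(arrival_ms) < min_samples:
        return None
    base = arrival_ms[0]
    # Delay of packet i relative to where a jitter-free packet would arrive.
    delays = [t - (base + i * period_ms) for i, t in enumerate(arrival_ms)]
    return pstdev(delays)


too_few = arrival_delay_deviation([0, 22, 40, 62], period_ms=20)
estimate = arrival_delay_deviation([0, 22, 40, 62], period_ms=20, min_samples=4)
```

Under these assumptions, with one packet every 20 ms and a threshold of 50 samples, no estimate is available for roughly the first second of a call, which illustrates why control based on such a statistic lags behind changes in the network.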
The control of the number of packets stored in the jitter buffer 101 is, in other words, the control of a delay time period elapsed from the time when the packet is stored in the jitter buffer until the packet is decoded.