1. Field of the Invention
The present invention relates generally to an apparatus and a method for playout scheduling in a Voice over Internet Protocol (VoIP) system, and more particularly to an apparatus and a method that allow a receiver of the VoIP system to robustly handle the influence of jitter by adaptively adjusting the size of a playout buffer.
2. Description of the Related Art
Currently, transmission of time-varying multimedia such as voice and audio over an Internet Protocol (IP) network is increasing. However, because the jitter caused by network delay is considerable and difficult to predict, an IP network using a best-effort service does not guarantee Quality of Service (QoS) for voice, which requires real-time service. In packet delivery over the IP network, network delay jitter occurs because the packets arrive at a destination node with different delays.
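As a rough illustration of the delay variation described above, the following Python sketch computes per-packet delays and the change in delay between consecutive packets. The function name `delay_jitter` and the timestamps are hypothetical, and synchronized sender and receiver clocks are assumed.

```python
def delay_jitter(send_times_ms, recv_times_ms):
    """Return per-packet delays and the absolute delay change between
    consecutive packets (a simple measure of jitter)."""
    delays = [r - s for s, r in zip(send_times_ms, recv_times_ms)]
    jitter = [abs(b - a) for a, b in zip(delays, delays[1:])]
    return delays, jitter


# Hypothetical timestamps: packets sent every 20 ms, arriving with varying delay.
send = [0, 20, 40, 60]
recv = [50, 75, 130, 112]
delays, jitter = delay_jitter(send, recv)
```

In this example the third packet is delayed enough that it arrives after the fourth, which is the kind of variation a playout buffer has to absorb.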
When packets are received after their intended playout time because of considerable delay jitter, the user hears a discontinuous voice signal. The playout time is the time taken to receive voice packets, convert the voice packets to a voice signal, amplify the voice signal, and reproduce the audio.
To address the above-mentioned shortcoming, in the related art, the receiver includes a playout buffer. The receiver stores the decoded voice PCM samples in the buffer and then outputs the audio, rather than restoring and outputting each packet as a voice signal immediately upon reception. However, with the playout buffer, the user experiences a delay that lasts as long as the first buffering time at the initial call setup, and playout scheduling is not performed in response to jitter variation during the call. As a result, the related art cannot actively handle delay that varies according to the condition of the network.
To avoid this problem, a method has been suggested for adaptively adjusting the playout time by predicting the jitter caused by the network delay and adjusting the number of Pulse Code Modulation (PCM) samples. Herein, the method for adjusting the playout time is referred to as playout scheduling, and a technique for increasing or decreasing the number of PCM samples using the characteristics of the voice for the playout scheduling is referred to as Time Scale Modification (TSM). The TSM determines a scale ratio by which to increase or decrease the number of PCM samples. The audio quality depends greatly on the performance of the algorithm that determines the scale ratio.
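To make the role of the scale ratio concrete, the following Python sketch computes only how many PCM samples a packet would contain after scaling. It does not perform the time scale modification itself. The function name `scaled_sample_count` and the frame size of 160 samples (20 ms at an assumed 8 kHz sampling rate) are illustrative assumptions.

```python
def scaled_sample_count(num_samples: int, scale_ratio: float) -> int:
    """Number of PCM samples after applying a TSM scale ratio.

    A ratio above 1 stretches the packet (more samples, longer playout);
    a ratio below 1 compresses it (fewer samples, shorter playout).
    """
    if scale_ratio <= 0:
        raise ValueError("scale_ratio must be positive")
    return max(1, round(num_samples * scale_ratio))


# Assumed frame: 160 samples, i.e. 20 ms of audio at 8 kHz.
stretched = scaled_sample_count(160, 1.25)
compressed = scaled_sample_count(160, 0.75)
```

With these assumed values, a ratio of 1.25 extends the packet from 20 ms to 25 ms of audio, and a ratio of 0.75 shortens it to 15 ms.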
FIG. 1 illustrates a conventional structure for playout scheduling at a receiver of a VoIP system.
Referring to FIG. 1, the receiver measures the delay each time a packet is received and predicts the next packet arrival time from delay statistics, i.e., predicts the next packet delay $\hat{d}_{n}^{i+1}$ in step 101. The receiver then adjusts the number of voice PCM samples received in the current packet by determining the scale ratio $S_i$ based on the predicted delay in step 103, and adjusts the playout time so that the user does not perceive a discontinuity in the voice signal. Herein, the scale ratio indicates the increase or the decrease of the length of the PCM samples.
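The text does not specify which delay statistics the conventional receiver uses in step 101. As one possible sketch, the Python class below assumes an exponentially weighted moving average of the measured delays. The class name `DelayPredictor` and the smoothing factor `alpha` are hypothetical.

```python
class DelayPredictor:
    """Predicts the next packet delay from past measurements.

    Assumption: an exponentially weighted moving average. The source
    only states that statistics on the delay are used.
    """

    def __init__(self, alpha: float = 0.875):
        self.alpha = alpha    # weight given to the previous estimate
        self.estimate = None  # no prediction before the first packet

    def update(self, measured_delay_ms: float) -> float:
        """Fold in the delay of the current packet and return the
        predicted delay of the next packet."""
        if self.estimate is None:
            self.estimate = float(measured_delay_ms)
        else:
            self.estimate = (self.alpha * self.estimate
                             + (1.0 - self.alpha) * measured_delay_ms)
        return self.estimate


predictor = DelayPredictor(alpha=0.5)
predictions = [predictor.update(d) for d in (40, 60, 50)]
```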
FIG. 2 is a flowchart illustrating a conventional method of a receiver for determining a playout time through delay prediction in a VoIP system.
Referring to FIG. 2, after a call is set up in step 201, the receiver checks if a packet is received in step 203. When a packet is received, the receiver checks if the packet is an i-th packet in step 205. When the packet is the i-th packet, the receiver updates its state to wait for the (i+1)-th packet in step 207, and decodes the current i-th packet in step 209 in order to acquire PCM samples of the i-th packet. The receiver predicts a reception time of the next (i+1)-th packet, i.e., predicts the (i+1)-th packet delay $\hat{d}_{n}^{i+1}$ in step 211, and defines the scale ratio $S_i$ of the PCM samples of the i-th packet according to the predicted delay in step 213. Herein, the scale ratio of the PCM samples of the i-th packet is determined using a ratio of the delay time of the (i+1)-th packet to the playout time of one packet.
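The exact formula used in step 213 is not given. The Python sketch below takes one literal reading of the description, in which the scale ratio is simply the predicted delay of the next packet divided by the playout time of one packet. The function name `scale_ratio` and the numeric values are assumptions for illustration.

```python
def scale_ratio(predicted_next_delay_ms: float,
                packet_playout_ms: float) -> float:
    """Scale ratio for the current packet.

    Assumption: the ratio is the predicted delay of the next packet
    divided by the playout time of one packet, with no clamping.
    """
    if packet_playout_ms <= 0:
        raise ValueError("packet_playout_ms must be positive")
    return predicted_next_delay_ms / packet_playout_ms


# Hypothetical values: 20 ms of audio per packet.
ratio_late = scale_ratio(30.0, 20.0)   # next packet expected late
ratio_early = scale_ratio(15.0, 20.0)  # next packet expected early
```

Under this reading, a next packet predicted to arrive late yields a ratio above 1, so the current packet is stretched to cover the gap. A practical implementation would probably bound the ratio to limit audible distortion, but the text does not describe such a bound.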
Upon determining the scale ratio of the i-th packet, the receiver adjusts the size of the i-th packet according to the determined scale ratio, i.e., adjusts the number of the PCM samples of the i-th packet in step 215, stores the adjusted PCM samples in the playout buffer in step 217, and then returns to step 203.
By contrast, when the packet is not the i-th packet in step 205, the receiver checks if the received packet is a packet after the i-th packet in step 219. When the packet is a packet after the i-th packet, the receiver stores the packet in a queuing buffer in step 221. When the packet is a packet prior to the i-th packet, the receiver discards the packet in step 223.
The receiver checks if the call is released in step 225. When the call is not released, the method returns to step 203. When the call is released, the receiver finishes this conventional process.
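The branching of steps 205 through 223 can be sketched in Python as follows. This is a simplified illustration, not the conventional receiver itself. Packets are represented only by hypothetical integer sequence numbers, sequence-number wraparound is ignored, and decoding and scaling are omitted. The text does not describe how queued packets are later consumed, so the sketch only stores them.

```python
def classify_packet(seq: int, expected_seq: int) -> str:
    """Decide what to do with a received packet (steps 205 and 219)."""
    if seq == expected_seq:
        return "decode"   # the i-th packet: steps 207 to 217
    if seq > expected_seq:
        return "queue"    # a later packet: step 221
    return "discard"      # an earlier packet: step 223


def run_receiver(arrivals, first_expected=0):
    """Process sequence numbers in arrival order and log each action."""
    expected = first_expected
    queued = set()
    log = []
    for seq in arrivals:
        action = classify_packet(seq, expected)
        log.append((seq, action))
        if action == "decode":
            expected += 1     # step 207: now wait for the next packet
        elif action == "queue":
            queued.add(seq)   # held in the queuing buffer
    return log, expected, sorted(queued)


log, expected, queued = run_receiver([0, 2, 1, 0])
```

In this run, packet 2 arrives before packet 1 and is queued, packet 1 is then decoded, and the duplicate of packet 0 is discarded because it precedes the packet the receiver is waiting for.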
As discussed above, the TSM in the conventional playout scheduling method reproduces the voice signal by adjusting the playout time through delay prediction for the next packet. However, because the delay in the IP network may change abruptly according to network conditions and the jitter may increase unexpectedly, it is highly likely that the delay prediction will be inaccurate. Consequently, it is difficult to robustly deal with the jitter variation of the IP network using a scheduling method that determines the playout time through delay prediction.
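The weakness described above can be illustrated with a small Python sketch. It assumes, purely for illustration, that the predictor is an exponentially weighted moving average with a hypothetical smoothing factor `alpha`, and it feeds the predictor a hypothetical delay sequence containing an abrupt jump.

```python
def prediction_errors(delays_ms, alpha=0.5):
    """Return actual minus predicted delay for every packet after the first.

    Assumption: the prediction for each packet is a moving average of
    the delays seen before that packet arrived.
    """
    estimate = float(delays_ms[0])
    errors = []
    for actual in delays_ms[1:]:
        errors.append(actual - estimate)  # error of the earlier prediction
        estimate = alpha * estimate + (1.0 - alpha) * actual
    return errors


# Hypothetical delays in ms: stable at 40, then an abrupt jump to 120.
errors = prediction_errors([40, 40, 40, 120, 120])
```

For this input the prediction is exact while the delay is stable, but it underestimates the first delayed packet by 80 ms and the following one by 40 ms, so the playout time derived from it would be too early for both packets.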