1. Field of the Invention
The present invention is related to the field of communications through networks, and more specifically to devices, software and methods for predicting at a source how well a specific encoded frame would be reconstructed at a destination, and to devices, software and methods for adjusting a playout delay of a jitter buffer.
2. Description of the Related Art
Networks, such as the internet, were primarily made for data communication in an asynchronous mode. The data is encapsulated into packets, and each packet is transmitted individually. The packets are received at the destination, and the data is extracted.
Recently, networks are increasingly being used for real time communications. Data is transmitted, received, and played out in real time. For voice communications, for example, Voice over Internet Protocol (VoIP) is used to transmit real-time voice traffic over an Internet Protocol (IP) network. Other applications are being devised for other types of real time media.
Referring now to FIG. 1, a format for data transmission is described in more detail. The format of FIG. 1 is intended for real time transmission. It will be appreciated that FIG. 1 may apply to any type of such transmission, such as a two-way voice conversation, a one-way broadcast stream such as video or radio, etc.
A network 110 is used to facilitate a transmission from a source device 120 to a destination device 130. Network 110 may be any communications network, such as the internet, a Local Area Network (LAN), a Metropolitan Area Network (MAN), etc.
Source device 120 is also called merely source 120. It establishes a connection 122 with destination device 130. Then source 120 transmits data packets 125 through network 110 towards destination device 130. If the communication is two-way, then packets may be transmitted also in the opposite direction.
Destination device 130 is also called merely destination 130. Destination 130 includes a jitter buffer 132, a decoder 134, a packet loss reconstruction (PLC) module 136 and a playout module 138.
Jitter buffer 132 holds packets 125 as they are received from network 110. Decoder 134 decodes the packets stored in jitter buffer 132. PLC module 136 reconstructs the data of those packets that are not received. Then a stream of data frames (some decoded, some reconstructed) is input to playout module 138. The latter may include a speaker (for voice), a screen (for video or still images), etc.
The requirement of real time transmission has presented problems. The problems arise from the fact that networks were initially designed to be asynchronous. These problems are now described in more detail.
A first problem is that some packets 125 are simply lost in network 110. This results in packet loss L1. This is not a problem for most non-real-time applications, which use a reliable transport protocol, because a lost packet will be discovered and retransmitted. But for a real time application, there will be no time for this type of recovery.
A second problem is that some packets 125 do arrive at destination 130, but are delayed. They may not arrive in time for playout, which is the same as if they had been lost. One possible reason for such a delay is congestion at a specific node of network 110, e.g. at one of its routers (not shown individually). This type of loss is characterized as a packet loss L2. Loss L2 is shown as happening within network 110, even though the actual discarding may take place further downstream, in destination device 130.
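The distinction between loss L1 and loss L2 can be illustrated with a minimal sketch. It assumes, purely for illustration, that each packet carries a sequence number, that each frame has a known playout deadline, and that times are measured in milliseconds. The names `classify_losses`, `arrivals_ms` and `deadlines_ms` are hypothetical and are not part of the system of FIG. 1.

```python
from typing import Dict, List, Optional


def classify_losses(
    arrivals_ms: Dict[int, Optional[float]],
    deadlines_ms: Dict[int, float],
) -> Dict[str, List[int]]:
    """Sort sequence numbers into on-time, L1 (never arrived) and L2 (late)."""
    result: Dict[str, List[int]] = {"on_time": [], "L1": [], "L2": []}
    for seq in sorted(deadlines_ms):
        arrival = arrivals_ms.get(seq)
        if arrival is None:
            result["L1"].append(seq)       # lost inside the network
        elif arrival > deadlines_ms[seq]:
            result["L2"].append(seq)       # arrived, but after its playout deadline
        else:
            result["on_time"].append(seq)
    return result


# Hypothetical trace: 20 ms frames, first playout deadline at 60 ms.
deadlines = {seq: 60.0 + 20.0 * seq for seq in range(4)}
arrivals = {0: 35.0, 1: 95.0, 2: None, 3: 110.0}
losses = classify_losses(arrivals, deadlines)
```

In this trace, packet 1 arrives 15 ms after its deadline and is counted under L2, while packet 2 never arrives and is counted under L1. For playout purposes the two cases are equivalent.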
A third problem is that jitter buffer 132 sometimes fills to capacity. Some of the received packets 125 are then discarded to make more room, even though they were not lost and arrived in time. This discarding amounts to another source of packet loss, L3.
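Loss L3 can be sketched with a fixed-capacity queue. The discard policy below (drop the oldest packet) is an assumption made for illustration, since the description does not say which packets are discarded. The class name `BoundedJitterBuffer` and its fields are hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple


class BoundedJitterBuffer:
    """Fixed-capacity packet queue that records overflow discards (loss L3)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.packets: Deque[Tuple[int, bytes]] = deque()
        self.discarded: List[int] = []

    def push(self, seq: int, payload: bytes) -> None:
        if len(self.packets) >= self.capacity:
            # Assumed policy: drop the oldest packet to make room.
            old_seq, _ = self.packets.popleft()
            self.discarded.append(old_seq)
        self.packets.append((seq, payload))

    def pop(self) -> Tuple[int, bytes]:
        """Hand the oldest buffered packet to the decoder."""
        return self.packets.popleft()


# Five packets arrive before any is consumed; the buffer holds only three.
buf = BoundedJitterBuffer(capacity=3)
for seq in range(5):
    buf.push(seq, b"")
```

After the loop, packets 0 and 1 have been discarded even though they arrived intact and on time.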
Jitter buffer 132 may be adaptive. It has a playout delay that may be variable, depending on the exhibited jitter of packets 125. Higher jitter is ascribed to network congestion. Upon perceiving high jitter, the playout delay is adjusted to a higher value. This gives packets more opportunity to arrive before their deadlines, thus minimizing losses L2 and L3 of FIG. 1. But lengthening the playout delay presents other problems, by increasing the total end-to-end delay.
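One conventional way to adapt a playout delay to observed jitter is sketched below. It is an illustration only, not the method of the invention. The jitter estimate follows the running interarrival jitter formula of RFC 3550 (gain 1/16). The mapping from jitter to delay, including the names `playout_delay_ms` and `track_delay` and the parameters `min_ms`, `max_ms` and `k`, is a hypothetical policy chosen for the example.

```python
from typing import Iterable, Tuple


def update_jitter(jitter: float, transit_prev: float, transit_now: float) -> float:
    """One step of the RFC 3550 running jitter estimate (gain 1/16)."""
    d = abs(transit_now - transit_prev)
    return jitter + (d - jitter) / 16.0


def playout_delay_ms(jitter: float, min_ms: float = 20.0,
                     max_ms: float = 200.0, k: float = 4.0) -> float:
    """Hypothetical policy: delay grows with jitter, clamped to [min_ms, max_ms]."""
    return max(min_ms, min(max_ms, min_ms + k * jitter))


def track_delay(packets: Iterable[Tuple[float, float]]) -> float:
    """packets holds (send_ms, receive_ms) pairs; returns the final playout delay."""
    jitter = 0.0
    prev_transit = None
    for send_ms, recv_ms in packets:
        transit = recv_ms - send_ms
        if prev_transit is not None:
            jitter = update_jitter(jitter, prev_transit, transit)
        prev_transit = transit
    return playout_delay_ms(jitter)


steady = track_delay([(0.0, 50.0), (20.0, 70.0), (40.0, 90.0)])
jittery = track_delay([(0.0, 50.0), (20.0, 110.0), (40.0, 90.0)])
```

With constant transit time the delay stays at its minimum. When the transit time varies, the delay grows, trading a longer end-to-end delay for fewer late packets.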
Packets from jitter buffer 132 are decoded in decoder 134. Decoder 134 outputs the decoded data to PLC module 136, for use in concealment of subsequent frames.
PLC module 136 then reconstructs the data of those packets that have not been received. In other words, it tries to correct for losses L1, L2, L3. This way the system tolerates losses L1, L2, L3.
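A very simple form of reconstruction is to repeat the last successfully decoded frame. The sketch below assumes that technique only for illustration, since the description does not specify how PLC module 136 reconstructs data. The names `conceal` and `Frame` are hypothetical, and a missing frame is represented by `None`.

```python
from typing import List, Optional, Sequence

Frame = List[int]  # hypothetical frame type: a list of PCM samples


def conceal(frames: Sequence[Optional[Frame]], frame_len: int) -> List[Frame]:
    """Replace each missing frame (None) with a copy of the last good frame."""
    out: List[Frame] = []
    last_good: Frame = [0] * frame_len   # silence until a frame has been decoded
    for frame in frames:
        if frame is None:
            out.append(list(last_good))  # reconstructed frame
        else:
            last_good = frame
            out.append(list(frame))      # decoded frame
    return out


stream = [[1, 2], None, [5, 6], None, None]
played = conceal(stream, frame_len=2)
```

The output stream mixes decoded and reconstructed frames, as in the stream that is input to playout module 138.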
A fourth problem is that PLC module 136 does not always reconstruct the data of the missing packets well. In other words, the effectiveness of packet loss reconstruction is not uniform for all packets. To the extent that the data of some packets is reconstructed poorly, this is represented as a loss L4, even though it is technically not a data loss, but a loss in Quality of Service (QoS). In a voice application, loss L4 means that the voice is reconstructed poorly. In an application that involves transmitting images (e.g. video), loss L4 means that the images are reconstructed poorly.
Loss L4 may arise for a number of reasons. For example, if too many packets are lost in a row (e.g. the losses L1, L2, L3 being “bursty”), then the reconstruction process has less data to work with. In some instances this is more of a problem than in others. For example, if the data content repeats over many frames, then the lost data is less critical, and its loss is less of a problem. But if the content contains abrupt changes, then the lost data is more critical. Another reason may be that the programming of PLC module 136 is not attuned to the nature of the data or of the loss. Regardless, loss L4 compounds the portion of losses L1, L2, L3 that is not corrected for.
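The notion of “bursty” loss can be made concrete by measuring runs of consecutive missing packets. The sketch below is illustrative only. The name `loss_bursts` and the representation of a trace as a list of booleans (True for a received packet) are assumptions.

```python
from typing import List, Sequence


def loss_bursts(received: Sequence[bool]) -> List[int]:
    """Return the length of each run of consecutive missing packets."""
    bursts: List[int] = []
    run = 0
    for ok in received:
        if ok:
            if run:
                bursts.append(run)
            run = 0
        else:
            run += 1
    if run:
        bursts.append(run)   # trace ended in the middle of a burst
    return bursts


# Both hypothetical traces lose three of eight packets.
isolated = [True, False, True, False, True, False, True, True]
bursty = [True, True, False, False, False, True, True, True]
```

Both traces have the same loss rate, but `loss_bursts(isolated)` gives three bursts of length 1, while `loss_bursts(bursty)` gives one burst of length 3. In the second case the reconstruction process has no received data for three consecutive frames.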
It is desirable to have playout with a high Quality of Service (QoS), even in the face of such losses, and without a long playout delay time.