This invention relates in general to real-time media playout and, more specifically, to computer media playout of network-provided media.
The receiver part of a packet communication device is characterized by having as input an asynchronous flow of data packets. A real-time media playout device, on the other hand, needs to produce a continuous output of data, e.g., to a loudspeaker. Furthermore, for real-time applications such as telephony, streamed audio, or streamed video, it is important to minimize delay, so the delay introduced by the receiver and the playout device should be as small as possible.
The real-time media device is defined as the software and/or hardware that converts the digital media signal to a signal suitable for playout, for example, an analog signal that can be fed into a speaker. In a computer, the real-time media device is usually referred to as a sound card or sound board when the media is sound, and as a video card or video board when the media is video. The packet communication device is usually referred to as a network adaptor, which in a computer could be a network card or a modem.
The quality of media playout is a subjective standard and is an important part of achieving high Quality of Service (QOS). The incoming packets from the network are characterized by at least three factors that affect the playout quality and the QOS, namely, latency, clock drift and packet loss. Latency is a measure of how much the packets are delayed in the network. This delay can vary from packet to packet; these variations are referred to as jitter. Clock drift occurs when the incoming packets arrive at a pace that does not match the pace of playout by the playout device. Packet loss occurs when not all of the packets that represent the media stream are received for playout. Real-time media playout with adequate QOS maintains a high perceptual quality for the user, which also requires low delay to ensure high-quality real-time communication.
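To make the latency and jitter definitions concrete, the Python sketch below computes each packet's delay from send and arrival timestamps and the packet-to-packet change in that delay. The timestamp lists are illustrative, the function names are hypothetical, and the smoothed estimate borrows the interarrival-jitter filter of RFC 3550 (gain 1/16) as one common way to summarize jitter; it is not a method stated in the text above.

```python
from __future__ import annotations

from typing import Sequence


def transit_delays(send_ms: Sequence[float], recv_ms: Sequence[float]) -> list[float]:
    """Latency of each packet: arrival time minus send time (milliseconds)."""
    return [r - s for s, r in zip(send_ms, recv_ms)]


def delay_variation(send_ms: Sequence[float], recv_ms: Sequence[float]) -> list[float]:
    """Jitter sample for each consecutive pair: the change in latency between packets.

    A constant offset between the sender and receiver clocks cancels out here.
    """
    d = transit_delays(send_ms, recv_ms)
    return [b - a for a, b in zip(d, d[1:])]


def smoothed_jitter(send_ms: Sequence[float], recv_ms: Sequence[float]) -> float:
    """Running jitter estimate in the style of RFC 3550: J += (|D| - J) / 16."""
    j = 0.0
    for dv in delay_variation(send_ms, recv_ms):
        j += (abs(dv) - j) / 16.0
    return j


# Illustrative data: packets sent every 20 ms, arriving with varying delay.
send = [0, 20, 40, 60]
recv = [50, 75, 88, 115]
print(transit_delays(send, recv))   # [50, 55, 48, 55]
print(delay_variation(send, recv))  # [5, -7, 7]
```

Note that delay variation only reveals jitter; a steady drift between the two clocks would instead show up as a slow trend in the raw delays.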
A prior-art receiver used in a state-of-the-art real-time communication system over packet networks consists of a jitter buffer, a decoder, a packet loss concealment unit, and a real-time media playout device. The jitter buffer accumulates packets to mitigate the effects of jitter and reorders the received packets, if necessary, so that the packets can be taken out of the buffer at regular intervals in proper temporal order. The decoder converts the digital information in the packets into a media signal that can be fed to the playout device. The packet loss concealment (PLC) unit produces a media signal when a packet is lost (i.e., the packet is not available in the jitter buffer). For example, in sound playout, one simple packet loss concealment technique is to replace the missing packets with zeros. This technique is usually referred to as zero-stuffing. The real-time media playout device (e.g., a sound card or other audio device on a computer) typically has a buffer and a digital-to-analog (D/A) converter.
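A minimal Python sketch of the jitter buffer and zero-stuffing behavior described above, assuming each packet carries a sequence number and an already-decoded frame of PCM samples (the decoder is omitted for brevity). The class and function names are hypothetical; a real jitter buffer would also manage its target depth.

```python
from __future__ import annotations

import heapq
from typing import Optional


class JitterBuffer:
    """Minimal jitter buffer keyed by packet sequence number (illustrative only)."""

    def __init__(self, first_seq: int = 0) -> None:
        self.next_seq = first_seq                     # next sequence number due for playout
        self._heap: list[tuple[int, list[int]]] = []  # min-heap ordered by sequence number

    def put(self, seq: int, frame: list[int]) -> None:
        """Store a packet; packets whose playout slot has already passed are dropped."""
        if seq >= self.next_seq:
            heapq.heappush(self._heap, (seq, frame))

    def get(self) -> Optional[list[int]]:
        """Take out the next packet in temporal order, or None if it is missing."""
        # Discard duplicates of packets that were already taken out.
        while self._heap and self._heap[0][0] < self.next_seq:
            heapq.heappop(self._heap)
        frame = None
        if self._heap and self._heap[0][0] == self.next_seq:
            frame = heapq.heappop(self._heap)[1]
        self.next_seq += 1
        return frame


def zero_stuff(frame_len: int) -> list[int]:
    """Simplest packet loss concealment: replace the missing frame with zeros."""
    return [0] * frame_len


def next_output_frame(jb: JitterBuffer, frame_len: int) -> list[int]:
    """Frame for the playout device: the buffered packet, or zeros if it was lost."""
    frame = jb.get()
    return frame if frame is not None else zero_stuff(frame_len)


jb = JitterBuffer()
for seq, frame in [(1, [2, 2]), (0, [1, 1]), (3, [4, 4])]:  # out of order; seq 2 lost
    jb.put(seq, frame)
print([next_output_frame(jb, 2) for _ in range(4)])  # [[1, 1], [2, 2], [0, 0], [4, 4]]
```

The heap keeps reordering cheap, and returning None from get() keeps the concealment decision in a separate step, mirroring the split between the jitter buffer and the PLC unit.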
With reference to FIG. 1, a simple transmitter and a simple receiver are shown as Side A and Side B, respectively. The simple receiver consists of a packet communication device, a decoder, and a real-time media device for playout to a loudspeaker. As soon as a packet arrives from the packet network, the packet communication device sends it to the decoder, which decodes it and places the result in a buffer in the real-time media device. The real-time media device converts the audio from digital to analog and sends the analog signal to the loudspeaker. This playout design suffers a loss in perceptual quality because it has no mechanism to mitigate jitter or reordering and no packet loss concealment unit.
Also, if the simple receiver does not use the same sampling frequency as the simple transmitter, there will be a mismatch, sometimes referred to as “clock drift”. For example, assume that the simple transmitter uses frequency fsA Hz, that the simple receiver uses frequency fsB Hz, and that fsB>fsA. This means that the simple transmitter records fsA samples per second while the simple receiver plays out fsB samples per second. Since fsB>fsA in this example, and presuming the recording is done in real time, the buffer in the simple receiver will eventually run out of media to play out. This discontinuity has the same effect as packet losses concealed with zero-stuffing.
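The time until the buffer runs dry in this example can be worked out directly. Assuming the receiver's buffer initially holds N_0 samples and both rates are constant, the buffer level N changes at f_sA - f_sB samples per second. The numbers below (f_sA = 8000 Hz, f_sB = 8008 Hz, N_0 = 160 samples, i.e., 20 ms at 8 kHz) are illustrative assumptions, not values from the text.

```latex
\begin{aligned}
\frac{dN}{dt} &= f_{sA} - f_{sB} \\
T_{\mathrm{underrun}} &= \frac{N_0}{f_{sB} - f_{sA}}, \qquad f_{sB} > f_{sA} \\
&= \frac{160}{8008 - 8000}\ \mathrm{s} = 20\ \mathrm{s}
\end{aligned}
```

After the first underrun the receiver remains short by f_sB - f_sA = 8 samples per second, and each shortfall is heard like a zero-stuffed packet loss.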
Referring next to FIG. 2, a more advanced receiver is shown that consists of a packet communication device, a jitter buffer, a decoder, and a real-time media device for playout to a loudspeaker. Side A is not shown in FIG. 2 but is the same as Side A shown in FIG. 1. As soon as a packet arrives from the Internet Protocol (IP) network, the packet communication device places it in the jitter buffer, which reorders packets if necessary. A timer decides when to extract a packet from the jitter buffer and decode it. This timer is set to trigger at an interval equal to the packet size expressed in time, i.e., the duration of media carried in one packet. When the packet is decoded, it is put in the real-time media device's buffer. If no packet is present in the jitter buffer when the timer is triggered, packet loss concealment is performed and the data produced is put in the real-time media device's buffer. The real-time media device converts the audio signal from digital to analog and sends the analog signal to the loudspeaker.
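The timer-driven loop of FIG. 2 can be sketched in Python as a discrete simulation in which each iteration is one timer period, i.e., one packet duration. This is an illustration under stated assumptions: decoding is omitted, arrivals are supplied per tick, the function name is hypothetical, and repeating the previous frame is used as an assumed concealment method somewhat more advanced than zero-stuffing.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

Frame = List[int]


def run_timer_receiver(
    arrivals: Dict[int, List[Tuple[int, Frame]]], ticks: int, frame_len: int
) -> List[int]:
    """Simulate the timer-driven receiver for `ticks` timer periods.

    `arrivals` maps a tick index to the (seq, frame) packets arriving just
    before that tick. Returns the samples placed in the playout device's buffer.
    """
    jitter_buffer: Dict[int, Frame] = {}
    device_buffer: Deque[int] = deque()   # stands in for the sound card's buffer
    next_seq = 0
    last_frame: Frame = [0] * frame_len
    for tick in range(ticks):
        # Packet communication device: place arriving packets in the jitter buffer.
        for seq, payload in arrivals.get(tick, []):
            if seq >= next_seq:           # packets arriving too late are discarded
                jitter_buffer[seq] = payload
        # Timer fires: extract the next packet in order (decoding omitted).
        frame = jitter_buffer.pop(next_seq, None)
        if frame is None:
            frame = list(last_frame)      # PLC: repeat the previous frame (assumed method)
        else:
            last_frame = frame
        device_buffer.extend(frame)
        next_seq += 1
    return list(device_buffer)


# Packet 1 arrives one tick too late, so it is concealed and then discarded.
arrivals = {0: [(0, [1, 1])], 1: [(2, [3, 3])], 2: [(1, [2, 2])]}
print(run_timer_receiver(arrivals, ticks=3, frame_len=2))  # [1, 1, 1, 1, 3, 3]
```

Because the loop counts ticks rather than real time, it deliberately ignores how fast the playout device consumes the buffer; that second clock is what produces the drift discussed next.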
This more advanced playout method corrects the problem of reordered packets and may use a more advanced packet loss concealment method than zero-stuffing. But since the timer that triggers extraction from the jitter buffer is not synchronized with the clock in the real-time media device, there will be a mismatch, so the receiver still suffers from “clock drift”. If the timer is “faster” than the clock in the real-time media device, the device's buffer is filled faster than it can be played out, and the delay increases over time. If, on the other hand, the playout is faster than the timer, the buffer runs out of media to play out, i.e., it underruns. The underrun sounds like packet loss with zero-stuffing as the packet loss concealment method. This problem is caused by the use of two clocks: one for the interaction with the packet network, jitter buffering and decoding, and one for the real-time media device.
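The effect of the two unsynchronized clocks on the real-time media device's buffer can be sketched numerically. The Python function below is hypothetical. It assumes 160-sample frames produced every nominal 20 ms for an 8 kHz sound card and models the timer's actual period as measured against the sound card's timebase. Exact fractions avoid rounding; all numbers are illustrative.

```python
from fractions import Fraction
from typing import List, Tuple


def device_buffer_levels(
    frame_len: int,
    timer_period_us: int,
    card_rate_hz: int,
    start_level: int,
    ticks: int,
) -> Tuple[List[Fraction], int]:
    """Track the playout buffer when the timer and the sound card use separate clocks.

    Each timer tick adds one decoded frame of `frame_len` samples; between ticks
    the card drains `card_rate_hz` samples per second. Returns the buffer level
    (in samples) just after each tick and the number of underruns.
    """
    drained_per_tick = Fraction(card_rate_hz * timer_period_us, 1_000_000)
    level = Fraction(start_level)
    levels: List[Fraction] = []
    underruns = 0
    for _ in range(ticks):
        level -= drained_per_tick   # card plays out since the previous tick
        if level < 0:               # card ran dry: an audible gap
            underruns += 1
            level = Fraction(0)
        level += frame_len          # timer tick: one decoded frame is added
        levels.append(level)
    return levels, underruns


# Timer 0.1% fast (19980 us): buffer, and thus delay, grows by 160 samples in 1000 ticks.
print(device_buffer_levels(160, 19980, 8000, 320, 1000)[0][-1])  # 480
# Timer 0.1% slow (20020 us): the first underrun occurs on tick 1001.
print(device_buffer_levels(160, 20020, 8000, 320, 1001)[1])      # 1
```

With these numbers a 0.1% clock mismatch adds or removes 20 ms of buffered audio roughly every 20 s, which is why the effect becomes audible within seconds to minutes.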
One solution to this problem could be to use one clock for all processes, as on a DSP, but this is not practical on PCs, where the central processing unit handles the interaction with the packet network, jitter buffering and decoding, while the sound card handles D/A conversion. Moreover, as explained in relation to FIG. 1 above, this is not the only inaccuracy that causes “clock drift”. Even if the CPU clock and the sound card clock were perfectly synchronized, different sampling rates on Side A and Side B, as in the example of FIG. 1, would cause the jitter buffer to either run out of data or fill up. The jitter buffer would run out of data, as in the underrun scenario explained in relation to FIG. 1, if fsA<fsB, and it would fill up, as in the growing-delay scenario explained in relation to FIG. 2, if fsA>fsB.
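The sampling-rate effect on the jitter buffer can be quantified with a short Python sketch. It assumes the CPU and sound card clocks on Side B are perfectly synchronized, that every packet carries the same number of samples, and that the rates shown are illustrative; the function name is hypothetical.

```python
from fractions import Fraction
from typing import Tuple


def jitter_buffer_drift(
    fs_a_hz: int, fs_b_hz: int, frame_len: int, seconds: int
) -> Tuple[Fraction, str]:
    """Net change in jitter buffer occupancy (packets) caused only by a
    sampling-rate mismatch between Side A and Side B over `seconds`."""
    produced = Fraction(fs_a_hz * seconds, frame_len)  # packets Side A sends
    consumed = Fraction(fs_b_hz * seconds, frame_len)  # packets Side B plays out
    net = produced - consumed
    if net > 0:
        trend = "fills up"      # fsA > fsB
    elif net < 0:
        trend = "runs out"      # fsA < fsB
    else:
        trend = "stable"
    return net, trend


net, trend = jitter_buffer_drift(8000, 7992, 160, 3600)
print(net, trend)  # 180 fills up
```

Here a receiver sampling 0.1% slower than the sender accumulates 180 extra 20 ms packets per hour, i.e., 3.6 s of added delay if nothing is discarded; swapping the two rates gives -180 packets, meaning the buffer runs out.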
In the appended figures, similar components and/or features may have the same reference label.