With the advent of digital media streaming technologies (such as those using unicast, multicast, and broadcast), users are able to see and hear digital media more or less as the data is being received from a media server.
Herein, a “media stream” is a multimedia object (containing audio and/or visual content, such as a video) that is compressed and encoded in accordance with mechanisms generally available now or in the future for doing so. Furthermore, such a media stream is intended to be decoded and rendered in accordance with generally available mechanisms for doing so.
Without loss of generality, the same techniques can be applied to any media stream that has a similar structure, namely one that reduces temporal or spatial redundancies. For example, many audio compression formats such as AC3 have keyframes followed by modification data to regenerate an approximation of the original uncompressed stream.
Multimedia Distribution Format Standards
Due to the amount of data required to accurately represent such multimedia content, it is typically delivered to the computing device in an encoded, compressed form. To reproduce the original content for presentation, the multimedia content is typically decompressed and decoded before it is presented.
A number of multimedia standards have been developed that define the format and meaning of encoded multimedia content for purposes of distribution. Organizations such as the Moving Picture Experts Group (MPEG), under the auspices of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC), and the Video Coding Experts Group (VCEG), under the auspices of the International Telecommunication Union (ITU), have developed a number of multimedia coding standards (e.g., MPEG-1, MPEG-2, MPEG-4, H.261, H.263, and the like).
There are many different standardized video-stream data formats. For example: MPEG, H.263, MPEG-1, MPEG-2, MPEG-4 Visual, H.264/AVC, and DV formats. Likewise, there are many different standardized audio-stream data formats. For example: MPEG audio, AC3 audio, DTS audio, or MLP audio.
MPEG-2/H.262
The predominant digital video compression and transmission formats are from a family called block-based motion-compensated hybrid video coders, as typified by the ISO/IEC MPEG-x (Moving Picture Experts Group) and ITU-T VCEG H.26x (Video Coding Experts Group) standards. This family of standards is used for coding audio-visual information (e.g., movies, video, music, and such) in a digital compressed format.
For the convenience of explanation, the MPEG-2 media stream (also known as an H.262 media stream) is generally discussed and described herein, as it has a structure that is typical of conventional video coding approaches. However, those who are skilled in the art understand and appreciate that other such digital media compression and transmission formats exist and may be used.
A typical MPEG-2 video sequence is composed of a series of units that are typically called Groups of Pictures (or “GOPs”). Each GOP is, in turn, composed of a sequence of pictures or frames. The GOP data is compressed as a sequence of I-, P- and B-frames.
An I-frame (i.e., intra-frame) is an independent starting image (compressed in a format similar to that of a JPEG image). An I-frame or “key frame” is encoded as a single image, with no reference to any past or future frames. It is sometimes called a random access point (RAP).
Those of ordinary skill in the art are familiar with the relationships between the I-, P- and B-frames.
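By way of illustration only, the role of the I-frame as a random access point can be sketched as follows. The sketch assumes that a decoder joining a stream partway through can begin decoding only at the next I-frame, because P- and B-frames refer to other frames. The function name and the single-letter representation of frame types are hypothetical and are used here solely for explanation.

```python
def first_random_access_point(frame_types, join_index):
    """Return the index of the first I-frame at or after join_index.

    frame_types: sequence of "I", "P", or "B", in transmission order.
    Returns None if no I-frame follows the join point.
    """
    for i in range(join_index, len(frame_types)):
        if frame_types[i] == "I":
            return i
    return None


# A receiver that joins at frame 3 must skip ahead to the I-frame at index 7.
stream = list("IPBBPBBIPBB")
print(first_random_access_point(stream, 3))
```

The frames between the join point and the returned index cannot be decoded on their own in this simplified model.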
Transmission of Media Streams
For practical purposes, the continuous media streams carrying audio or video from a media-stream encoder are typically broken into multiple packets for transmission. These packetized streams are typically called packetized elementary streams (PES). These packets are identified by headers that contain time stamps for synchronizing PES packets.
A transport stream typically carries many different media streams. A media-stream decoder must be able to change from one media stream to the next and correctly select the appropriate audio and data channels of the newly selected media stream. Since each of the media streams may be viewed as a “channel,” the act of changing from one media stream to another may be generically called “changing channels.” Also, the act of starting a media stream, where none has been received before, may be called “changing channels” as well.
Time Stamp
After compression, the media-stream encoder typically sends frames out of sequence because of bidirectional encoding. The frames require a variable amount of data and are subject to variable delay due to multiplexing and transmission. For many reasons (including, for example, synchronizing the audio and video streams), time stamps are periodically incorporated into the media stream.
Time stamps indicate where a particular GOP belongs in time. When a decoder receives a PES packet, it decodes and buffers each frame. When the timeline count reaches the value of the time stamp, the frames in the buffer are read out.
When bidirectional coding is used, a frame may have to be decoded some time before it is presented, so that it can act as the source of data for another frame. Although, for example, pictures can be presented in the order IBBP, they will be transmitted in the order IPBB. Consequently, two types of time stamp exist. The decode time stamp (DTS) indicates the time when a picture must be decoded, whereas a presentation time stamp (PTS) indicates when it must be presented to the decoder output.
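The reordering described above (presentation order IBBP, transmission order IPBB) can be sketched as follows. This is a simplified illustration that assumes each B-frame depends on the nearest following I- or P-frame; the function name and frame labels are hypothetical.

```python
def transmission_order(presentation):
    """Reorder frames from presentation order to transmission order.

    Assumption: each B-frame needs the nearest following I- or P-frame
    (its forward anchor), so that anchor is sent ahead of those B-frames.
    """
    out = []
    pending_b = []  # B-frames waiting for their forward anchor
    for frame in presentation:
        if frame.startswith("B"):
            pending_b.append(frame)
        else:
            out.append(frame)      # the anchor (I or P) is sent first
            out.extend(pending_b)  # followed by the B-frames that needed it
            pending_b = []
    out.extend(pending_b)  # trailing B-frames with no later anchor in this list
    return out


print(transmission_order(["I0", "B1", "B2", "P3"]))  # ['I0', 'P3', 'B1', 'B2']
```

In this sketch, P3 is sent before B1 and B2 even though it is presented after them, which is why the time at which a frame is decoded can differ from the time at which it is presented.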
The PTS only indicates the presentation time of the first discretely presented portion of the frame, not the presentation time of subsequently presented portions (e.g., subsequent fields of a video frame, or subsequent samples of an audio frame).
Since the focus herein is on presentation of a stream as soon as possible, the discussion will ignore the DTS and instead refer to the PTS.
PCR
In a transport stream, each channel may have originated at a different geographic location and, therefore, the channels are not likely to be synchronized with one another. As a result, the transport stream typically provides a separate means of synchronizing for each channel. This synchronization uses a Program Clock Reference (PCR) time stamp, and it recreates a stable reference clock.
Some media-stream encoders provide an explicit PCR for each frame. Others provide them for only some frames, thereby leaving the PCR for the other frames to be determined implicitly. Herein, it is assumed that every packet has an explicit PCR or a PCR that may be determined implicitly.
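The manner in which an implicit PCR is determined is not specified herein. One possible approach, shown only as a sketch, is linear interpolation between the nearest explicit PCRs. The sketch assumes equal-sized packets sent at a constant rate between two explicit PCRs; the function name and the use of None for a missing PCR are hypothetical.

```python
def infer_pcrs(pcrs):
    """Fill in missing PCRs by linear interpolation between explicit ones.

    pcrs: list with one entry per packet, either an explicit PCR (seconds)
          or None where the PCR is implicit.
    Assumption: packets are equal-sized and evenly spaced in time between
    two consecutive explicit PCRs. Packets before the first or after the
    last explicit PCR are left as None.
    """
    known = [i for i, p in enumerate(pcrs) if p is not None]
    result = list(pcrs)
    for a, b in zip(known, known[1:]):
        step = (pcrs[b] - pcrs[a]) / (b - a)
        for i in range(a + 1, b):
            result[i] = pcrs[a] + step * (i - a)
    return result


print(infer_pcrs([0.0, None, None, 3.0, None]))  # [0.0, 1.0, 2.0, 3.0, None]
```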
Typically, media-stream encoders generate streams that obey certain bit-rate and timing restrictions. This is the obligation of the encoder.
Some encoders produce streams that are true CBR (constant bit-rate) streams. For true CBR streams, the PCR can be inferred to be equal to the DTS or to be offset from the DTS by a small negative amount.
Some encoders generate VBR (variable bit rate) streams which still obey a specifiable max bit-rate restriction. For VBR streams, the encoder may give an explicit PCR on each packet, it may give an explicit PCR but only for some packets, or it may give no explicit PCR on any packet.
Underflow
When a receiving unit runs out of data to decode (or present), it is called “underflow.” Underflow occurs when the receiving unit is ready to decode (or present) the next frame, but it has not yet received (or decoded) all of the data of that frame.
The practical and noticeable manifestation of an underflow is a temporal interruption (i.e., “hiccup” or “stutter”) in the presentation of motion video rather than the desired effect, which is a smoothly playing motion video. For example, instead of showing a motion video at a fixed frequency (e.g., 15 frames per second), a receiving unit experiencing underflow would show a frame of the video stream followed by a noticeable delay before the next or a later frame would display. This may continue for several seconds or minutes.
The conditions for an underflow are particularly ripe when a receiving unit changes channels. If the receiving unit presents frames as soon as it receives and decodes the incoming media stream of the new channel, an underflow condition is likely to arise.
Instead, it is common for the multimedia content to be presented to the user at some defined point (e.g., when the buffer is full enough or after a defined delay from reception). As the multimedia content plays, the receiving device empties the data stored in its buffer. However, while the receiving device is playing the stored multimedia, more data is downloaded to re-fill the buffer. As long as the data is downloaded at least as fast as it is being played back in such a way that the buffer is never completely empty, the file will play smoothly.
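The effect of delaying presentation until the buffer holds some data can be illustrated with a simplified simulation. The sketch assumes discrete time ticks, a variable amount of data arriving per tick, and a fixed amount consumed per tick once playback begins; the function name, parameters, and numbers are hypothetical.

```python
def first_underflow(arrival_bits_per_tick, drain_bits_per_tick, start_delay_ticks):
    """Return the tick at which the buffer first underflows, or None.

    Data arriving in a tick is added to the buffer. From start_delay_ticks
    onward, playback removes drain_bits_per_tick every tick. Underflow is
    reported when the buffer holds less than playback needs in that tick.
    """
    buffered = 0
    for tick, arrived in enumerate(arrival_bits_per_tick):
        buffered += arrived
        if tick >= start_delay_ticks:
            if buffered < drain_bits_per_tick:
                return tick
            buffered -= drain_bits_per_tick
    return None


arrivals = [100, 100, 20, 20, 150, 100]  # delivery dips in ticks 2 and 3
print(first_underflow(arrivals, 80, start_delay_ticks=0))  # 2
print(first_underflow(arrivals, 80, start_delay_ticks=2))  # None
```

With immediate playback, the dip in delivery empties the buffer at tick 2. With a two-tick delay, enough data accumulates to ride out the dip, at the cost of a later start.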
Typically, media-stream encoders generate streams that obey certain bit-rate and timing restrictions. Each transport packet produced by the encoder has an explicit or implicit PCR (program clock reference). The encoder guarantees that if the transport packets are sent at the times indicated by each packet's PCR, the stream as a whole will obey the bit-rate, timing, and causality restrictions of the stream.
For example, suppose that the bit-rate restriction is that a media stream will be no more than one megabit per second (i.e., 1 Mb/s) when measured over a five-second window; the timing restriction is that the PCR-to-PTS delay will never be more than three seconds; and the causality restriction is that the PCR is less than or equal to both the DTS and the PTS. The encoder must then produce transport packets, PCRs, PTSs, and DTSs (each either explicit or implicit) for the stream that, when taken together and sent according to each packet's PCR, obey all of these restrictions.
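The three example restrictions can be expressed as a simple check over a list of packets. This is a sketch only. It assumes time stamps in seconds and interprets the bit-rate restriction as a limit on the total bits of all packets whose PCR falls within a five-second window beginning at any packet's PCR. The Packet type, field names, and function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    pcr: float  # send time, in seconds
    dts: float  # decode time stamp, in seconds
    pts: float  # presentation time stamp, in seconds
    bits: int   # packet size, in bits


def violations(packets, max_bps=1_000_000, window_s=5.0, max_pcr_to_pts_s=3.0):
    """Return (packet index, restriction name) pairs for each violation."""
    problems = []
    for i, p in enumerate(packets):
        # Causality: a packet must be sent no later than it is decoded or presented.
        if p.pcr > p.dts or p.pcr > p.pts:
            problems.append((i, "causality"))
        # Timing: the PCR-to-PTS delay must not exceed the stated maximum.
        if p.pts - p.pcr > max_pcr_to_pts_s:
            problems.append((i, "timing"))
        # Bit rate: total bits sent in the window starting at this packet's PCR.
        window_bits = sum(
            q.bits for q in packets if p.pcr <= q.pcr < p.pcr + window_s
        )
        if window_bits > max_bps * window_s:
            problems.append((i, "bit-rate"))
    return problems


ok = [Packet(0.0, 1.0, 1.5, 400_000), Packet(1.0, 2.0, 2.5, 400_000)]
print(violations(ok))  # []
```

A stream for which this check returns an empty list satisfies the example restrictions under the stated interpretation; other window definitions are possible.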
These conventional techniques, described above, that are designed to prevent underflow, produce an annoying side effect: “channel start-up delay.”