Digital video must be extensively compressed prior to transmission and storage, because each picture includes many pixels and each pixel is associated with multiple multi-bit values.
In a typical scenario, a non-compressed ("raw") media stream includes a sequence of frames of substantially equal size, which are eventually presented at a constant rate. As described below, once the media stream is compressed, frame sizes may vary. Transmitting a media stream with varying frame sizes over a network may encounter timing problems, because the frames must be delivered to a media player in a timely manner to enable, for example, smooth viewing of a video.
Various compression standards, such as, but not limited to, the MPEG standards, enable efficient storage and transmission of media information.
Spatial compression usually includes transform coding, quantization and variable-length encoding. Transform coding is operable to convert a group of picture pixels to a set of discrete cosine transform (DCT) coefficients. The DCT coefficients of a block represent a predefined number of picture pixels, such as 8×8. The coefficients are then quantized and represented by amplitude/run-length pairs, where the run-length value indicates the number of zeroes between two non-zero coefficients. The amplitude/run-length pairs of a macro-block are coded by a variable-length coding scheme to provide compressed video streams.
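The run-length step above can be illustrated with a minimal sketch. The function name and the sample coefficient list are illustrative only; a real codec operates on zig-zag scanned 8×8 blocks and applies a standardized variable-length code to the resulting pairs.

```python
def run_length_pairs(quantized_coeffs):
    """Convert a list of quantized DCT coefficients into
    (run, amplitude) pairs, where `run` counts the zeroes that
    precede each non-zero coefficient."""
    pairs = []
    run = 0
    for c in quantized_coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    return pairs

# A short, sparse block after quantization: mostly zeroes.
coeffs = [42, 0, 0, -3, 0, 0, 0, 5, 0, 0]
print(run_length_pairs(coeffs))  # [(0, 42), (2, -3), (3, 5)]
```

Because quantization drives most AC coefficients to zero, the pair representation is far more compact than the raw coefficient list, which is what makes the subsequent variable-length coding effective.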
Temporal compression is based upon the fact that there is usually little difference between consecutive video frames. A compressed media stream includes many sequences of temporally compressed frames. Each sequence is initiated by a self-contained key-frame, which is independent of preceding frames and is followed by several Inter-frames. Each Inter-frame encodes the difference between itself and at least one other frame.
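A toy model of this difference coding, assuming frames are flat lists of pixel values (real codecs use motion-compensated prediction rather than a plain per-pixel subtraction):

```python
def encode_inter_frame(reference, current):
    """Represent the current frame as its per-pixel difference from a
    reference frame. Consecutive frames differ little, so the
    difference is mostly zeroes and compresses well."""
    return [c - r for c, r in zip(current, reference)]

def decode_inter_frame(reference, diff):
    """Reconstruct the current frame from the reference and the difference."""
    return [r + d for r, d in zip(reference, diff)]

key_frame = [10, 10, 12, 12]    # self-contained key-frame
next_frame = [10, 11, 12, 14]   # consecutive frame, mostly unchanged
diff = encode_inter_frame(key_frame, next_frame)
print(diff)  # [0, 1, 0, 2] -- mostly zeroes, cheap to code
assert decode_inter_frame(key_frame, diff) == next_frame
```

The sketch also shows why a key-frame is needed to start each sequence: without an independently decodable reference, the differences cannot be resolved back into pictures.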
As a result of these compression schemes, access units of complex scenes (scenes having low temporal redundancy and/or low spatial redundancy) are represented by more bits than other access units. MPEG-4 presentations include a number of media elementary streams, such as video elementary streams and audio elementary streams. Each media elementary stream includes multiple access units, e.g. samples. An access unit is a coded representation of a presentation unit: an audio access unit is the coded representation of an audio frame, while a video access unit includes the data required for the presentation of a picture.
An MPEG-4 presentation may be provided to a client device in a streaming mode or in a download mode. A typical client device has a player buffer and a client player. In download mode the presentation is stored in the client device memory (such as the client buffer) and can later be fetched from the memory and processed by the client player to enable the display of that presentation. In streaming mode the client device displays the presentation as it is being streamed. In streaming mode, trade-offs are needed between the bit rates of the streamed elementary streams, the bandwidth available for streaming these elementary streams over a communication network, and the client's processing and/or buffering capabilities.
Mismatches may result in client buffer (also termed target buffer or player buffer) over-flow, in which the client device receives too much information and must discard part of it, or in client buffer under-flow, in which the client device does not receive enough information to enable a smooth and/or continuous display of the presentation. Furthermore, as various elementary streams are streamed to the client device, a bit-rate mismatch may result in loss of the desired synchronization between elementary streams. Typically, over-flow is easier to prevent than under-flow.
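The over-flow and under-flow conditions can be sketched with a toy buffer model: bits arrive at a varying rate while the player drains a fixed amount per presentation tick. All names and numbers below are illustrative assumptions, not taken from any standard or buffer specification.

```python
def simulate_buffer(arrivals, drain_per_tick, capacity):
    """Toy player-buffer model. Returns the first over-flow or
    under-flow event as (kind, tick), or None if playback is smooth."""
    level = 0
    for tick, arriving in enumerate(arrivals):
        level += arriving
        if level > capacity:
            return ("overflow", tick)   # too much data: must discard
        level -= drain_per_tick
        if level < 0:
            return ("underflow", tick)  # too little data: display stalls
    return None

# A burst followed by a lull: the lull starves the player at tick 3.
print(simulate_buffer([50, 50, 0, 0, 0], drain_per_tick=30, capacity=120))
```

The asymmetry noted above also shows up here: over-flow can be avoided simply by pausing transmission, whereas avoiding under-flow requires data that may not have arrived yet.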
Media streams can be transmitted over a network at a constant bit rate (CBR) or at a variable bit rate (VBR). CBR requires compressing each access unit by a compression ratio (QSCALE) that is responsive to the size of that access unit, as larger access units must be compressed at a higher compression ratio than smaller access units in order to achieve a relatively constant bit rate. VBR usually does not require such a relation between the compression ratio and the size of the access units, but may encounter timing and buffering problems.
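The CBR relation between access-unit size and compression ratio can be sketched as follows. This is a deliberate simplification under assumed numbers; real rate control also tracks buffer fullness and past quantizer decisions.

```python
def cbr_qscale(access_unit_bits, target_bits, base_qscale=4):
    """Toy CBR rate control: raise the quantizer scale for access
    units that would exceed the per-unit bit target, so larger
    (more complex) units are compressed more aggressively."""
    if access_unit_bits <= target_bits:
        return base_qscale
    # Coarser quantization in proportion to the overshoot.
    return base_qscale * access_unit_bits / target_bits

for bits in (8000, 16000, 32000):
    print(bits, cbr_qscale(bits, target_bits=16000))
```

A VBR encoder, by contrast, could hold the quantizer scale fixed and let the output size of each access unit float, which preserves uniform quality at the cost of the timing and buffering issues described above.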
Four scientists from the University of Southern California (R. Zimmerman, K. Fu, M. Jaharangiri and C. Shahabi) developed a technique named "Multi Threshold Flow Control" (MTFC), described as a "multi-threshold online smoothing technique for variable rate streams."
MTFC smoothes variable bit rate (VBR) transmissions from a server to a client, without a priori knowledge of the actual bit rate. MTFC utilizes multi-level buffer thresholds at the client side that trigger feedback information sent to the media server. Once a client buffer threshold is crossed, a feedback process is initiated that in turn adjusts the sending rate of the server. The feedback process is based upon a prediction of future bit rate consumption. Three bit-rate consumption prediction algorithms were suggested, one of them based on fuzzy logic.
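The client-side threshold check at the heart of this scheme can be sketched as below. The threshold values, function name, and feedback payload are illustrative assumptions; the MTFC work defines its own thresholds and prediction algorithms.

```python
def check_thresholds(buffer_level, capacity, thresholds=(0.25, 0.5, 0.75)):
    """Toy MTFC-style client check: report the highest buffer-occupancy
    threshold that has been crossed, so the client can send feedback
    asking the server to adjust its sending rate. Returns None when
    no threshold is crossed and no feedback is needed."""
    occupancy = buffer_level / capacity
    crossed = [t for t in thresholds if occupancy >= t]
    return max(crossed) if crossed else None

# Below the lowest threshold: no feedback is sent.
print(check_thresholds(100, 1000))   # None
# Above the 50% threshold: feedback requests a slower sending rate.
print(check_thresholds(600, 1000))   # 0.5
```

Using several thresholds rather than one lets the feedback be graduated: small occupancy changes trigger small rate adjustments, keeping the smoothed sending rate close to the (unknown) actual bit rate of the stream.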