Digital video must be extensively compressed prior to transmission and storage, as each picture includes many pixels, and each pixel is represented by multiple multi-bit values.
In a typical scenario, a non-compressed media stream (also referred to as a “raw” media stream) includes a sequence of frames of substantially equal size. These frames are eventually presented at a constant rate. As described below, once the media stream is compressed, the sizes of the frames may vary. Transmitting a media stream whose frames vary in size over a network may cause timing problems, as the frames must be provided to a media player in a timely manner.
Various compression standards, such as, but not limited to, the MPEG standards, enable efficient storage and transmission of media information.
Spatial compression usually includes transform coding, quantization and variable-length encoding. Transform coding converts a group of picture pixels to a set of DCT (discrete cosine transform) coefficients. The DCT coefficients of a block (representing a predefined amount of picture pixels, such as 8×8 pixels) are then quantized and represented by amplitude/run-length pairs, where the run-length value indicates the number of zeroes between two non-zero coefficients. The amplitude/run-length pairs of a macro-block are then coded by a variable-length coding scheme to provide a compressed video stream.
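The amplitude/run-length pairing can be illustrated with a minimal sketch that scans an already-quantized coefficient list (the function name is illustrative; a real encoder scans the 8×8 block in zig-zag order and follows this step with variable-length coding):

```python
def run_length_pairs(quantized_coeffs):
    """Convert a scanned list of quantized DCT coefficients into
    (run, amplitude) pairs, where run counts the zeroes preceding
    each non-zero coefficient."""
    pairs = []
    run = 0
    for c in quantized_coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    return pairs

# Example: a mostly-zero coefficient list after quantization
print(run_length_pairs([12, 0, 0, -3, 0, 5, 0, 0, 0]))
# → [(0, 12), (2, -3), (1, 5)]
```

Because quantization drives most high-frequency coefficients to zero, the pair representation is far more compact than the raw coefficient list.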
Temporal compression is based upon the fact that there is usually little difference between consecutive video frames. A compressed media stream includes many sequences of temporally compressed frames; each sequence begins with a self-contained key-frame (that is independent of preceding frames) that is followed by several Inter-frames. Each Inter-frame includes a difference between itself and at least one other frame.
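The key-frame/Inter-frame relation can be sketched as follows, modeling each frame as a flat list of pixel values (a deliberate simplification; real codecs apply motion compensation rather than plain per-pixel subtraction):

```python
def encode_inter_frame(reference, frame):
    # An Inter-frame carries only the difference from a reference frame.
    return [cur - ref for cur, ref in zip(frame, reference)]

def decode_inter_frame(reference, diff):
    # The decoder adds the difference back onto the reference frame.
    return [ref + d for ref, d in zip(reference, diff)]

key_frame = [100, 102, 98, 97]   # self-contained key-frame
next_frame = [101, 102, 97, 97]  # nearly identical consecutive frame
diff = encode_inter_frame(key_frame, next_frame)
print(diff)
# → [1, 0, -1, 0]  (small residual, cheap to code)
```

The small residual values compress well, which is exactly why little inter-frame difference translates into high temporal compression.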
As a result of these compression schemes, access units of complex scenes (for example, scenes of low temporal redundancy and/or low spatial redundancy) are represented by more bits than other access units. MPEG-4 presentations include a number of media elementary streams, such as video elementary streams and audio elementary streams. Each media elementary stream includes multiple access units (also referred to as samples). An access unit is a coded representation of a presentation unit. An audio access unit is the coded representation of an audio frame, while a video access unit includes the data required for the presentation of a picture.
An MPEG-4 presentation may be provided to a client device in a streaming mode or in a download mode. A typical client device has a player buffer and a client player. In the download mode the presentation is stored in the client device memory (such as the client buffer) and can later be fetched from the memory and processed (by the client player) to enable the display of that presentation. In the streaming mode the client device displays the presentation as it is streamed. In the streaming mode, there is a need to match the bit rates of the streamed elementary streams to the bandwidth available for streaming these elementary streams over a communication network and to the client processing and/or buffering capabilities.
Mismatches may result in a client buffer (also termed target buffer or player buffer) overflow (in which the client device receives too much information and must discard part of it) or in a client buffer underflow (in which the client device does not receive enough information to enable a smooth and/or continuous display of the presentation). Furthermore, as various elementary streams are streamed to the client device, a bit-rate mismatch may result in a loss of synchronization between ideally synchronized elementary streams. Typically, overflow is easier to prevent.
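The overflow and underflow conditions can be illustrated with a minimal sketch that tracks client buffer occupancy tick by tick (function and parameter names are illustrative; arrivals model bits received and drains model bits consumed by the player per tick):

```python
def simulate_buffer(arrivals, drains, capacity):
    """Track client buffer occupancy and report overflow/underflow events."""
    occupancy, events = 0, []
    for t, (a, d) in enumerate(zip(arrivals, drains)):
        occupancy += a
        if occupancy > capacity:
            events.append((t, "overflow"))
            occupancy = capacity      # excess information is discarded
        occupancy -= d
        if occupancy < 0:
            events.append((t, "underflow"))
            occupancy = 0             # playback stalls until data arrives
    return events

# Sending faster than the player consumes eventually overflows the buffer:
print(simulate_buffer([5, 5, 0, 0], [2, 2, 2, 2], capacity=6))
# → [(1, 'overflow')]
```

Conversely, arrivals that are persistently smaller than the drains produce underflow events, interrupting the display.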
Media streams can be transmitted over a network at a constant bit rate (CBR) or at a variable bit rate (VBR). CBR requires compressing each access unit by a compression ratio (QSCALE) that is responsive to the size of that access unit, as larger access units must be compressed at a higher compression ratio than smaller access units in order to achieve a substantially constant bit rate. VBR usually does not require such a relation between the compression ratio and the size of the access units, but may cause timing and buffering problems.
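The size-responsive quantizer selection for CBR can be sketched as follows (the linear mapping and the 1–31 clamping range are simplifying assumptions for illustration; real rate control is considerably more elaborate):

```python
def qscale_for_cbr(raw_size_bits, target_size_bits, min_q=1, max_q=31):
    """Pick a quantizer scale so a larger access unit is compressed
    more coarsely, keeping every access unit near the same bit budget."""
    # Simplifying assumption: compressed size shrinks roughly linearly
    # with the quantizer scale, so the ratio raw/target picks the scale.
    q = round(raw_size_bits / target_size_bits)
    return max(min_q, min(max_q, q))

# A large (complex) access unit gets a coarser quantizer than a small one:
print(qscale_for_cbr(80_000, 20_000))  # → 4
print(qscale_for_cbr(20_000, 20_000))  # → 1
```

Under VBR the quantizer can instead be held fixed, letting the per-access-unit bit count float with scene complexity.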
Four scientists from the University of Southern California developed a technique named “Multi Threshold Flow Control” (MTFC), which is described in “Multi-Threshold Online Smoothing Technique for Variable Rate Streams” by R. Zimmerman, K. Fu, M. Jaharangiri and C. Shahabi. The article was found at the web site of the university (www.usc.edu).
MTFC smooths variable bit rate (VBR) transmissions from a server to a client without a priori knowledge of the actual bit rate. MTFC utilizes multi-level buffer thresholds at the client side that trigger feedback information sent to the media server. Once a client buffer threshold is crossed, a feedback process is initiated that in turn adjusts the sending rate of the server. The feedback process is based upon a prediction of future bit-rate consumption. Three bit-rate consumption prediction algorithms were suggested, one being a fuzzy-logic-based algorithm.
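The threshold-triggered feedback loop can be illustrated with a minimal two-threshold sketch (the function name, threshold values and adjustment step are illustrative assumptions and not part of MTFC, which uses multiple threshold levels and predicted future consumption):

```python
def feedback_rate(occupancy, capacity, current_rate,
                  low_threshold=0.2, high_threshold=0.8, step=0.1):
    """Hypothetical threshold-triggered feedback: a nearly empty buffer
    asks the server to send faster, a nearly full buffer asks it to
    slow down; in between, the sending rate is left unchanged."""
    fill = occupancy / capacity
    if fill < low_threshold:
        return current_rate * (1 + step)   # risk of underflow: speed up
    if fill > high_threshold:
        return current_rate * (1 - step)   # risk of overflow: slow down
    return current_rate                    # within thresholds: no change
```

A multi-threshold scheme refines this by grading the adjustment according to how many threshold levels the occupancy has crossed, yielding smoother rate changes than a single on/off switch.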