Digital video must be extensively compressed prior to transmission and storage, as each picture includes multiple pixels, and each pixel is associated with multiple multi-bit values.
In a typical scenario, a non-compressed media stream (also referred to as a “raw” media stream) includes a sequence of frames of substantially equal size. These frames are eventually presented at a constant rate. As described below, once the media stream is compressed, the size of the frames may vary. Transmitting a media stream with frames of varying size over a network may cause timing problems, as these frames must be provided to a media player in a timely manner.
Various compression standards, such as but not limited to the MPEG standards, enable efficient storage and transmission of media information.
Spatial compression usually includes transform coding, quantization and variable length encoding. Transform coding converts a group of picture pixels to a set of DCT (discrete cosine transform) coefficients. The DCT coefficients of a block (representative of a predefined amount of picture pixels, such as 8×8 pixels) are then quantized and represented by amplitude/run-length pairs, where the run-length value indicates the number of zeroes between two non-zero coefficients. The amplitude/run-length pairs of a macro-block are coded by a variable length coding scheme to provide compressed video streams.
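The quantization and run-length steps described above can be illustrated with a minimal sketch. It assumes the coefficients have already been produced by a transform and scanned into a one-dimensional list; the function names, the uniform step size and the sample values are hypothetical and are not taken from any particular standard.

```python
def quantize(coeffs, step):
    """Uniform quantization: divide by the step size and truncate toward zero."""
    return [int(c / step) for c in coeffs]


def amplitude_run_pairs(coeffs):
    """Convert quantized coefficients into (amplitude, run_length) pairs.

    run_length counts the zeroes that precede each non-zero coefficient.
    Trailing zeroes are dropped; a real codec would signal them with an
    end-of-block code (an assumption of this sketch, not modeled here).
    """
    pairs = []
    run = 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((c, run))
            run = 0
    return pairs


# Hypothetical scanned transform coefficients of one block (shortened).
scanned = [52.0, -19.0, 0.5, 7.0, 0.2, -0.3, 9.0, 0.1]
quantized = quantize(scanned, step=4)
pairs = amplitude_run_pairs(quantized)
print(quantized)
print(pairs)
```

A larger step size zeroes more coefficients, which lengthens the runs and shortens the list of pairs that the variable length coder must represent.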
Temporal compression is based upon the fact that there is usually little difference between consecutive video frames. A compressed media stream includes many sequences of temporally compressed frames. Each sequence begins with a self-contained key-frame (that is independent of preceding frames) that is followed by several Inter-frames. Each Inter-frame includes a difference between itself and at least one other frame.
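The key-frame/Inter-frame structure can be sketched as follows. This is a deliberately simplified illustration that stores a plain per-pixel difference from the preceding frame; practical codecs use motion-compensated prediction, which is not modeled here. Frames are represented as flat lists of pixel values, and all names are hypothetical.

```python
def encode_sequence(frames):
    """Encode a sequence as one key frame plus differences from the previous frame."""
    key = list(frames[0])
    deltas = []
    prev = key
    for frame in frames[1:]:
        deltas.append([cur - old for cur, old in zip(frame, prev)])
        prev = frame
    return key, deltas


def decode_sequence(key, deltas):
    """Rebuild every frame by accumulating the differences onto the key frame."""
    frames = [list(key)]
    for delta in deltas:
        frames.append([old + d for old, d in zip(frames[-1], delta)])
    return frames


# Three hypothetical 4-pixel frames that change very little over time.
frames = [[10, 10, 10, 10], [10, 11, 10, 10], [10, 11, 12, 10]]
key, deltas = encode_sequence(frames)
restored = decode_sequence(key, deltas)
print(deltas)
```

Because consecutive frames are similar, the difference lists are mostly zeroes and compress far better than the frames themselves.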
As a result of the compression schemes, access units of complex scenes (for example, scenes of low temporal redundancy and/or low spatial redundancy) are represented by more bits than other access units. MPEG-4 presentations include a number of media elementary streams, such as video elementary streams and audio elementary streams. Each media elementary stream includes multiple access units (e.g., samples). An access unit is a coded representation of a presentation unit. An audio access unit is the coded representation of an audio frame, while a video access unit includes the data required for presentation of a picture.
An MPEG-4 presentation may be provided to a client device in a streaming mode or in a download mode. A typical client device has a player buffer and a client player. In the download mode the presentation is stored in the client device memory (such as the client buffer) and can later be fetched from the memory and processed (by the client player) to enable the display of that presentation. In the streaming mode the client device displays the presentation as it is streamed. In the streaming mode, the bit rates of the streamed elementary streams must be matched to the available bandwidth for streaming these elementary streams over a communication network and to the client processing and/or buffering capabilities.
Mismatches may result in client buffer (also termed target buffer or player buffer) over-flow (in which the client device receives too much information and must throw away a part of the information) or in a client buffer under-flow (in which the client device does not receive enough information to enable a smooth and/or continuous display of the presentation). Furthermore, as various elementary streams are streamed to the client device, a bit-rate mismatch may result in loss of synchronization between ideally synchronized elementary streams. Typically, over-flow is easier to prevent.
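The two failure modes can be made concrete with a small simulation. The sketch below is an illustrative assumption rather than a model of any specific player: time is divided into equal steps, arriving bits are added before the player consumes, excess bits are discarded on over-flow, and the buffer is drained on under-flow. All names and numbers are hypothetical.

```python
def simulate_client_buffer(arrivals, consumption, capacity, initial=0):
    """Track buffer occupancy per step and record over-flow / under-flow events.

    arrivals[i]    -- bits received during step i
    consumption[i] -- bits the player needs during step i
    capacity       -- buffer size in bits
    Returns the final occupancy and a list of (step, kind, bits) events.
    """
    occupancy = initial
    events = []
    for step, (arrived, needed) in enumerate(zip(arrivals, consumption)):
        occupancy += arrived
        if occupancy > capacity:
            # Over-flow: the excess bits are thrown away.
            events.append((step, "overflow", occupancy - capacity))
            occupancy = capacity
        if needed > occupancy:
            # Under-flow: the player cannot get all the bits it needs.
            events.append((step, "underflow", needed - occupancy))
            occupancy = 0
        else:
            occupancy -= needed
    return occupancy, events


arrivals = [600, 600, 400, 400, 400]      # roughly constant network delivery
consumption = [100, 100, 900, 900, 100]   # large access units in the middle
final, events = simulate_client_buffer(arrivals, consumption, capacity=1000)
print(final, events)
```

In this run the small early access units let the buffer fill past its capacity, and the large access units that follow drain it faster than the network refills it.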
Media streams can be transmitted over a network at a constant bit rate (CBR) or at a varying bit rate (VBR). CBR requires compressing each access unit by a compression ratio (QSCALE) that is responsive to the size of that access unit, as larger access units must be compressed at a higher compression ratio than smaller access units in order to achieve a substantially constant bit rate. VBR usually does not require such a relation between its compression ratio and the size of its access units, but may cause timing and buffering problems.
Fuzzy Logic
Fuzzy logic is the logic of approximate reasoning. Fuzzy systems are usually used when a process is too complex to be modeled using conventional mathematical methods and/or when dealing with imperfect information.
According to classical set theory, an item either belongs to a set or does not. Fuzzy set theory, on the other hand, introduced the concept of partial membership. Accordingly, a fuzzy logic variable can partially belong to more than one fuzzy set. The degree of membership can range between 0 and 1 and is defined by a membership function. Typical membership functions are shaped as triangles or trapezoids, but this is not necessarily so.
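Triangular and trapezoidal membership functions can be sketched as follows. The fuzzy sets "low", "medium" and "high", their breakpoints, and the choice of buffer occupancy (in percent) as the crisp input are hypothetical and serve only to show partial membership in more than one set.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 at a and c, 1 at the peak b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def trapezoidal(x, a, b, c, d):
    """Trapezoidal membership: rises on [a, b], equals 1 on [b, c], falls on [c, d]."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


# Hypothetical fuzzy sets over buffer occupancy in percent (0..100).
FUZZY_SETS = {
    "low": lambda x: trapezoidal(x, -1, 0, 20, 50),
    "medium": lambda x: triangular(x, 20, 50, 80),
    "high": lambda x: trapezoidal(x, 50, 80, 100, 101),
}


def fuzzify(x):
    """Return the degree of membership of crisp value x in every fuzzy set."""
    return {name: mu(x) for name, mu in FUZZY_SETS.items()}


memberships = fuzzify(35)
print(memberships)
```

An occupancy of 35 percent belongs partly to "low" and partly to "medium", which a classical set could not express.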
A crisp input variable is converted to a fuzzy input variable by determining which rules out of a predefined set of rules are satisfied (to which fuzzy set the crisp input value belongs) and to what degree (the degree of membership). A fuzzy logic variable is a linguistic expression.
The fuzzy input variables are processed by a rule-based decision process. The initial step of this process includes determining which rules were satisfied by the fuzzy input variables. The process takes into account the degrees of their fulfillment. These rules are expressed in linguistic form.
The output of the rule-based decision is one or more fuzzy output variables that are de-fuzzified to provide one or more crisp output variables.
A typical defuzzification step includes locating the “center of gravity” (centroid) of each satisfied rule and providing a weighted average of said centroids as a crisp output value. A less accurate but simpler defuzzification step may include arithmetic averaging of relevant rules instead of calculating the centroid.
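Both defuzzification variants can be sketched in a few lines. The sketch assumes that each rule's output fuzzy set has already been reduced to a single centroid value and that the degree of fulfillment of each rule is known; the rule base, the output meaning (a rate adjustment in percent) and the sample memberships are hypothetical. Reading "arithmetic averaging" as the plain mean of the centroids of the satisfied rules is likewise an assumption.

```python
# Hypothetical rule base: (input fuzzy set, centroid of the output fuzzy set).
RULES = [
    ("low", 20.0),     # IF occupancy is low    THEN raise the rate
    ("medium", 0.0),   # IF occupancy is medium THEN keep the rate
    ("high", -20.0),   # IF occupancy is high   THEN lower the rate
]


def defuzzify_weighted(memberships, rules):
    """Average of the output centroids, weighted by each rule's degree of fulfillment."""
    fired = [(memberships.get(name, 0.0), centroid) for name, centroid in rules]
    fired = [(w, c) for w, c in fired if w > 0.0]
    total = sum(w for w, _ in fired)
    if total == 0.0:
        raise ValueError("no rule was satisfied")
    return sum(w * c for w, c in fired) / total


def defuzzify_mean(memberships, rules):
    """Simpler variant: unweighted mean of the centroids of the satisfied rules."""
    centroids = [c for name, c in rules if memberships.get(name, 0.0) > 0.0]
    if not centroids:
        raise ValueError("no rule was satisfied")
    return sum(centroids) / len(centroids)


memberships = {"low": 0.75, "medium": 0.25, "high": 0.0}
weighted = defuzzify_weighted(memberships, RULES)
simple = defuzzify_mean(memberships, RULES)
print(weighted, simple)
```

The weighted result leans toward the rule that is satisfied more strongly, while the simple mean treats every satisfied rule equally, which is why it is cheaper but less accurate.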
U.S. Pat. No. 6,483,808 of Rochberger et al. describes a method of determining the optimum route from a source to a destination node in an ATM network utilizing fuzzy logic processing. The method is based on a set of metrics that may or may not be related to each other. The fuzzy logic processing is divided into two phases, each having its own set of rules against which the input data is applied. Fuzzy logic processing is performed for all candidate routes, and the route chosen is the one having a maximum link quality.
U.S. Pat. No. 6,282,241 of Saw describes an apparatus for video rate control using a fuzzy logic rule-based control. The apparatus implements a fuzzy logic control scheme that determines a quantization scale of a video stream in response to various parameters such as (i) the occupancy of the apparatus buffer and the video stream quality; (ii) the apparatus buffer occupancy and the inter-frame variance of the video stream; or (iii) the amount of bits assigned to a part of a macroblock.
Four scientists from the University of Southern California developed a technique named “Multi Threshold Flow Control (MTFC)” that is described in “multi-threshold online smoothing technique for variable rate streams”, R. Zimmerman, K. Fu, M. Jaharangiri and C. Shahabi. The article was found at the web site of the University of Southern California.
MTFC smooths variable bit rate (VBR) transmissions from a server to a client, without a priori knowledge of the actual bit rate. MTFC utilizes multi-level buffer thresholds at the client side that trigger feedback information sent to the media server. Once a client buffer threshold is crossed, a feedback process is initiated that in turn adjusts the sending rate of the server. The feedback process is based upon a prediction of future bit rate consumption. Three bit rate consumption prediction algorithms were suggested, one being a fuzzy logic based algorithm.
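The general idea of threshold-triggered feedback can be sketched as follows. This is not the MTFC algorithm itself: the threshold levels, the rate factors and the function names are hypothetical, and the prediction of future consumption is omitted entirely. The sketch only shows feedback being generated when the buffer occupancy crosses from one band to another.

```python
import bisect

# Hypothetical thresholds (fraction of buffer capacity) and the sending-rate
# factor requested while occupancy sits in each band. Not taken from MTFC.
THRESHOLDS = [0.2, 0.4, 0.6, 0.8]
RATE_FACTORS = [1.5, 1.2, 1.0, 0.8, 0.5]  # one per band: len(THRESHOLDS) + 1


def band(occupancy):
    """Index of the band that the given occupancy falls into."""
    return bisect.bisect_right(THRESHOLDS, occupancy)


def feedback(prev_occupancy, occupancy):
    """Return a rate factor if a threshold was crossed, otherwise None."""
    new = band(occupancy)
    if band(prev_occupancy) == new:
        return None
    return RATE_FACTORS[new]


def run_client(occupancies, base_rate):
    """Replay a series of occupancy samples and return the resulting sending rates."""
    rate = base_rate
    history = []
    for prev, cur in zip(occupancies, occupancies[1:]):
        factor = feedback(prev, cur)
        if factor is not None:
            rate = base_rate * factor  # the server applies the requested factor
        history.append(rate)
    return history


rates = run_client([0.5, 0.55, 0.65, 0.85, 0.7, 0.3], base_rate=1000)
print(rates)
```

No feedback is sent while the occupancy stays inside one band, so control traffic is generated only at threshold crossings.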