The standard H.264, which is also known as MPEG-4 Part 10 or AVC (Advanced Video Coding), is the state-of-the-art video coding standard. H.264 is a hybrid codec, which may eliminate redundancies between frames and/or within a frame. The output of the encoding process according to H.264 is VCL (Video Coding Layer) data, which is further encapsulated into NAL (Network Abstraction Layer) units prior to transmission or storage.
The standard H.264 includes the definition of different profiles, which are denoted e.g. “Baseline profile”, “Main profile” and “Extended profile”. For each such profile, a set of binary (supported/not supported) capabilities of a terminal or client is defined. For example, the “Main profile” includes CABAC (Context Adaptive Binary Arithmetic Coding), which is not included in the “Baseline profile”.
The standard H.264 further includes the definition of different “levels”, which relate to e.g. the capabilities of a codec. The definition of a level includes e.g. a maximum number of macroblocks per second, a maximum frame size, a maximum DPB (Decoded Picture Buffer) size, and a maximum video bit rate. The different levels may be specified e.g. in a table as illustrated in FIG. 1a. The table in FIG. 1a is part of a table defining profile-independent levels in the standard ISO/IEC 14496-10. When a codec fulfills all requirements, e.g. in terms of capacity, of a certain defined level, the codec could be said to support, be compliant to, or conform to, said level. A media bit stream having characteristics, e.g. in terms of frame rate and/or bit rate, within the defined limits of a certain level could likewise be said to be compliant to, or conform to, said level.
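The conformance rule above — a stream or codec conforms to a level only if every characteristic stays within that level's limits — can be sketched as follows. The helper name is hypothetical, and the limit values are illustrative excerpts in the style of the table in FIG. 1a (max macroblocks per second, max frame size in macroblocks, max bit rate in kbit/s):

```python
# Illustrative excerpt of profile-independent level limits, in the
# style of the table in FIG. 1a: (MaxMBPS, MaxFS, MaxBR).
LEVEL_LIMITS = {
    "3":   (40500,  1620, 10000),
    "3.1": (108000, 3600, 14000),
    "4":   (245760, 8192, 20000),
}

def conforms_to_level(level, mbps, frame_size_mbs, bitrate_kbps):
    """A stream conforms to a level only if ALL of its
    characteristics stay within that level's limits."""
    max_mbps, max_fs, max_br = LEVEL_LIMITS[level]
    return (mbps <= max_mbps
            and frame_size_mbs <= max_fs
            and bitrate_kbps <= max_br)
```

Note that a single characteristic exceeding one limit is enough to push the stream out of the level, even if every other characteristic is far below its respective limit.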
Typically, the conformance of a media content or bit stream to a particular level is specified by the setting of a syntax element associated with the media content, which element may be denoted, e.g. “level_idc”, to a certain value associated with said particular level.
A client can determine the complexity, or level, required for playing a certain media content by analyzing the value of said syntax element associated with the content or bit stream, and thus determine whether the client supports the playout of said media content. If the level required for playing the media content is equal to or below the level associated with the client, the client supports the playout of the media. If the level required for playing the media content exceeds the level associated with the client, the client may not be capable of playing the media content.
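The client-side check just described reduces to a numeric comparison, since “level_idc” encodes a level as ten times the level number (e.g. level 3.1 is signaled as 31). A minimal sketch, with a hypothetical function name:

```python
def playable(stream_level_idc: int, client_level_idc: int) -> bool:
    """The client supports playout iff the level required by the
    stream is at or below the level the client itself conforms to.
    level_idc encodes a level as 10 * the level number,
    e.g. level 3.1 -> level_idc 31."""
    return stream_level_idc <= client_level_idc
```

For example, a client conforming to level 3.1 (level_idc 31) supports a stream requiring level 3.0 (level_idc 30), but may not be capable of playing a stream requiring level 4.0 (level_idc 40).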
A media content, located e.g. at a media server, is associated with a certain predefined regular playout rate, and it is the level required for playing out at this regular playout rate that is indicated e.g. in the “level_idc”. For example, when playing a movie, the regular playout rate would be the “regular speed”, or “real-time speed”, of the movie, such as it would be played e.g. in a movie theater or on television. The term “regular playout rate” implies that all frame types of the content are displayed, e.g. I-frames (Intra-coded frames), P-frames (Predicted frames) and B-frames (Bi-predicted frames), in case of video content.
A media bit stream may also be played in a non-regular playout rate, i.e. non-real-time playout or playback. Below, some examples of achieving faster than real-time, or “fast forward”, playout of a media bit stream will be described.
The simplest method of achieving “fast forward” is to play out a stream at a faster rate than its original or predefined regular rate, by increasing the number of frames played out per second. This method has the drawback of increased requirements on processing power. For example, for a client to be able to fast forward a media bit stream at 10× normal speed using this method, the client must have processing power which supports a ten times higher decoding complexity than when playing the media bit stream at the regular playout rate. The above-described method of achieving faster than real-time playback or playout is illustrated in FIG. 1b, where sequence or stream 102b is played out at normal rate or speed, and sequence 104b is played out at 2× normal rate, i.e. twice as fast as sequence 102b.
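The linear scaling of decoding load with playout speed can be made concrete by expressing the load in macroblocks per second, as the level tables do. A small sketch with a hypothetical helper (frame dimensions given in 16×16 macroblocks):

```python
def required_mbps(frame_width_mb, frame_height_mb, fps, speedup):
    """Decoding load in macroblocks per second when ALL frames are
    decoded at `speedup` times the regular playout rate.  The load
    grows linearly with the speed-up factor."""
    return frame_width_mb * frame_height_mb * fps * speedup
```

For a 1280×720 stream (80×45 = 3600 macroblocks per frame) at 25 fps, regular playout requires 90 000 MB/s, while 10× fast forward by this method requires 900 000 MB/s — well beyond what the level signaled for regular playout guarantees.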
Another method of achieving “fast forward”, which requires less processing power than the previously described method, is to play out e.g. only the I-frames of a video media bit stream. This method could be described e.g. as “jumping between I-frames”, and is illustrated in FIG. 2. In FIG. 2, the sequence or stream 202 is played out at normal rate or speed. All frames comprised in the stream or content, such as I-, P-, and B-frames, are played out. In sequence 204, only the I-frames (shaded in FIG. 2) of the stream are played out, which in this case creates a “pseudo” 2× normal rate, since every second frame of the stream is an I-frame. This method is in fact an operation on the stream rather than a real speed-up. The reduction of the number of frames to be played out reduces the required complexity or processing power of a client to a degree depending e.g. on the distance between the I-frames in the media bit stream. A drawback of this method is that the “fast forward” speed cannot be freely controlled, due to e.g. I-frame distance constraints. A finer granularity of fast forward speed, e.g. a fraction of the I-frame distance, is not possible.
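The “jumping between I-frames” operation amounts to filtering the stream by frame type. A minimal sketch (hypothetical helper; frame types given as a list of "I"/"P"/"B" labels) that returns the indices of the frames actually played out:

```python
def iframe_fast_forward(frame_types):
    """Keep only the I-frames of the stream.  The effective speed-up
    is fixed by the I-frame spacing in the bit stream and cannot be
    chosen freely (a fraction of the I-frame distance is not
    attainable)."""
    return [i for i, t in enumerate(frame_types) if t == "I"]
```

For a typical GOP pattern such as "IBBPBB" repeated, only every sixth frame is played out, giving a fixed “pseudo” 6× rate regardless of the rate the user actually wanted.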
Another drawback of the method of “jumping between I-frames” is the high overhead associated with this solution. Extensive overhead may imply high bandwidth demands: the complete stream (all frames) must be sent to the receiver or client, which filters out and discards the “unwanted” frames (the majority of the frames).
Yet another method of achieving “fast forward” is to use prior knowledge of e.g. a video stream. This knowledge could be, e.g., that a certain Group Of Pictures (GOP) structure, or a fixed periodicity for key frames, is used. This information could be used e.g. to determine which frames could be left out when displaying the video stream.
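As a sketch of how such prior knowledge could be exploited, assume the client knows the stream uses a fixed key-frame period (an assumption for illustration; the helper name is hypothetical). The client can then decide up front which frame indices must be kept, leaving the rest out during fast forward:

```python
def frames_to_keep(num_frames, key_period):
    """With prior knowledge of a fixed key-frame periodicity, every
    frame whose index is a multiple of the period is a key frame and
    must be decoded; all other frames are candidates for being left
    out during fast forward."""
    return [i for i in range(num_frames) if i % key_period == 0]
```

Note that this only works when the assumed structure actually holds; without such prior knowledge the client cannot tell which frames are safe to drop.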
The main problem with existing “fast forward” solutions, such as the ones described above, is that the decoding complexity of a media stream played in “fast forward mode”, and thus the processing capacity or level required for playing out the media stream, cannot easily be acquired by a client which is about to e.g. retrieve or request the media stream, or which is about to start “fast forwarding” a media stream that is currently being downloaded.
The specified “levels” regulate the upper limits of every aspect of decoding complexity, including e.g. frame size, motion vector range and maximum bit rate. In order to be compliant to or conform to a particular level, a media stream must conform to all the specified limits associated with that level. Consequently, some media content or streams may be categorized into a “high” level because e.g. only one of their characteristics has a high value, such as a large frame size. At the same time, the other characteristics of the same media stream may have “low” values, i.e. lower than the values specified for said “high” level, which could mean that the complexity of the stream as a whole is, in fact, rather low, and that a lower level would suffice to cope with these characteristics. An example of such a media stream could be e.g. a 2 Hz, 1280×720p sequence, which has a relatively large frame size but a very low frame rate.
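The 2 Hz, 1280×720p example can be worked through numerically (the limit values cited in the comments are illustrative, taken from the profile-independent level table of ISO/IEC 14496-10):

```python
MB = 16  # macroblock edge length in pixels

# Frame size in macroblocks: 1280x720 -> 80 x 45 = 3600 macroblocks.
frame_size_mbs = (1280 // MB) * (720 // MB)

# Macroblock processing rate at the 2 Hz frame rate: only 7200 MB/s.
mb_rate = frame_size_mbs * 2

# The frame size of 3600 MBs alone forces the stream into level 3.1
# (whose MaxFS is 3600), even though the 7200 MB/s processing rate is
# far below that level's MaxMBPS budget of 108000 MB/s -- the stream
# as a whole is much less complex than its level suggests.
```

A single “high” characteristic (frame size) thus dictates the level, while the actual decoding load is modest.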
In order to reduce decoding complexity for a client, the frame rate of a video stream could be reduced at the content server, to simplify fast forwarding at the client. However, such a reduction cannot be indicated to a client, and consequently clients cannot benefit from the “help” thus provided by the server.
Further, even if a client has access to prior knowledge about the encoding of a stream, the client may not be able to deduce the decoding complexity of the stream after e.g. a frame-reducing operation has been performed in the server or in the client.
Thus, when a client is to perform “fast forward” of a media stream, the client has no way of knowing whether supporting the level indicated by or for the media stream will be sufficient for decoding the media stream in “fast forward” mode. Due to this uncertainty, clients are typically equipped with, and use, a more powerful, somewhat “overdimensioned”, decoder, in order to “be on the safe side”. This is very inefficient, e.g. in terms of computational resources.