Multimedia can be formatted in accordance with Moving Picture Experts Group (MPEG) standards such as MPEG-1, MPEG-2 (also referred to as the DVD format), and MPEG-4. For individual video frames these multimedia standards use Joint Photographic Experts Group (JPEG) compression, and add a temporal dimension to the spatial dimension of single pictures. MPEG is essentially a compression technique that uses motion estimation to further compress a video stream.
MPEG encoding breaks each picture into blocks called “macroblocks”, and then searches neighboring pictures for similar blocks. If a match is found, instead of storing the entire block, the system stores a much smaller vector that describes the movement (or not) of the block between pictures. In this way, efficient compression is achieved.
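The block-matching step described above can be sketched as follows. This is a minimal illustration only: the block size, the search range, and the use of a sum-of-absolute-differences (SAD) cost are illustrative assumptions, and a real encoder would also store a residual alongside the motion vector.

```python
# Illustrative macroblock search: for each block in the current picture,
# scan a window of the previous picture for the best-matching block and
# keep only the motion vector describing its displacement.

def sad(block_a, block_b):
    """Sum of absolute differences between two equal-sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def best_motion_vector(prev, cur, bx, by, size=4, search=2):
    """Return the (dx, dy) shift such that the block at (bx, by) in `cur`
    best matches the block at (bx+dx, by+dy) in `prev`, searching within
    +/- `search` pixels."""
    target = [row[bx:bx + size] for row in cur[by:by + size]]
    best, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or y + size > len(prev) or x + size > len(prev[0]):
                continue  # candidate block would fall outside the picture
            cand = [row[x:x + size] for row in prev[y:y + size]]
            cost = sad(target, cand)
            if best_cost is None or cost < best_cost:
                best_cost, best = cost, (dx, dy)
    return best

# If the picture content shifts one pixel to the right between pictures,
# the block in the current picture matches one pixel to the left in the
# previous picture, giving motion vector (-1, 0).
prev = [[y * 8 + x for x in range(8)] for y in range(8)]
cur = [[prev[y][x - 1] if x > 0 else 0 for x in range(8)] for y in range(8)]
print(best_motion_vector(prev, cur, 3, 2))  # (-1, 0)
```

Storing such a vector (a few bytes) in place of the full block of pixel data is the source of the compression gain described above.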
MPEG compresses each frame of video in one of three ways. The first way is to generate a self-contained entity referred to as an “intraframe” (also referred to as a “reference frame” or an “information frame,” and referred to herein as an I-frame), in which the entire frame is composed of compressed, quantized discrete cosine transform (DCT) values using JPEG principles. This type of frame is required periodically and at a scene change. An I-frame thus is an example of a frame that is compressed (by virtue of using DCT values, which are a form of compression) based solely on information in its own frame, i.e., without reference to information in any other frame.
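The JPEG-style transform coding behind an I-frame can be sketched as follows: an 8×8 block of pixel values is converted to DCT coefficients, which are then quantized. This is a naive, minimal sketch; the function names are hypothetical, and a real encoder uses a per-frequency quantization table rather than the single uniform step assumed here.

```python
import math

def dct_2d(block):
    """Naive 8x8 type-II DCT, as used for JPEG-style intraframe coding."""
    n = len(block)
    def alpha(k):
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for y in range(n):
                for x in range(n):
                    s += (block[y][x]
                          * math.cos((2 * x + 1) * v * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * u * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out

def quantize(coeffs, step=16):
    """Uniform quantization (illustrative; JPEG uses per-frequency tables)."""
    return [[round(c / step) for c in row] for row in coeffs]

# A flat block compresses to a single nonzero DC coefficient: after
# quantization, every AC coefficient rounds to zero.
flat = [[128] * 8 for _ in range(8)]
q = quantize(dct_2d(flat))
print(q[0][0], sum(abs(c) for row in q for c in row) - abs(q[0][0]))  # 64 0
```

The compression comes from the quantized AC coefficients being mostly zero for typical image blocks, so they can be run-length and entropy coded very compactly.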
Most frames, however (typically 15 out of 16), are compressed by encoding only differences between the image in the frame and the image in the nearest intraframe, resulting in frame representations that use much less data than is required for an intraframe. In MPEG parlance these frames are called “predicted” frames and “bidirectional” frames, referred to herein as P-frames and B-frames.
Predicted frames are those frames that contain motion vector references to the preceding intraframe and/or to a preceding predicted frame, in accordance with the discussion above. If a block has changed slightly in intensity or color, then the difference between the two frames is also encoded in a predicted frame. Moreover, if something entirely new appears that does not match any previous blocks, then a new block can be stored in the predicted frame in the same way as in an intraframe.
In contrast, a bidirectional frame is used as follows. The MPEG system searches both forward and backward through the video stream to match blocks. Bidirectional frames are used to record when something new appears, so that it can be matched to a block in the next full intraframe or predictive frame, with bidirectional frames being able to refer to both preceding and subsequent intraframes and predictive frames. Essentially, a B-frame is an example of a frame that represents some information found only in its own frame and other information by reference to both a past and a future I-frame or P-frame.
Experience has shown that two bidirectional frames between each intraframe or predictive frame work well, so that a typical group of frames associated with a single intraframe might be, in display order: the full intraframe, followed by two bidirectional frames, a predictive frame, two more bidirectional frames, another predictive frame, two more bidirectional frames, a predictive frame, two more bidirectional frames, and a final predictive frame followed by two more bidirectional frames, at which point a new full intraframe might be placed in the stream to refresh the stream.
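The group-of-frames pattern just described can be generated programmatically. The helper name and its parameters are illustrative assumptions; the pattern itself is the one described above (an I-frame, then repeating groups of two B-frames and one P-frame, ending with two trailing B-frames before the next I-frame).

```python
# Build the display-order group of frames described above and tally the
# frame types. Defaults reproduce the typical 15-frame group.

def make_gop(num_p=4, b_between=2):
    """Return a display-order group of frames such as I B B P B B P ..."""
    frames = ["I"]
    for _ in range(num_p):
        frames += ["B"] * b_between + ["P"]
    frames += ["B"] * b_between  # trailing B-frames before the next I-frame
    return frames

gop = make_gop()
print("".join(gop))  # IBBPBBPBBPBBPBB
print(len(gop), gop.count("I"), gop.count("P"), gop.count("B"))  # 15 1 4 10
```

This yields one I-frame, four P-frames, and ten B-frames per fifteen-frame group, which is the frame mix used in the bandwidth discussion below.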
The present invention critically recognizes that when MPEG video is played in a fast mode, i.e., either fast forward or fast rewind, only I-frames are decoded, and since transmission networks typically have limited bandwidth, a video server can send only a limited number of I-frames under the given bandwidth.
In the typical group of pictures (GOP) mentioned above, which consists of fifteen total frames (I, B, B, P, B, B, P, B, B, P, B, B, P, B, B), there is one I-frame, four P-frames, and ten B-frames. A fair assumption is that a P-frame has 25% of the amount of data in an I-frame and that a B-frame has 10% of the amount of data in an I-frame. If the stream rate is 24 Mbps, so that 24 Mbit of data are sent in one GOP time period, then one I-frame uses 8 Mbit, one P-frame uses 2 Mbit, and one B-frame uses 0.8 Mbit. Thus, up to three I-frames can be sent in one GOP time period (24 Mbit/8 Mbit=3), meaning that at most 3× fast mode is available without dropping I-frames. For 6× fast mode, however, every other I-frame must be dropped, or else an enlarged (and thus typically unavailable) bandwidth must be provided to send all the I-frames without dropping any. The present invention critically understands that if some I-frames are dropped, fast playback will be jumpy and the user cannot easily find the point he/she wants to see. Accordingly, the present invention recognizes a need for performing smooth fast playback without dropping I-frames.
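The arithmetic above can be worked through as follows. With P = 25% and B = 10% of an I-frame, one GOP (1 I + 4 P + 10 B) costs I × (1 + 4×0.25 + 10×0.10) = 3 × I; solving 3 × I = 24 Mbit gives the per-frame sizes, and dividing the GOP budget by the I-frame size gives the maximum fast-mode speed. Variable names are illustrative.

```python
# Worked version of the bandwidth arithmetic above.

GOP_BUDGET_MBIT = 24.0         # data transmitted in one GOP time period
P_RATIO, B_RATIO = 0.25, 0.10  # assumed P- and B-frame sizes relative to I
NUM_I, NUM_P, NUM_B = 1, 4, 10

# One GOP costs i_size * (1 + 4*0.25 + 10*0.10) = 3 * i_size Mbit.
i_size = GOP_BUDGET_MBIT / (NUM_I + NUM_P * P_RATIO + NUM_B * B_RATIO)
p_size = P_RATIO * i_size
b_size = B_RATIO * i_size
print(i_size, p_size, b_size)  # 8.0 2.0 0.8

# In fast mode only I-frames are sent, so the speed-up achievable without
# dropping any I-frame is the GOP budget divided by one I-frame's size.
max_fast_mode = GOP_BUDGET_MBIT / i_size
print(max_fast_mode)  # 3.0 -> 3x fast mode; 6x must drop every other I-frame
```

As the text notes, any speed beyond this ratio forces either dropped I-frames (jumpy playback) or additional bandwidth.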