1. Field of the Invention
The present invention relates to the field of storage and transmission of compressed video information. More particularly, the present invention relates to providing playback features such as fast forward and reverse playback during decompression of encoded video programs.
2. Background
Applications involving video transmission or storage require some form of data compression to reduce the otherwise tremendous volume of information required for video data. The International Organization for Standardization (ISO) Moving Picture Experts Group (MPEG) has developed a standard for compressing video data to manageable or useful volumes while still preserving enough "information" to be useful for various storage or transmission applications. These applications for storage or transmission on various digital media include compact disc, remote video data bases, movies on demand, digital cable television, and high definition television. MPEG is documented in ISO/IEC publications 11172 ("Coding of Moving Pictures and Associated Audio for Digital Storage Media") and 13818 ("Generic Coding of Moving Pictures and Associated Audio Information"), also known as MPEG-1 and MPEG-2, respectively. As used hereafter, "MPEG" will be understood to refer to either MPEG-1 or MPEG-2 without distinction therebetween.
The MPEG standard recognizes that much of the information in a picture within a video sequence is similar to information in a previous or subsequent picture. The MPEG standard takes advantage of this temporal redundancy to represent some pictures in terms of their differences from one or more other pictures. A picture consists of a number of horizontal slices; a slice consists of a number of macroblocks; a macroblock consists of an array of blocks; and a block consists of an 8×8 array of pixels.
The video part of the MPEG standard uses motion compensated predictive coding, the discrete cosine transform (DCT), adaptive quantization, variable-length encoding, and run-length encoding to compress images on a block-by-block basis. Motion compensation replaces a macroblock with a motion vector representing its gross displacement from a corresponding macroblock in the reference picture, plus error terms for each of the pixels in the macroblock. MPEG uses both forward motion compensation (in which a future picture is referenced to a past picture), and a combination of forward and backward motion compensation (in which a picture is referenced to both a past picture and a future picture). The combined forward and backward motion compensation is called bi-directional motion compensation.
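As a rough illustration of the motion compensation step described above, the following Python sketch rebuilds one block from a reference picture, a motion vector, and per-pixel error terms. The function name, the list-of-lists picture layout, and the whole-pixel motion vector are assumptions made for illustration only; the DCT, quantization, and sub-pixel vectors used in practice are not modeled.

```python
def reconstruct_block(reference, top, left, motion_vector, residual):
    """Rebuild a square block from a reference picture (hypothetical helper).

    reference     -- 2-D list of pixel values for the reference picture
    top, left     -- position of the block in the picture being decoded
    motion_vector -- (dy, dx) whole-pixel displacement into the reference
    residual      -- 2-D list of per-pixel error terms for the block
    """
    dy, dx = motion_vector
    size = len(residual)
    return [
        [reference[top + dy + r][left + dx + c] + residual[r][c]
         for c in range(size)]
        for r in range(size)
    ]


# A 4x4 reference picture whose pixel values are 0..15 in raster order.
reference = [[r * 4 + c for c in range(4)] for r in range(4)]
block = reconstruct_block(reference, 0, 0, (1, 2), [[1, 0], [0, -1]])
print(block)  # [[7, 7], [10, 10]]
```

The sketch does no bounds checking, so the displaced block is assumed to lie entirely inside the reference picture.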
According to the MPEG standard, video frames (pictures) are classified into one of three types: I-frames, also called I-pictures or intraframe coded pictures; predicted pictures, also called P-frames or P-pictures; and B-frames or B-pictures, also called bi-directionally coded pictures. P-frames and B-frames are also collectively referred to as interframe coded images. The three types of video frames differ in their use of motion compensation.
Intra pictures (I-frames or I-pictures) are coded using only information present in the picture itself. They can be thought of as being independent pictures. I-pictures provide random access points into the compressed video data. I-pictures use only transform coding and therefore provide only moderate compression. An I-frame provides enough information for a complete picture to be generated from the I-frame alone.
Predicted pictures (P-pictures or P-frames) are coded from a previous I-picture or previous P-picture as a reference. They can be thought of as dependent pictures. The compression of P-pictures uses motion-compensated temporal prediction of some or all macroblocks in the P-picture relative to corresponding macroblocks from the previous I- or P-picture. Only forward motion estimation/compensation is used in this temporal prediction. The I- or P-picture from which a P-picture is temporally predicted is called the anchor picture to the P-picture and is sometimes referred to as the reference picture or reference frame. Predicted pictures provide more compression than I-pictures because only the difference from a previous picture is encoded. One drawback of using P-pictures as anchors for subsequent P-pictures is that coding errors may be propagated through the subsequent prediction of P-pictures.
Bi-directional pictures (B-pictures or B-frames) are pictures that use both a past picture and a future picture as references. Like P-pictures they can be thought of as dependent pictures. Some or all macroblocks in B-pictures are coded by a bi-directional motion-compensated predictive encoder using corresponding macroblocks from a "future" I- or P-picture for backwards prediction and from a previous I- or P-picture for forward prediction. The two reference I- or P-pictures from which a B-picture is temporally predicted are thus called the anchor pictures of the B-picture. Like P-pictures, B-pictures only encode the temporal differences between the B-picture and its two anchor pictures. Bi-directional pictures provide the most compression and do not propagate errors because they are never used as a reference. Bi-directional prediction also decreases the effects of noise by averaging two pictures.
In accordance with the MPEG standard, pictures are arranged in ordered groups. The MPEG standard allows the encoder to choose the frequency and location of I-pictures. As an example, a single group might include an I-picture as the first picture in the group with P-pictures distributed following every third picture and B-pictures between each "I and P" and "P and P" sequence. A typical display order of picture types might include an I-picture every fifteenth frame, each I-picture followed by two B-pictures with P-pictures between each group of B-pictures in a sequence something like I B B P B B P B B P B B P B B I. Including an I-picture every fifteenth frame corresponds, in a 30 frame per second environment, to having a complete picture representation (an independent picture) every one-half second.
In some MPEG systems, the MPEG encoder will reorder the pictures in the video stream to present the pictures to the decoder in the most efficient sequence. In particular, the reference pictures needed to reconstruct B-pictures may be sent before the associated B-pictures.
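The reordering described above can be sketched as follows. This is a minimal Python illustration, assuming each B-picture depends on the nearest anchor (I- or P-picture) on either side of it in display order; the function name and the (index, type) tuple representation are hypothetical.

```python
def display_to_coded_order(display_types):
    """Reorder pictures so that each B-picture follows both of its anchors.

    display_types -- picture types ("I", "P", "B") in display order
    Returns a list of (display_index, type) tuples in transmission order.
    """
    coded = []
    pending_b = []  # B-pictures waiting for their future anchor
    for index, ptype in enumerate(display_types):
        if ptype == "B":
            pending_b.append((index, ptype))
        else:
            # An I- or P-picture is an anchor: send it first, then the
            # B-pictures that precede it in display order.
            coded.append((index, ptype))
            coded.extend(pending_b)
            pending_b = []
    # Trailing B-pictures whose future anchor is outside this list.
    coded.extend(pending_b)
    return coded


coded = display_to_coded_order("I B B P B B P".split())
print(" ".join(ptype for _, ptype in coded))  # I P B B P B B
```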
A number of well-known references, e.g. Mattison, "Practical Digital Video", Wiley, 1994 may be referenced for details about various actual mechanisms for encoding the video data in accordance with the MPEG standard. For purposes of the present application, it is important to understand the distinction between I-, P- and B- pictures. Specifically, it is important to recognize that only I-pictures (independent or reference pictures) provide enough information to reconstruct a complete picture in a video sequence without reference to other pictures.
Existing MPEG decoders are concerned with the reconstruction and display of encoded video information. However, for users viewing the decoded information, it is often desirable to view the information in a mode other than normal speed forward playback. Such alternative modes include being able to pause, or freeze, a current image that is being displayed. Likewise, it is often desirable to provide a slow motion playback in both the forward and reverse directions as well as fast forward and high-speed reverse functionality.
To implement a pause function, MPEG decoders generally provide some mechanism for freezing the current image that is being displayed, thereby temporarily halting the decompression process. Decoders also generally include an input buffer in order to provide a certain level of decoupling between the timing of the decoding process and the timing of the data delivery system which would typically consist of a storage device and a storage controller. Therefore, when the decoding process is halted, the amount of data that is stored in the buffer begins to increase. In some implementations, a feedback mechanism responsive to the depth level of the input buffer is provided to the storage controller, causing it to halt the data transfer whenever necessary to prevent the buffer from overflowing.
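The buffer feedback mechanism described above can be modeled with a short simulation. The sketch below is only illustrative: the capacity, the high-water threshold, the chunk size, and the one-chunk-per-tick timing are all assumed values, not parameters of any particular decoder.

```python
def simulate_buffer(paused, capacity=32, high_water=24, chunk=4):
    """Simulate the input buffer level over time (all units hypothetical).

    paused -- sequence of booleans, True while the decoder is paused
    Returns the buffer level after each tick.
    """
    level = 0
    history = []
    for is_paused in paused:
        # Storage controller: transfer a chunk only while the feedback
        # signal indicates the buffer will stay at or below the threshold.
        if level + chunk <= high_water:
            level += chunk
        # Decoder: consume one chunk per tick unless it is paused.
        if not is_paused and level >= chunk:
            level -= chunk
        assert level <= capacity, "feedback should prevent overflow"
        history.append(level)
    return history


# Three ticks of normal playback, a ten-tick pause, then playback again.
history = simulate_buffer([False] * 3 + [True] * 10 + [False] * 3)
print(max(history))  # 24
```

During the pause the level climbs to the threshold and stays there, because the controller stops transferring data rather than letting the buffer overflow.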
Like the pause function, slow motion playback in the forward direction can be achieved simply by sending one or more instructions to the decoder. These instructions cause the decoder to repeat each or some frames one or more times. As before, the amount of data accumulating in the decoder's input buffer will increase during slow motion playback due to the reduced output rate. This can be compensated for by a feedback mechanism similar to the one described above.
In order to implement the fast forward function, some frames must be discarded, either by the decoder or the preceding data delivery system. This is because the output display rate is generally limited by the decoding and/or display apparatus (e.g., 30 frames per second on a standard television video display). An increase in the rate of playback can be realized by deleting the B-frames, should any exist. For example, if two of every three frames are B-frames, then eliminating B-frames results in a three-fold increase in the rate of playback. Alternatively, the playback rate can be increased by fifty percent by first deleting all of the B-frames and then instructing the decoder to repeat each remaining frame one time. Since the B-frames are not needed for reconstruction of the remaining I- and P-frames, their deletion would not compromise the accuracy of the remaining images. Higher playback rates can be achieved by deleting not only the B-frames, but the P-frames as well. This would leave only the I-frames, which can always be reconstructed without referencing any other images. For example, if every fifteenth frame is an I-frame, then the rate of playback could be increased by a factor of fifteen simply by deleting all other frames. In practice, such an increase may be realized only if the data delivery system is capable of retrieving and presenting the data to the decoder fifteen times faster than the rate required for normal playback. Otherwise, if the data delivery hardware is not fast enough, the decoder's input buffer may underflow, forcing the decoder to freeze a current image until more data becomes available.
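The playback rates quoted above follow from simple arithmetic: the speed-up is the number of source frames divided by the number of frame periods actually displayed. The following Python sketch works through the three cases for the fifteen-frame group described earlier; the function name and its parameters are hypothetical.

```python
from fractions import Fraction


def playback_speedup(group_types, keep, repeats=0):
    """Speed-up relative to normal playback when only some frames are shown.

    group_types -- picture types of one group, in display order
    keep        -- set of picture types that are retained
    repeats     -- extra times each retained frame is displayed
    """
    kept = sum(1 for ptype in group_types if ptype in keep)
    if kept == 0:
        raise ValueError("no frames retained")
    return Fraction(len(group_types), kept * (1 + repeats))


group = "I B B P B B P B B P B B P B B".split()
print(playback_speedup(group, {"I", "P"}))             # 3
print(playback_speedup(group, {"I", "P"}, repeats=1))  # 3/2
print(playback_speedup(group, {"I"}))                  # 15
```

These figures assume that the data delivery system can keep up; they describe the display side only.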
The demands placed on the data delivery hardware can be even more severe during reverse playback. In a practical implementation of reverse playback, only the I-frames are useful. This is because the P- and B-frames cannot be reconstructed without using previously decoded frames for prediction. Unfortunately, the previous frames referred to during forward playback become future frames during reverse playback. Theoretically, these prerequisite frames could be reconstructed in advance and then stored in memory, but this would significantly increase the cost of the playback system. Therefore, a preferred solution is to retrieve and display only the I-frames. Various playback rates can still be achieved by repeating these I-frames one or more times. A more difficult problem, however, is to attain high reverse playback rates without having to repeat each frame a multiple number of times while waiting for additional data to become available. Such multiple repetitions can seriously degrade motion rendition.
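The I-frame-only approach to reverse playback can be sketched as below. This is a minimal Python illustration, assuming the frame types are known in display order; the function name and the repeat parameter are hypothetical, and the retrieval latency that motivates the repetition is not modeled.

```python
def reverse_playback_schedule(display_types, repeats=0):
    """List the frame indices displayed during I-frame-only reverse playback.

    display_types -- picture types ("I", "P", "B") in display order
    repeats       -- extra times each I-frame is shown
    """
    i_frames = [i for i, ptype in enumerate(display_types) if ptype == "I"]
    schedule = []
    for index in reversed(i_frames):
        # Each I-frame is decoded on its own and shown 1 + repeats times.
        schedule.extend([index] * (1 + repeats))
    return schedule


schedule = reverse_playback_schedule(list("IBBPBBIBBPBBI"), repeats=1)
print(schedule)  # [12, 12, 6, 6, 0, 0]
```

A larger repeat count lowers the reverse playback rate but, as noted above, makes motion appear less smooth.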
One of the difficulties associated with multi-speed playback of compressed bit streams is the problem of transitioning from one playback mode to another. For example, during forward playback at high speed or reverse playback at any speed, generally, only the I-frames are selected by the storage controller and provided to the display system's decoder. When transitioning from one of these modes to forward playback at normal speed, the sequence in which frames are selected by the controller and presented to the decoder is altered. In this particular case, the controller would stop deleting P-frames and B-frames from the compressed bit stream and instead would pass all types of frames to the decoder. Such a transition may cause artifacts to appear and remain visible during the entire transition. For example, if the first frame encountered after the controller begins to accept all types of frames is a P-frame, then the decoder must reference a preceding I- or P-frame when forming the prediction required for reconstruction. However, the decoder would only be able to access the last I-frame that was received prior to the transition to normal playback, and if this is not identical to the preceding frame that was used during the original encoding process, then an artifact will occur. Similarly, if the first frame encountered after the transition is a B-frame, then artifacts are almost certain to occur since two prerequisite frames would be required to form the prediction, and at least one of these prerequisite frames is likely to be a P-frame, assuming typical encoding parameters.
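The way a missing reference frame leads to artifacts that persist until the next I-frame can be traced with a small model. The Python sketch below is a simplification under stated assumptions: frames are listed in transmission order, each P-frame depends on the most recent anchor, and every B-frame depends on the two most recent anchors. The function name and the boolean delivery list are hypothetical.

```python
def cleanly_decodable(coded_types, delivered):
    """Return indices of frames the decoder can reconstruct exactly.

    coded_types -- picture types ("I", "P", "B") in transmission order
    delivered   -- parallel booleans, True if the controller passed the
                   frame on to the decoder
    """
    clean = []
    anchors = []  # exactness of the two most recent anchors in the stream
    for index, (ptype, sent) in enumerate(zip(coded_types, delivered)):
        if ptype == "I":
            ok = sent  # needs no other frame
        elif ptype == "P":
            ok = sent and bool(anchors) and anchors[-1]
        else:  # "B": assumed to need both surrounding anchors
            ok = sent and len(anchors) >= 2 and anchors[-1] and anchors[-2]
        if ptype in ("I", "P"):
            anchors = (anchors + [ok])[-2:]
        if ok:
            clean.append(index)
    return clean


coded_types = "I P B B P B B I P B B".split()
# High-speed mode passes only the first I-frame; normal playback resumes
# at index 4, which happens to be a P-frame.
delivered = [True, False, False, False] + [True] * 7
print(cleanly_decodable(coded_types, delivered))  # [0, 7, 8, 9, 10]
```

In this example the frames at indices 4 through 6 are delivered but cannot be reconstructed exactly, and clean decoding resumes only at the next I-frame.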
From the foregoing it can be appreciated that it is desirable, and is therefore an object of the present invention, to prevent transition artifacts when changing playback modes in a multi-speed playback compressed video system. Further, it would be desirable to have a mechanism for efficient data access to support multi-speed playback in a compressed video system.