This invention relates in general to computer systems and more specifically to computer video compression and decompression techniques.
With the convergence of computers, communications and media, video compression techniques have become increasingly important. Video compression is often used to translate video images (from camera, VCR, laser discs, etc.) into digitally encoded frames. The digitally encoded frames may then be easily transferred over a network, or stored in a memory. When desired, the compressed images are then decompressed for viewing on a computer monitor or other such device.
The three most common video compression standards are MPEG (Moving Picture Experts Group), JPEG (Joint Photographic Experts Group), and H.261. These standards partition incoming video frames into small tiles and perform either spatial or temporal compression on the tiles. Each standard has a defined compression sequence for the series of incoming frames.
Compressed frames are classified as Intra-coded frames (I frames), Predictive frames (P frames), or Bi-directional frames (B frames). An ‘I’ frame is a frame in which spatial redundancies are removed using spatial compression techniques. A ‘P’ frame is a frame in which temporal redundancies have been removed by matching tiles in the current frame, through motion estimation, to a previous reference frame, then spatially compressing the temporal difference. A ‘B’ frame is a frame in which temporal redundancies are removed by matching tiles in the current frame to a previous and a future reference frame, then compressing the difference with the spatial transform.
To perform spatial compression alone, such as in the ‘I’ frame, only the individual frame is required for the compression. However, to perform the temporal compressions, which are required for both the ‘P’ and ‘B’ frames, the compression of other frames must first be performed. Each ‘P’ frame is encoded based on the previous ‘I’ or ‘P’ reference frame. Because ‘B’ frames require the results of both past and future frame calculations, the processing of the ‘B’ frame is an out-of-order function, in which future reference frames must be analyzed prior to the intervening ‘B’ frames.
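The dependencies described above can be made concrete with a short sketch. It is illustrative only: the function name `reference_frames` and the convention of labelling frames with a type letter followed by a temporal index (for example `B2`) are assumptions made for this example, not part of any standard.

```python
def reference_frames(frames):
    """Map each frame label to the reference frames it depends on.

    `frames` lists labels such as "I1" or "B2" in temporal (display) order.
    The first character of each label is taken to be the frame type.
    """
    refs = {}
    last_ref = None   # most recent I or P frame seen so far
    pending_b = []    # B frames still waiting for their future reference
    for name in frames:
        kind = name[0]
        past = [] if last_ref is None else [last_ref]
        if kind == "B":
            refs[name] = list(past)   # past reference known now, future one later
            pending_b.append(name)
            continue
        if kind == "I":
            refs[name] = []           # spatial compression only
        elif kind == "P":
            refs[name] = list(past)   # previous I or P frame
        else:
            raise ValueError("unknown frame type: " + name)
        # This I or P frame is the future reference of every pending B frame.
        for b in pending_b:
            refs[b].append(name)
        pending_b.clear()
        last_ref = name
    return refs


deps = reference_frames(["I1", "B2", "B3", "P4", "B5", "B6", "P7"])
```

In this sketch a B frame's entry is completed only when the following I or P frame arrives, which is why B frames cannot be processed in temporal order.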
Two recognized forms of video compression techniques are real-time compression and high-quality n-pass compression, where n is greater than 1. Each form has known advantages. Real-time video compression uses only spatial compression techniques (I frames) to allow images to be compressed at the rate at which they are input. Thus, real-time compression processes require less buffering of the input image and consequently less hardware complexity.
To provide real-time compression, a ‘peephole’ approach is typically implemented whereby each tile in each frame is encoded as it is processed. One drawback of this scheme arises from the fact that only a fixed number of bits is allocated for encoding a frame. If bits are used to encode portions of the frame as they are received, bits may be ‘used up’ encoding low priority components of the tile, leaving fewer bits available for encoding higher priority blocks which may appear later in the frame.
Two-pass compression alleviates the above encoding problem by processing each frame in two steps. First, each frame undergoes a Motion Estimation (ME) calculation. During the ME phase, for P and B frames, the possible motion of each macroblock in the frame is characterized relative to a past and/or future reference frame as described above. In addition, for I, P and B frames, energy statistics are generated for the frame to profile the visual complexity of the frame. These energy statistics allow for proper allocation of bits for encoding purposes throughout the frame.
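By way of illustration, the following sketch shows one way energy statistics could drive bit allocation once the whole frame has been profiled. The proportional rule, the function name `allocate_bits`, and the use of one integer energy value per macroblock are assumptions made for this example; the text above does not specify an allocation formula.

```python
def allocate_bits(energies, frame_budget):
    """Split a fixed frame budget across macroblocks in proportion to energy.

    `energies` holds one non-negative integer per macroblock (hypothetical
    energy statistics gathered during the ME phase). Integer division is
    used, so the shares may sum to slightly less than `frame_budget`.
    """
    if not energies:
        return []
    total = sum(energies)
    if total == 0:
        # A completely flat frame: share the budget evenly.
        return [frame_budget // len(energies)] * len(energies)
    return [frame_budget * e // total for e in energies]


# The high-energy block arrives last but still receives the largest share.
shares = allocate_bits([1, 1, 6], 800)
```

Because every energy value is known before any bits are spent, a complex block that appears late in the frame is not starved, which is the weakness of the ‘peephole’ approach.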
Following the ME phase, the frame undergoes Motion Compensation (MC), during which the data is actually compressed. Based on the encoding directives, a Discrete Cosine Transform (DCT) is applied to each portion (or block) of each frame, or to the temporal differences between each block and its corresponding reference point in another frame. The resulting data is then quantized and transformed into run-level (RLE) tokens, which are then encoded.
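The quantization and run-level steps can be sketched as follows. This is a simplified illustration under stated assumptions: the DCT coefficients are taken as already computed and arranged in scan order, quantization is uniform with a single step size, and the function names `quantize` and `run_level_tokens` are hypothetical.

```python
def quantize(coeffs, step):
    """Uniform quantization: divide each coefficient by `step`,
    truncating toward zero."""
    return [int(c / step) for c in coeffs]


def run_level_tokens(levels):
    """Convert quantized levels into (run, level) pairs, where `run` is the
    number of zeros preceding each non-zero `level`.

    Trailing zeros produce no token; a real codec would signal them with an
    end-of-block code, which is omitted here.
    """
    tokens = []
    run = 0
    for level in levels:
        if level == 0:
            run += 1
        else:
            tokens.append((run, level))
            run = 0
    return tokens


levels = quantize([100, -35, 7, 0, 18, 3, -2, 1], 8)
tokens = run_level_tokens(levels)
```

Quantization drives small coefficients to zero, and the run-level form then represents the resulting runs of zeros compactly before entropy encoding.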
Because the entire frame is evaluated before bits are allocated for encoding the different blocks of the frame, the output image provided is of much higher quality than that provided via the ‘peephole’ compression technique described above. It would be desirable to provide two-pass compression techniques in real time. However, the complexity of the process has precluded it from being a valuable tool for video compression applications which require real-time performance. The main problem with two-pass compression techniques is encountered when analyzing B frames, as will be shown below.
Referring now to FIG. 1, a subset of a typical input stream of MPEG encoded frames, I1, B2, B3, P4, B5, B6, P7, is shown, where the numbers designate the temporal order of the images to encode and the I, P, and B references designate intra-coded, predictive or bi-directional frames as described above. Each P frame is encoded based on the previous I or P reference frame. Thus, to maintain real-time operation, the compression technique should be able to process P frames and I frames as they are received. Each B frame is encoded based on the previous I or P reference frame, and/or the next I or P reference frame. As seen in FIG. 1, a problem with real-time two-pass processing soon develops when encoding B frames.
At time T0, frame I1 is input to the ME stage of the compression engine. At time T1, frame I1 is passed to the MC stage of compression where the compression is completed. Although the ME stage is free, the B2 frame may not be input for ME processing, because the data for the next P frame has not yet been calculated. In fact, the next P frame is not even received until time T3, at which time it is input to the ME stage of the compression engine. At time T4, frame P4 is forwarded to the MC stage for compression. Only after this compression step is completed may the B2 frame be input to the ME stage of processing, at time T5. At time T6, the B3 frame may be input to the ME stage of processing, and the B2 frame moves to the MC stage of processing. However, at time T6, the frame P7 is being input in real time, and must be processed.
If the processing of the P7 frame is deferred to time T7, then the compression engine is no longer operating in real time, and thus the prior art solution is ineffective at providing a real-time two-pass compression algorithm. In order to make such a solution work, it would be necessary to accelerate processing of the ME and MC engines by a factor of 4/3, so that the ME and MC engines each operate on 4/3 frames during each frame interval. However, such a solution would require more processing power than is currently marketable for video compression techniques.
According to one aspect of the invention, a method of performing real-time compression and decompression of video data input to a video compression/decompression unit is disclosed. The video data is apportioned into a plurality of frames including reference frames, where a subset of the frames are dependent frames having dependencies on one or more of the reference frames for compression purposes. The method includes the steps of compressing the reference frames as they are input, but storing the dependent frames until the associated one or more reference frames have been compressed. After the associated reference frames are compressed, the associated dependent frame is retrieved and compressed.
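The store-and-defer behavior of this method can be sketched as follows. The sketch is illustrative only: the function name `compression_order` and the frame labels are assumptions made for this example, and an in-memory list stands in for whatever storage holds the dependent frames.

```python
def compression_order(frames):
    """Return (order, held) for frames arriving in temporal order.

    Reference frames (I or P) are compressed as they are input. Dependent
    frames (B) are stored, then retrieved and compressed once the reference
    frame that follows them has been compressed. `held` contains any B
    frames whose future reference frame has not yet arrived.
    """
    order = []
    held = []
    for name in frames:
        if name[0] == "B":
            held.append(name)     # store the dependent frame
        else:
            order.append(name)    # compress the reference frame on arrival
            order.extend(held)    # then retrieve the stored dependent frames
            held.clear()
    return order, held


order, held = compression_order(["I1", "B2", "B3", "P4", "B5", "B6", "P7"])
```

For the frame sequence of FIG. 1, this sketch yields the compression order I1, P4, B2, B3, P7, B5, B6, with no frames left in storage.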
With such an arrangement, a high quality image may be produced because bits are more optimally allocated across pictures than they are using conventional techniques. Because the reference frame is encoded prior to the dependent frames, such as B frames, more bits are available to encode the reference frame. It is inherently more important for reference frames to be optimally encoded because the reference frame is not only displayed but is also used to effect encoding of up to 4 other dependent frames. Accordingly, the present invention, by delaying the compression of dependent frames until all associated reference frames have been compressed, provides a high quality image.
According to a further aspect of the invention, the method of compressing the frames further includes the steps of determining motion characteristics for each of said frames and providing compressed data for each of the frames in response to the motion characteristics. In the present invention, the step of determining motion characteristics operates on a first frame in parallel with the step of providing compressed data operating on a second frame. With such an arrangement, the motion characteristic stage and the data compression stage each can be designed to process a frame in one frame interval. As a result, a fully pipelined operation may be provided that allows for real time two-pass video compression.
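The parallel operation of the two stages can be sketched as a simple two-stage pipeline. The sketch is illustrative only: the function name `pipeline_schedule` is hypothetical, each stage is assumed to take exactly one frame interval, and the slot numbers are abstract pipeline steps rather than the input times of FIG. 1 (waiting for frames to arrive is not modeled).

```python
def pipeline_schedule(order):
    """Return (slot, me_frame, mc_frame) tuples for a two-stage pipeline.

    In each slot the motion-characteristic (ME) stage works on one frame
    while the compression (MC) stage works on the frame that left the ME
    stage in the previous slot. `None` marks an idle stage.
    """
    slots = []
    for t in range(len(order) + 1):
        me_frame = order[t] if t < len(order) else None
        mc_frame = order[t - 1] if t >= 1 else None
        slots.append((t, me_frame, mc_frame))
    return slots


schedule = pipeline_schedule(["I1", "P4", "B2", "B3"])
```

Apart from the first and last slots, both stages are busy in every slot, so one frame completes per frame interval even though each frame passes through two stages.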
According to a further aspect of the invention, the method includes the step of storing the dependent B frames in a memory of a coupled computer system while they await processing of their associated reference frames. Such an arrangement minimizes the storage requirements of the video compression/decompression unit itself, thereby reducing the overall cost of the system.