With the convergence of computers, communications and media, video compression techniques have become increasingly important. Video compression is often used to translate video images (from camera, VCR, laser discs, etc.) into digitally encoded frames. The digitally encoded frames may then be easily transferred over a network, or stored in a memory. When desired, the compressed images are then decompressed for viewing on a computer monitor or other such device.
The three most common video compression standards are MPEG (Moving Picture Experts Group), JPEG (Joint Photographic Experts Group), and H.261. These standards partition incoming video frames into small tiles and perform either spatial or temporal compression on the tiles. Each standard has a defined compression sequence for the series of incoming frames.
Compressed frames are classified as Intra-coded frames (I frames), Predictive frames (P frames), or Bi-directional frames (B frames). An `I` frame is a frame in which spatial redundancies are removed using spatial compression techniques. A `P` frame is a frame in which temporal redundancies are removed by using motion estimation to match tiles in the current frame to tiles in a previous reference frame, then spatially compressing the temporal difference. A `B` frame is a frame in which temporal redundancies are removed by matching tiles in the current frame to tiles in a previous and a future reference frame, then compressing the difference with the spatial transform.
To perform spatial compression alone, such as in the `I` frame, only the individual frame is required for the compression. However, to perform the temporal compression required for both the `P` and `B` frames, the compression of other frames must first be performed. Each `P` frame is encoded based on the previous `I` or `P` reference frame. Because `B` frames require the results of both past and future frame calculations, the processing of the `B` frame is an out-of-order function, in which future reference frames must be analyzed prior to the intervening `B` frames.
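The out-of-order dependency described above can be illustrated with a minimal sketch. It assumes frames are given as labels such as "I1" or "B2" whose first character is the frame type; the function name `coding_order` is hypothetical and chosen only for illustration.

```python
def coding_order(display_order):
    """Reorder frames so each B frame follows both of its reference frames.

    `display_order` is a list of labels such as "I1", "B2", "P4", where the
    first character gives the frame type (an assumption of this sketch).
    """
    out = []
    pending_b = []  # B frames still waiting for their future reference
    for frame in display_order:
        if frame[0] == "B":
            pending_b.append(frame)
        else:
            # An I or P frame is a reference: emit it first, then the
            # B frames that were waiting for it.
            out.append(frame)
            out.extend(pending_b)
            pending_b = []
    # Trailing B frames have no future reference in this list; emit them last.
    out.extend(pending_b)
    return out


print(coding_order(["I1", "B2", "B3", "P4", "B5", "B6", "P7"]))
```

For the stream I1, B2, B3, P4, B5, B6, P7, the sketch places P4 ahead of B2 and B3, and P7 ahead of B5 and B6.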
Two recognized forms of video compression techniques are real-time compression and high-quality n-pass compression, where n>1. Each form has known advantages. Real-time video compression uses only spatial compression techniques (I frames) to allow images to be compressed at the rate at which they are input. Thus real-time compression processes require less buffering of the input image and consequently less hardware complexity.
To provide real-time compression, a `peephole` approach is typically implemented whereby each tile in each frame is encoded as it is processed. One drawback of this scheme arises from the fact that only a fixed number of bits is allocated for encoding a frame. If bits are used to encode portions of the frame as they are received, bits may be `used up` encoding low-priority portions of the frame, leaving fewer bits available for encoding higher-priority blocks that appear later in the frame.
Two-pass compression alleviates the above encoding problem by processing each frame in two steps. First, each frame undergoes a Motion Estimation (ME) calculation. During the ME phase, for P and B frames, the possible motion of each macroblock in the frame is characterized relative to a past and/or future reference frame as described above. In addition, for I, P and B frames, energy statistics are generated to profile the visual complexity of the frame. These energy statistics allow bits to be allocated properly throughout the frame for encoding purposes.
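How whole-frame energy statistics might guide bit allocation can be sketched as follows. The text does not define the energy statistic or the allocation rule, so both are assumptions here: block energy is taken to be the variance of the pixel values, and the frame budget is split in proportion to energy. The names `block_energy` and `allocate_bits` are hypothetical.

```python
def block_energy(pixels):
    """Variance of a block's pixel values.

    Assumption: variance stands in for whatever energy statistic an
    encoder actually gathers during the ME phase.
    """
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)


def allocate_bits(energies, frame_budget):
    """Split frame_budget across blocks in proportion to their energy.

    Assumption: a simple proportional rule, rounded down per block.
    """
    total = sum(energies)
    if total == 0:
        # Completely flat frame: share the budget equally.
        share = frame_budget // len(energies)
        return [share] * len(energies)
    return [int(frame_budget * e / total) for e in energies]


blocks = [
    [10, 10, 10, 10],  # flat block, low visual complexity
    [0, 20, 0, 20],    # moderate detail
    [0, 40, 0, 40],    # high detail
]
energies = [block_energy(b) for b in blocks]
print(allocate_bits(energies, 1000))
```

Because every block is measured before any bits are spent, the high-detail block at the end of the frame still receives the largest share, which is the situation the `peephole` approach cannot guarantee. A practical allocator would presumably also reserve a minimum number of bits per block; that is omitted here.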
Following the ME phase, the frame undergoes Motion Compensation (MC), during which the data is actually compressed. Based on the encoding directives, a Discrete Cosine Transform (DCT) is performed on each portion (or block) of each frame, or on the temporal differences between each block and its corresponding reference point in another frame. The resulting data is then quantized and transformed into run-level (RLE) tokens, which are then encoded.
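The transform, quantization and tokenization steps can be sketched for a single 8x8 block. Several details are assumptions not stated in the text: a single uniform quantizer step `q` replaces the quantization tables a real standard defines, coefficients are scanned in zigzag order, each token is a (run of zeros, level) pair, and an "EOB" marker ends the block. All function names are hypothetical. For a P or B frame, the input block would be the temporal difference rather than the raw pixels.

```python
import math

N = 8  # block size


def dct2(block):
    """2-D DCT-II of an N x N block with orthonormal scaling."""
    def c(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out


def zigzag_indices(n=N):
    """(row, col) positions ordered along alternating anti-diagonals."""
    return sorted(
        ((r, col) for r in range(n) for col in range(n)),
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )


def run_level(coeffs):
    """Turn a coefficient list into (zero_run, level) tokens plus "EOB"."""
    tokens, run = [], 0
    for value in coeffs:
        if value == 0:
            run += 1
        else:
            tokens.append((run, value))
            run = 0
    tokens.append("EOB")  # trailing zeros are implied by the marker
    return tokens


def encode_block(block, q):
    """DCT, uniform quantization with step q, zigzag scan, run-level tokens."""
    coeffs = dct2(block)
    scanned = [round(coeffs[r][col] / q) for r, col in zigzag_indices()]
    return run_level(scanned)


flat = [[100] * N for _ in range(N)]
print(encode_block(flat, 16))
```

A flat block has only a DC coefficient, so after quantization the whole block collapses to a single token followed by the end marker. The tokens would then be passed to the entropy coder, which this sketch does not model.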
Because the entire frame is evaluated before bits are allocated for encoding the different blocks of the frame, the output image is of much higher quality than that provided by the `peephole` compression technique described above. It would be desirable to provide two-pass compression techniques in real time. However, the complexity of the process has precluded it from being a valuable tool for video compression applications which require real-time performance. The main problem with two-pass compression techniques is encountered when analyzing B frames, as will be shown below.
Referring now to FIG. 1, a subset of a typical input stream of MPEG encoded frames, I1, B2, B3, P4, B5, B6, P7, is shown, where the numbers designate the temporal order of the images to encode and the I, P, and B references designate intra-coded, predictive or bi-directional frames as described above. Each P frame is encoded based on the previous I or P reference frame. Thus, to maintain real-time operation, the compression technique should be able to process P frames and I frames as they are received. Each B frame is encoded based on the previous I or P reference frame, and/or the next I or P reference frame. As seen in FIG. 1, a problem with real-time two-pass processing soon develops when encoding B frames.
At time T0, frame I1 is input to the ME stage of the compression engine. At time T1, frame I1 is passed to the MC stage of compression, where the compression is completed. Although the ME stage is free, the B2 frame may not be input for ME processing, because the data for the next P frame has not yet been calculated. In fact, the next P frame is not even received until time T3, at which time it is input to the ME stage of the compression engine. At time T4, frame P4 is forwarded to the MC stage for compression. Only after this compression step is completed may the B2 frame be input to the ME stage of processing, at time T5. At time T6, the B3 frame may be input to the ME stage of processing, and the B2 frame moves to the MC stage of processing. However, at time T6, the frame P7 is being input in real time, and must be processed.
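The timeline just described can be reproduced with a small simulation. It is only a sketch under stated assumptions: one frame arrives per interval starting at T0, the ME stage accepts one frame per interval, a frame spends the interval after ME in the MC stage, a B frame may enter ME only once its future reference frame has finished MC, and ties are broken in display order. The function name `me_schedule` is hypothetical.

```python
def me_schedule(stream):
    """Return {frame: interval in which it enters the ME stage}.

    Assumptions of this sketch: frame i of `stream` arrives at interval i,
    ME and MC each take one interval, and a B frame waits until its future
    reference (the next I or P frame) has completed MC.
    """
    arrival = {f: t for t, f in enumerate(stream)}

    # Future reference of each B frame: the next non-B frame in display order.
    future_ref = {}
    for i, f in enumerate(stream):
        if f[0] == "B":
            future_ref[f] = next(
                (g for g in stream[i + 1:] if g[0] != "B"), None)

    me_start = {}
    t = 0
    while len(me_start) < len(stream):
        for f in stream:  # display order breaks ties
            if f in me_start or arrival[f] > t:
                continue
            ref = future_ref.get(f)
            # The reference enters MC at me_start + 1, so the B frame
            # can enter ME at me_start + 2 at the earliest.
            if ref is not None and (ref not in me_start
                                    or me_start[ref] + 2 > t):
                continue
            me_start[f] = t
            break
        t += 1
    return me_start


stream = ["I1", "B2", "B3", "P4", "B5", "B6", "P7"]
schedule = me_schedule(stream)
for frame in stream:
    delay = schedule[frame] - stream.index(frame)
    print(frame, "enters ME at T%d," % schedule[frame], "delay", delay)
```

Under these assumptions B2 and B3 occupy the ME stage at T5 and T6, so P7, which arrives at T6, cannot enter ME until T7.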
If the processing of the P7 frame is extended to time T7, then the compression engine is no longer operating in real time, and thus the prior art solution is ineffective at providing a real-time two-pass compression algorithm. In order to make such a solution work, it would be necessary to accelerate processing of the ME and MC engines by a factor of 4/3 so that the ME and MC engines operate on 4/3 frames during each frame interval. However, such a solution would require more processing power than is currently marketable for video compression techniques.