With the convergence of computers, communications and media, video compression techniques have become increasingly important. Video compression is typically used to translate video images (from camera, VCR, laser discs, etc.) into digitally encoded frames. The digitally encoded frames may then be easily transferred over a network, or stored in a memory. When desired, the compressed images may then be decompressed for viewing on a computer monitor or other such device.
The three most common video compression standards are MPEG (Moving Picture Experts Group), Motion-JPEG (Joint Photographic Experts Group), and H.261. These standards partition incoming frames into small tiles and perform either spatial or temporal compression on the tiles. Spatial compression involves removing redundant data in the horizontal and vertical picture dimensions; i.e. data within a frame that is similar in picture areas which are close to each other. Temporal compression involves removing redundant data occurring over a given time; i.e. data that repeats from frame to frame. The amount of each type of compression which may be performed on each frame depends on several factors: frame type (discussed below), image classification (e.g., smooth, texture, edge), and resources (number of available resultant bits). Each standard has a defined order of incoming frames.
Encoded frames are classified as Intra-coded frames (I-frames), Predictive frames (P-frames), or Bi-directional frames (B-frames). An `I` frame is a frame in which spatial redundancies are removed using spatial compression techniques. A `P` frame is a frame in which temporal redundancies have been removed by using motion estimation to match tiles in the current frame to a previous reference frame, then spatially compressing the temporal difference coefficients. A `B` frame is a frame in which temporal redundancies are removed by matching tiles in the current frame to a previous and a future reference frame, then compressing the difference coefficients with the spatial transform.
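The tile matching used for `P` frames can be illustrated with a minimal sketch. It assumes an exhaustive search over a small window and a sum-of-absolute-differences cost; neither choice is specified by the standards discussed here, and the function names (`sad`, `get_block`, `best_match`) and the tiny 4x4 frames are hypothetical.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def get_block(frame, top, left, size):
    """Cut a size x size tile out of a frame stored as a list of rows."""
    return [row[left:left + size] for row in frame[top:top + size]]


def best_match(current, reference, top, left, size, search):
    """Return (cost, dy, dx) of the best reference tile within +/- search pixels."""
    target = get_block(current, top, left, size)
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            cost = sad(target, get_block(reference, y, x, size))
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best


# A bright 2x2 patch moves from the top-left corner to the center.
reference = [
    [9, 9, 0, 0],
    [9, 9, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
current = [
    [0, 0, 0, 0],
    [0, 9, 9, 0],
    [0, 9, 9, 0],
    [0, 0, 0, 0],
]
print(best_match(current, reference, top=1, left=1, size=2, search=1))  # (0, -1, -1)
```

A cost of zero means the tile is fully predicted by the displaced reference tile, so the temporal difference left for spatial compression is all zeros.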
To perform spatial compression alone, such as in the `I` frame, only the individual frame is required for the compression. However, to perform the temporal compressions, which are required for both the `P` and `B` frames, the compression of other frames must first be performed. Each `P` frame is encoded based on the previous `I` or `P` reference frame. Encoding of `B` frames requires the results of both past and future frame calculations; thus the processing of the `B` frame is an out-of-order function, in which future reference frames must be analyzed prior to the intervening `B` frames.
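The out-of-order processing described above can be sketched as a reordering from display order to coding order. This is an illustrative sketch only: the function name `coding_order` and the sample frame pattern are hypothetical, and trailing `B` frames with no future reference are simply emitted last.

```python
def coding_order(display_types):
    """Return display indices in the order an encoder would have to process them."""
    order = []
    pending_b = []  # B frames waiting for their future reference frame
    for index, frame_type in enumerate(display_types):
        if frame_type == "B":
            pending_b.append(index)
        else:
            # I or P: a reference frame. Process it first, then the
            # B frames that were waiting on it.
            order.append(index)
            order.extend(pending_b)
            pending_b.clear()
    # Any trailing B frames have no future reference in this sequence.
    order.extend(pending_b)
    return order


display = ["I", "B", "B", "P", "B", "B", "P"]
print(coding_order(display))  # [0, 3, 1, 2, 6, 4, 5]
```

Frame 3 (a `P` frame) is processed before frames 1 and 2, which is why the input frames must be buffered until their future reference is available.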
Two recognized forms of video compression techniques are real-time compression and high-quality n-pass compression, where n>1. Each form has known advantages. Real-time video compression uses only spatial compression techniques (I frames) to allow images to be compressed at the rate at which they are input. Thus real-time compression processes require less buffering of the input image and consequently less hardware complexity.
To provide real-time compression, a `peephole` approach is typically implemented whereby each tile in each frame is encoded as it is processed. One drawback of this scheme arises from the fact that only a fixed number of bits is allocated for encoding a frame. If bits are used to encode portions of the frame as they are received, bits may be `used up` encoding low-priority components of the tile, leaving fewer bits available for encoding higher-priority blocks which may appear later in the frame.
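The drawback of the `peephole` approach can be seen in a small sketch. The allocation rule shown (grant each tile what it requests, in arrival order, until the frame budget is exhausted) is an assumption made for illustration, as are the name `peephole_allocate` and the bit counts.

```python
def peephole_allocate(requested_bits, frame_budget):
    """Grant each tile its request, in arrival order, until the budget runs out."""
    remaining = frame_budget
    granted = []
    for request in requested_bits:
        bits = min(request, remaining)
        granted.append(bits)
        remaining -= bits
    return granted


# Hypothetical per-tile demands: three simple tiles, then a detailed one.
requests = [300, 300, 300, 800]
print(peephole_allocate(requests, frame_budget=1000))  # [300, 300, 300, 100]
```

The last tile needs the most bits but receives the fewest, because the encoder had no knowledge of it when the earlier tiles were encoded.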
Two-pass compression alleviates the above encoding problem by processing each frame in two steps. First, each frame undergoes a Motion Estimation (ME) calculation. During the ME phase, for P and B frames, the possible motion of each macroblock in the frame is characterized relative to a past and/or future reference frame as described above. In addition, for I, P and B frames, energy statistics are generated for the frame to profile the visual complexity of the frame. Providing energy statistics allows for allocation of bits for encoding purposes throughout the frame.
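The use of energy statistics for bit allocation can be sketched as follows. The text does not define the statistic, so this sketch assumes pixel variance as a stand-in for energy and a strictly proportional split of the frame budget; the names `block_energy` and `allocate_by_energy` and the sample blocks are hypothetical.

```python
def block_energy(pixels):
    """Variance of pixel values, used here as a stand-in for an energy statistic."""
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)


def allocate_by_energy(blocks, frame_budget):
    """Split the frame budget in proportion to each block's first-pass energy."""
    energies = [block_energy(block) for block in blocks]
    total = sum(energies)
    if total == 0:
        # Completely flat frame: share the budget evenly.
        return [frame_budget // len(blocks)] * len(blocks)
    return [int(frame_budget * energy / total) for energy in energies]


blocks = [
    [100, 100, 100, 100],  # flat
    [95, 105, 95, 105],    # mild texture
    [90, 110, 90, 110],    # stronger texture
]
print(allocate_by_energy(blocks, frame_budget=1000))  # [0, 200, 800]
```

Because the whole frame is profiled before any bits are spent, the busiest block receives the largest share regardless of where it appears in the frame. A practical encoder would likely reserve a minimum allocation for every block rather than granting zero bits to a flat one.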
Compression of video images includes quantizing the energy coefficients such that a large range of data may be represented by a smaller discrete number of values. Judicious choice of quantization values is critical to achieving a balance between achieved compression and image quality after decompression. Assigning quantization values to every macroblock of an image using a single formula may result in poor image quality, since a single formula cannot take into account the relative importance of particular macroblocks to the human visual system. It would be advantageous therefore to provide a method which assigns a quantization value during compression which affords the highest level of compression while taking into consideration the relative importance of the macroblocks within an image.
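The trade-off between compression and image quality can be illustrated with a uniform quantizer. This is a minimal sketch assuming a single scalar step size applied to every coefficient; the names `quantize` and `dequantize` and the coefficient values are hypothetical and do not represent any particular standard's quantization tables.

```python
def quantize(coefficients, step):
    """Map each coefficient to the nearest multiple of step, stored as a small integer."""
    return [round(c / step) for c in coefficients]


def dequantize(levels, step):
    """Reconstruct approximate coefficients from quantized levels."""
    return [level * step for level in levels]


coefficients = [312, -47, 18, 5, -3, 1]

fine = quantize(coefficients, step=8)
coarse = quantize(coefficients, step=32)

print(fine)                    # [39, -6, 2, 1, 0, 0]
print(dequantize(fine, 8))     # [312, -48, 16, 8, 0, 0]
print(coarse)                  # [10, -1, 1, 0, 0, 0]
print(dequantize(coarse, 32))  # [320, -32, 32, 0, 0, 0]
```

The larger step produces smaller values and more zeros, which compress better, but the reconstructed coefficients drift further from the originals. Applying the same step to every macroblock ignores which macroblocks matter most to the viewer.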