1. Field of the Invention
This invention pertains generally to video signal processing, and more particularly to compressing video data by combining lossy and lossless compression so as to reduce the amount of video data while minimizing the compression noise. Applications include reducing the frame (or reference) memory in a video codec, where the compression noise needs to be minimized, and general uses such as transmitting the video data over a network or storing the video data in a storage device.
2. Description of Related Art
Video data, or digitized visual information, is widely used today. It forms a significant aspect of the modern digital revolution in information technology. It is utilized in all types of systems for the creation, distribution or communication, and consumption or use of visual information. But video data is generally voluminous. This causes severe problems for both storage and transmission.
Data compression may generally be defined as a process of transforming information from one representation to another, smaller representation from which the original data, or a close approximation thereto, can be recovered by the complementary process of data decompression. The compression and decompression processes are often referred to as coding and decoding. A coding and decoding system is generally referred to as a codec, a system having both a coder and a decoder. Codecs generally follow established standards, such as MPEG2 and H.264.
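By way of illustration only, the complementary compression and decompression processes defined above can be sketched for the lossless case, in which the original data is recovered exactly. The sketch uses the DEFLATE implementation in the Python standard library `zlib` module as a stand-in; it is not the compression scheme of any particular video standard, and the function names `code` and `decode` are hypothetical.

```python
import zlib

def code(data: bytes) -> bytes:
    """Coder: transform the data into a (hopefully) smaller representation."""
    return zlib.compress(data)

def decode(compressed: bytes) -> bytes:
    """Decoder: recover the original data from the compressed representation."""
    return zlib.decompress(compressed)

if __name__ == "__main__":
    # A highly redundant input, assumed here only so that the size reduction is obvious.
    original = bytes([128]) * 1024
    compressed = code(original)
    print(len(original), len(compressed), decode(compressed) == original)
```

A lossy codec differs in that `decode(code(x))` returns only a close approximation of `x`.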
The storage and transmission of large amounts of data are often facilitated by the use of compression and decompression techniques. In particular, the transmission and storage of visual images involves large amounts of data, and benefits greatly from image compression and decompression techniques.
In a codec or compression/decompression system, an image is input to an encoder to carry out the compression of the image. The compressed image from the encoder is either transmitted or stored. The compressed image is input into a decoder to carry out the decompression of the compressed image. The decompressed image is output from the decoder, and may be sent to an output device for viewing.
Video clips are made up of sequences of individual images or “frames.” Video compression and decompression techniques process video signals to greatly reduce storage and bandwidth requirements for the compressed data while maximizing the perceived image quality of the decompressed data.
A still image is compressed by dividing an image into small pixel blocks that are transformed into a frequency domain representation, typically by a discrete cosine transform (DCT). Inverse DCT (IDCT) is used to reconstruct the original pixels from the DCT coefficients. Quantization or scaling of the DCT coefficients is used in the encoding process to retain more perceptually significant information and discard less perceptually significant information. Dequantization is the inverse process performed in the decoder.
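The transform and quantization steps described above may be sketched as follows. This is a minimal illustration, assuming an 8×8 block, an orthonormal DCT-II, and a single uniform quantization step for all coefficients; practical codecs typically use fast transforms and per-coefficient quantization tables. The names `dct2`, `idct2`, `quantize`, and `dequantize` are hypothetical.

```python
import math

N = 8  # block size (assumed; 8x8 is the typical size)

def _alpha(k):
    # Normalization factor that makes the transform orthonormal.
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

def _basis(n, k):
    return math.cos((2 * n + 1) * k * math.pi / (2 * N))

def dct2(block):
    """Forward 2-D DCT: pixels -> frequency-domain coefficients."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += block[x][y] * _basis(x, u) * _basis(y, v)
            out[u][v] = _alpha(u) * _alpha(v) * s
    return out

def idct2(coeffs):
    """Inverse 2-D DCT: coefficients -> reconstructed pixels."""
    out = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            s = 0.0
            for u in range(N):
                for v in range(N):
                    s += (_alpha(u) * _alpha(v) * coeffs[u][v]
                          * _basis(x, u) * _basis(y, v))
            out[x][y] = s
    return out

def quantize(coeffs, step):
    """Encoder side: scale and round, discarding fine detail (the lossy step)."""
    return [[int(round(c / step)) for c in row] for row in coeffs]

def dequantize(levels, step):
    """Decoder side: rescale the integer levels back to coefficient values."""
    return [[level * step for level in row] for row in levels]

if __name__ == "__main__":
    block = [[(x * 8 + y * 3) % 256 for y in range(N)] for x in range(N)]
    step = 16  # hypothetical quantization step
    levels = quantize(dct2(block), step)
    recon = idct2(dequantize(levels, step))
    worst = max(abs(recon[x][y] - block[x][y]) for x in range(N) for y in range(N))
    print("largest pixel error after quantization:", worst)
```

A larger `step` discards more information and yields smaller integer levels to encode, at the cost of a larger reconstruction error.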
There are many specific ways of implementing the coding and decoding processes. Since image features are usually larger than the blocks (typically 8×8 pixels) being processed, more efficient compression may use the correlation between adjacent blocks of the image. The encoder attempts to predict values of some coefficients based on values in surrounding blocks. Also, instead of quantizing and encoding the DCT coefficients directly, the differences between the actual coefficients and their predicted values may be quantized and encoded. Because the differences may be small, the number of bits required may be reduced. Color images are typically represented by using several color planes; typically one luminance (brightness) plane and two chrominance (color) planes are used. Macroblocks formed of several smaller blocks may also be used.
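The idea of encoding differences from predicted values, rather than the coefficients themselves, may be sketched as follows. The sketch assumes the simplest possible predictor, in which the DC coefficient of each block is predicted from the reconstructed DC coefficient of the preceding block; actual codecs use a variety of predictors. The function names and the quantization step are hypothetical.

```python
def encode_dc(dc_values, step):
    """Quantize the difference between each DC value and its prediction."""
    levels = []
    prediction = 0  # no neighbor is available for the first block
    for dc in dc_values:
        level = int(round((dc - prediction) / step))
        levels.append(level)
        # Track what the decoder will reconstruct, so both sides predict alike.
        prediction += level * step
    return levels

def decode_dc(levels, step):
    """Rebuild each DC value by adding the dequantized difference to the prediction."""
    values = []
    prediction = 0
    for level in levels:
        prediction += level * step
        values.append(prediction)
    return values

if __name__ == "__main__":
    dc_values = [1024, 1031, 1019, 1040]  # neighboring blocks tend to be similar
    step = 4
    levels = encode_dc(dc_values, step)
    print("levels to encode:", levels)
    print("decoded DC values:", decode_dc(levels, step))
```

After the first block, the levels are small integers because adjacent blocks are correlated, and small integers generally require fewer bits to encode.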
In video, motion between successive frames must also be taken into account. Video codecs use motion estimation and motion compensation based on similarities between consecutive video frames. Motion estimation attempts to find a region in a previously recorded frame (called a “reference frame”) closely matching each macroblock in the current frame. For each macroblock, motion estimation produces a “motion vector,” a set of horizontal and vertical offsets from the location of the macroblock in the current frame to the location of the selected matching region in the reference frame. The selected region is used as a prediction of the pixels in the current macroblock, and the difference (“prediction error”) is computed and encoded. Motion compensation in a decoder uses the motion vectors to predict pixels of each macroblock.
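Motion estimation and the computation of the prediction error may be sketched as follows. The sketch assumes an exhaustive ("full") search over a small window and the sum of absolute differences (SAD) as the matching criterion; the text above does not prescribe either choice, and practical encoders often use faster search strategies. Frames are represented as lists of rows, and all function and parameter names are hypothetical.

```python
def sad(cur, ref, bx, by, rx, ry, size):
    """Sum of absolute differences between a current block and a reference region."""
    total = 0
    for j in range(size):
        for i in range(size):
            total += abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
    return total

def find_motion_vector(cur, ref, bx, by, size, search):
    """Return ((dx, dy), cost) for the best-matching region in the reference frame.

    (dx, dy) is the offset from the block position in the current frame
    to the matching region in the reference frame.
    """
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = bx + dx, by + dy
            # Skip candidate regions that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur, ref, bx, by, rx, ry, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost

def prediction_error(cur, ref, bx, by, mv, size):
    """Difference between the current block and its motion-compensated prediction."""
    dx, dy = mv
    return [[cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i]
             for i in range(size)] for j in range(size)]
```

The encoder would transmit the motion vector together with the encoded prediction error; the decoder adds the decoded error to the region of the reference frame selected by the vector.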
The reference frame is not always the previously displayed frame in a sequence of video frames. Video compression often encodes frames in a different order from the order in which they are displayed. The encoder may skip several frames ahead and encode a future video frame, then skip back and encode the next frame in the display sequence.
Video compression occasionally encodes a video frame using still-image coding techniques only, without relying on previously encoded frames. These are called “intra-frames” or “I frames.” Frames encoded using only a previously displayed reference frame are called “predictive frames” or “P frames,” and frames encoded using both future and previously displayed reference frames are called “bidirectional frames” or “B frames.” In a typical scenario, the codec encodes an I frame, skips ahead several frames and encodes a future P frame using the I frame as a reference frame, and then skips back to the next frame following the I frame. The frames between the I and P frames are encoded as B frames. Next, the encoder skips ahead several frames again, encodes another P frame using the first P frame as a reference frame, then skips back to fill the gap in the display sequence with B frames. The process continues with a new I frame inserted for every 12-15 P and B frames.
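The reordering described in the typical scenario above may be sketched as follows. The sketch assumes that each B frame depends only on the nearest preceding and following I or P frames, so that every run of B frames is encoded immediately after the I or P frame that follows it in display order. The function name `coding_order` is hypothetical.

```python
def coding_order(display_types):
    """Map a display-order sequence of frame types to the order of encoding.

    display_types: sequence of "I", "P", or "B", one entry per displayed frame.
    Returns the display indices in the order in which the frames are encoded.
    """
    order = []
    pending_b = []  # B frames waiting for their future reference frame
    for index, frame_type in enumerate(display_types):
        if frame_type == "B":
            pending_b.append(index)
        else:
            # Encode the I or P frame first, then the B frames that precede it.
            order.append(index)
            order.extend(pending_b)
            pending_b = []
    # Trailing B frames with no future reference are appended as-is (edge case).
    order.extend(pending_b)
    return order

if __name__ == "__main__":
    print(coding_order("IBBPBBP"))
```

For the display sequence I B B P B B P, the encoder handles the I frame, skips ahead to the first P frame, returns to the two intervening B frames, and then repeats the pattern for the second P frame.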
In most video codec architectures, the encoder core is implemented either in separate hardware or as software on a processor, and the frame memory is located outside the encoder core, typically in external memory connected through an external bus. The amount of data transferred between the encoder core and the frame memory over the bus may be very large, resulting in high power consumption.
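The scale of this data transfer may be estimated with simple arithmetic. The following sketch rests entirely on illustrative assumptions that do not come from the text above: 8-bit samples, chrominance planes subsampled by two in each direction (4:2:0), and a hypothetical parameter `accesses_per_frame` standing for the number of times each frame is written to or read from frame memory.

```python
def frame_bytes(width, height, bits_per_sample=8):
    """Bytes in one frame: a full luminance plane plus two quarter-size chrominance planes."""
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)
    return (luma + chroma) * bits_per_sample // 8

def bus_bytes_per_second(width, height, fps, accesses_per_frame, ratio=1.0):
    """Approximate bus traffic; ratio < 1.0 models compressed frame-memory data."""
    return frame_bytes(width, height) * fps * accesses_per_frame * ratio

if __name__ == "__main__":
    # Assumed example: 1920x1080 frames at 30 frames per second, one access per frame.
    print(frame_bytes(1920, 1080))
    print(bus_bytes_per_second(1920, 1080, 30, 1))
    print(bus_bytes_per_second(1920, 1080, 30, 1, ratio=0.5))
```

Under these assumptions, halving the size of the data stored in frame memory halves the corresponding bus traffic.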
Accordingly, it is desirable to provide a method and apparatus for reducing the amount of data transferred via an external bus between an encoder and frame memory in a video codec.