1. Field of the Invention
The present invention pertains to video image compression. Specifically, the present invention pertains to an apparatus and method for efficiently and concurrently determining a plurality of conditions used in encoding video data in accordance with an MPEG-II protocol.
2. Description of the Prior Art
MPEG is a video signal compression standard, established by the Moving Picture Experts Group ("MPEG") of the International Organization for Standardization. MPEG is a multistage algorithm that integrates a number of well-known data compression techniques into a single system. These include motion-compensated predictive coding, discrete cosine transform ("DCT"), adaptive quantization, and variable-length coding ("VLC"). The main objective of MPEG is to remove redundancy which normally exists in the spatial domain (within a frame of video) as well as in the temporal domain (frame-to-frame), while allowing inter-frame compression and interleaved audio. An MPEG-II decoder is specified in ITU-T Recommendation H.262 (1995 E), dated January 1995. A prototype MPEG-II encoder is specified in the ISO document "Test Model 5," Document AVC-491, Version 1, dated April 1993, and a prototype software MPEG-II encoder is published by the MPEG Software Simulation Group. The preceding ISO publications and the prototype software MPEG-II encoder are hereby incorporated by reference.
There are two basic forms of video signals: an interlaced scan signal and a non-interlaced scan signal. Interlaced scanning is a technique employed in television systems in which every television frame consists of two fields, referred to as an odd-field and an even-field. Each field scans the entire picture from side to side and top to bottom. However, the horizontal scan lines of one (e.g., odd) field are positioned half way between the horizontal scan lines of the other (e.g., even) field. Interlaced scan signals are typically used in broadcast television ("TV") and high definition television ("HDTV"). Non-interlaced scan signals are typically used in computer systems and, when compressed, have data rates up to 1.8 Mb/sec for combined video and audio. The Moving Picture Experts Group has established an MPEG-I protocol intended for use in compressing/decompressing non-interlaced video signals, and an MPEG-II protocol intended for use in compressing/decompressing interlaced TV and HDTV signals.
Before a conventional video signal may be compressed in accordance with either MPEG protocol it must first be digitized. The digitization process produces digital video data which specifies the intensity and color of the video image at specific locations in the video image that are referred to as pels. Each pel is associated with a coordinate positioned among an array of coordinates arranged in vertical columns and horizontal rows. Each pel's coordinate is defined by an intersection of a vertical column with a horizontal row. In converting each frame of video into a frame of digital video data, scan lines of the two interlaced fields making up a frame of un-digitized video are interdigitated in a single matrix of digital data. Interdigitization of the digital video data causes pels of a scan line from an odd-field to have odd row coordinates in the frame of digital video data. Similarly, interdigitization of the digital video data causes pels of a scan line from an even-field to have even row coordinates in the frame of digital video data.
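By way of illustration, the interdigitation described above may be sketched as follows. This is a simplified Python fragment, not part of the disclosed apparatus; the function name and the list-of-lists frame representation are illustrative assumptions, and row coordinates are treated as 1-based, as in the text:

```python
def interdigitate(odd_field, even_field):
    # Merge the two fields of an interlaced frame into a single matrix
    # of digital video data. With 1-based row coordinates, odd-field
    # scan lines land on odd rows (1, 3, 5, ...) and even-field scan
    # lines land on even rows (2, 4, 6, ...).
    frame = []
    for odd_line, even_line in zip(odd_field, even_field):
        frame.append(odd_line)   # rows 1, 3, 5, ... (1-based)
        frame.append(even_line)  # rows 2, 4, 6, ...
    return frame
```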
Referring to FIG. 1, MPEG-I and MPEG-II each divide a video input signal, generally a successive occurrence of frames, into sequences or groups of frames ("GOF") 10, also referred to as a group of pictures ("GOP"). The frames in respective GOFs 10 are encoded into a specific format. Respective frames of encoded data are divided into slices 12 representing, for example, sixteen image lines 14. Each slice 12 is divided into macroblocks 16, each of which represents, for example, a 16×16 matrix of pels. Each macroblock 16 is divided into six blocks: four blocks 18 relating to luminance data and two blocks 20 relating to chrominance data. The MPEG-II protocol encodes luminance and chrominance data separately and then combines the encoded video data into a compressed video stream. The luminance blocks relate to respective 8×8 matrices of pels 21. Each chrominance block includes an 8×8 matrix of data relating to the entire 16×16 matrix of pels represented by the macroblock. After the video data is encoded, it is compressed, buffered, modulated and finally transmitted to a decoder in accordance with the MPEG protocol. The MPEG protocol typically includes a plurality of layers, each with respective header information. Nominally, each header includes a start code, data related to the respective layer and provisions for adding header information.
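The decomposition of a macroblock into luminance blocks can be illustrated with a short sketch. The following fragment (the helper name is an illustrative assumption) splits a 16×16 luminance macroblock into the four 8×8 blocks 18; the two chrominance blocks 20, which are subsampled from the full macroblock, are not shown:

```python
def luminance_blocks(macroblock):
    # Split a 16x16 macroblock of luminance pels into the four 8x8
    # blocks called for by the macroblock layer; block order here is
    # top-left, top-right, bottom-left, bottom-right.
    blocks = []
    for r in (0, 8):
        for c in (0, 8):
            blocks.append([row[c:c + 8] for row in macroblock[r:r + 8]])
    return blocks
```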
There are generally three different encoding formats which may be applied to video data. Intra-frame coding produces an "I" block, designating a block of data where the encoding relies solely on information within the video frame in which the macroblock of data is located. Inter-frame coding may produce either a "P" block or a "B" block. A "P" block designates a block of data where the encoding relies on a prediction based upon blocks of information found in a prior video frame. A "B" block is a block of data where the encoding relies on a prediction based upon blocks of data from surrounding video frames, i.e., a prior I or P frame and/or a subsequent P frame of video data.
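One common (though not mandated) arrangement of these formats within a GOF/GOP places an I frame first, a P frame at a fixed interval thereafter, and B frames in between. A minimal sketch, assuming such a pattern with an illustrative interval m:

```python
def gop_frame_types(n_frames, m=3):
    # Assign an encoding format to each frame of a GOF/GOP, assuming
    # an I frame first, a P frame every m-th frame thereafter, and
    # B frames in between (m = 3 gives I B B P B B P ...).
    return ["I" if i == 0 else ("P" if i % m == 0 else "B")
            for i in range(n_frames)]
```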
Digital video data encoded according to MPEG intra-frame coding consists of 8×8 blocks of pels that are subjected to the DCT, producing matrices of Discrete Cosine Coefficients. The DCT is a linear transform that maps digital video data associated with pels into frequency coefficients. Each coefficient represents a weighting factor for a corresponding cosine curve. The basis cosine curves vary in frequency, with low frequencies describing the block's core structure and high frequencies filling in the detail. Adding the weighted basis curves together reproduces the original pels. By itself, the DCT provides no compression; however, the lack of extreme detail in most image blocks results in most high-frequency coefficients being zero, or near zero. The DCT coefficients are subjected to adaptive quantization, and are then run-length and variable-length encoded. Therefore, respective blocks of intra-frame encoded digital video data may include less data than an 8×8 matrix of pels. In addition to the DCT coefficients, macroblocks of intra-frame encoded data include information such as the level of quantization employed, a macroblock address indicator and a macroblock type.
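The transform step described above can be written out directly. The following is a naive, unoptimized sketch of the 8×8 two-dimensional DCT (DCT-II); a uniform block yields a single nonzero DC coefficient, illustrating why blocks lacking detail compress well after quantization and run-length coding:

```python
import math

def dct_2d(block):
    # Naive 8x8 two-dimensional DCT (DCT-II), as applied to blocks of
    # pels in intra-frame coding; the result is a matrix of frequency
    # coefficients with the DC term at [0][0].
    n = 8
    def c(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = c(u) * c(v) * s
    return out
```

A flat block of value 16 transforms to a DC coefficient of 128 with all other coefficients essentially zero.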
Blocks of data encoded according to inter-frame coding may also consist of matrices of Discrete Cosine Coefficients, and are subjected to adaptive quantization, as well as run-length and variable-length encoding. The coefficients represent, however, differences between an 8×8 pel block of the macroblock being encoded and an 8×8 pel block in a reference frame of digital video data. The predictive coding process involves generating motion vectors indicating the relationship between a macroblock being encoded and the corresponding 16×16 pel region of a reference frame that most closely matches the macroblock being encoded. The pel data of the matched block in the reference frame is subtracted, on a pel-by-pel basis, from the block of the frame being encoded, to develop differences. If no reasonable block match can be found that permits inter-frame encoding, the non-matching block/macroblock is encoded according to intra-frame encoding. The differences computed for a block are processed using the DCT, with the transformed differences and the motion vectors included in the encoded data for the predicted blocks. The macroblocks of the inter-frame encoded frames also include quantization, address and type information.
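The motion-vector search described above may be illustrated with an exhaustive block-matching sketch using the sum of absolute differences ("SAD") as the match criterion. The search range, function name and frame representation are illustrative assumptions; practical encoders use faster search strategies:

```python
def best_match(ref_frame, cur_block, top, left, search=4):
    # Exhaustive block-matching motion estimation: find the 16x16
    # region of the reference frame, within +/- `search` pels of the
    # current macroblock's position (top, left), that minimizes the
    # sum of absolute differences.  Returns (motion_vector, sad).
    h, w = len(ref_frame), len(ref_frame[0])
    best_mv, best_sad = None, float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            r, c = top + dy, left + dx
            if r < 0 or c < 0 or r + 16 > h or c + 16 > w:
                continue  # candidate region falls outside the frame
            sad = sum(abs(ref_frame[r + y][c + x] - cur_block[y][x])
                      for y in range(16) for x in range(16))
            if sad < best_sad:
                best_mv, best_sad = (dy, dx), sad
    return best_mv, best_sad
```

A SAD of zero indicates an exact match; the difference block then transforms to all-zero coefficients, which is the best case for inter-frame compression.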
Compressed video data is produced at differing rates because respective frames require different amounts of compressed data. It is desirable to transmit the compressed data at an approximately constant rate equivalent to a transmission channel's capacity, to realize efficient use of the channel. Typically, rate buffers are implemented to perform variable-to-constant rate translation. An encoder regulates the amount of data provided to the buffers in accordance with buffer occupancy. Therefore, in compressing digital video data, a data rate must be determined in addition to the encoding format and quantization level for each macroblock.
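The regulation of encoder output by buffer occupancy can be sketched in a toy form. The following fragment scales the quantization level linearly with rate-buffer fullness (a fuller buffer forces coarser quantization and thus fewer bits); the linear mapping and function name are illustrative assumptions, and actual rate control, such as that of "Test Model 5," is considerably more elaborate:

```python
def next_quantizer(buffer_bits, buffer_size, q_min=1, q_max=31):
    # Choose a quantization level from rate-buffer occupancy: an empty
    # buffer permits fine quantization (q_min), a full buffer forces
    # coarse quantization (q_max), with linear scaling in between.
    fullness = buffer_bits / buffer_size
    q = round(q_min + fullness * (q_max - q_min))
    return max(q_min, min(q_max, q))
```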
There have been prior art attempts to compress/decompress motion picture video data by taking advantage of the aforementioned principles. In U.S. Pat. No. 5,193,004 to Wang et al., an apparatus and method for coding even fields of interlace video sequences is disclosed wherein odd field data is separated from even field data before encoding occurs. The even field data is encoded and subsequently compared with the odd field data to determine if any errors occurred during the coding process.
U.S. Pat. No. 5,144,424 to Savatier discloses an apparatus and method for controlling the quantization of video data. A two-step process is employed to determine the quantization level for each macroblock. First, each macroblock is compressed according to intra-frame encoding and coefficients are determined and quantized with a constant value. Then, the number of bits for each sub-block within the macroblock is determined and an average value is taken. The average value determines whether a macroblock is capable of severe quantization without a substantial loss of information.
The problem encountered in the prior art devices is that the determination of the encoding format, the bit rate and the quantization level requires numerous process steps, which have previously required a large amount of circuitry. What is needed is an apparatus that reduces the number of process steps and circuit complexity while determining the requisite encoding parameters for a macroblock of data corresponding to a motion picture video image.