The present invention relates to the field of video and/or audio decompression and/or compression devices, and is more specifically directed to a decompression and/or compression device capable of decoding a bitstream encoded to comply with one of several decompression protocols and/or encoding a bitstream to comply with one of several decompression protocols.
The size of a digital representation of uncompressed video images depends on the resolution and color depth of the image. A movie composed of a sequence of uncompressed video images and accompanying audio signals quickly becomes too large to fit entirely onto a conventional recording medium, such as a compact disk (CD). Moreover, transmitting such an uncompressed movie over a communication link is prohibitively expensive because of the excessive quantity of data to be transmitted.
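The dependence of size on resolution and color depth can be illustrated with a short calculation. The following is a minimal sketch only; the frame size, color depth, frame rate, running time, and nominal CD capacity are assumed example values and are not taken from the text above.

```python
def uncompressed_video_bytes(width, height, bits_per_pixel, fps, seconds):
    """Size in bytes of an uncompressed video sequence (audio not included)."""
    bits_per_frame = width * height * bits_per_pixel
    return bits_per_frame * fps * seconds // 8


# Assumed example values: 720x480 pixels, 24 bits per pixel,
# 30 frames per second, two hours of video.
movie_bytes = uncompressed_video_bytes(720, 480, 24, 30, 2 * 60 * 60)

# Assumed nominal capacity of a CD (650 Mbytes).
CD_BYTES = 650 * 1024 * 1024

print("uncompressed movie:", movie_bytes, "bytes")
print("number of CDs needed:", movie_bytes / CD_BYTES)
```

Under these assumptions the video alone occupies several hundred times the capacity of a single CD.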
It is therefore advantageous to compress video and audio sequences before they are transmitted or stored. A great deal of effort is being expended to develop systems to compress these sequences. There are several coding standards currently used that are based on the discrete cosine transform (DCT) algorithm, including MPEG-1, MPEG-2, H.261, and H.263. (MPEG is an acronym for "Moving Picture Experts Group", a committee of the International Organization for Standardization, ISO.) The MPEG-1, MPEG-2, H.261 and H.263 standards include decompression protocols that describe how an encoded (i.e. compressed) bitstream is to be decoded (i.e. decompressed). The encoding can be done in any manner, as long as the resulting bitstream complies with the standard.
Video and/or audio compression devices (hereinafter encoders) are used to encode the video and/or audio sequence before the sequence is transmitted or stored. The resulting encoded bitstream is decoded by a video and/or audio decompression device (hereinafter decoder) before the video and/or audio sequence is output. However, a bitstream can only be decoded by a decoder if it complies with the standard used by the decoder. To be able to decode the bitstream on a large number of systems, it is advantageous to encode the video and/or audio sequences according to a well accepted encoding/decoding standard. The MPEG standards are currently well accepted standards for one way communication. H.261 and H.263 are currently well accepted standards for two way communication, such as video telephony.
Once decoded, the decoded video and audio sequences can be output on an electronic system dedicated to outputting video and audio, such as a television or a video cassette recorder (VCR), or on an electronic system where image display and audio is just one feature of the system, such as a computer. A decoder needs to be added to these electronic systems to allow them to decode the compressed bitstream into uncompressed data, before it can be output. An encoder needs to be added to allow such electronic systems to compress video and/or audio sequences that are to be transmitted or stored. Both the encoder and decoder need to be added for two way communication.
FIG. 1A shows a block diagram of the architecture of a typical decoder, such as an MPEG-2 decoder 10. The decoder 10 can be both a video and audio decoder or just a video decoder, where the audio portion of the decoder 10 can be performed in any known conventional way. The encoded bitstream is received by an input buffer, typically a first-in-first-out (FIFO) buffer 30, hereinafter FIFO 30, although the buffer can be any type of memory. The FIFO 30 buffers the incoming encoded bitstream as previously received data is being decoded.
The encoded bitstream for video contains compressed frames. A frame is a data structure representing the encoded data for one displayable image in the video sequence. This data structure consists of one two-dimensional array of luminance pixels, and two two-dimensional arrays of chrominance samples, i.e., color difference samples.
The color difference samples are typically sampled at half the sampling rate of the luminance samples in both the vertical and horizontal directions, producing a sampling mode of 4:2:0 (luminance:chrominance:chrominance). The color difference can also be sampled at other frequencies, for example at one-half the sampling rate of the luminance in the horizontal direction and the same sampling rate as the luminance in the vertical direction, producing a sampling mode of 4:2:2.
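The effect of the sampling mode on the size of the chrominance arrays can be sketched as follows. This is an illustrative sketch only; it assumes the conventional definitions of the modes, in which 4:2:0 halves the chrominance resolution in both directions and 4:2:2 halves it in the horizontal direction only. The names CHROMA_DIVISORS, plane_sizes and samples_per_frame are hypothetical.

```python
# (horizontal divisor, vertical divisor) applied to the luminance dimensions.
# Assumed conventional definitions of the sampling modes.
CHROMA_DIVISORS = {
    "4:2:0": (2, 2),
    "4:2:2": (2, 1),
    "4:4:4": (1, 1),
}


def plane_sizes(width, height, mode):
    """Return ((luma_w, luma_h), (chroma_w, chroma_h)) for one frame."""
    h_div, v_div = CHROMA_DIVISORS[mode]
    return (width, height), (width // h_div, height // v_div)


def samples_per_frame(width, height, mode):
    """One luminance array plus two chrominance arrays."""
    (lw, lh), (cw, ch) = plane_sizes(width, height, mode)
    return lw * lh + 2 * cw * ch


for mode in CHROMA_DIVISORS:
    print(mode, plane_sizes(720, 480, mode), samples_per_frame(720, 480, mode))
```

For an assumed 720 by 480 luminance array, the 4:2:0 mode carries half as many samples per frame as the 4:4:4 mode.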
A frame is typically further subdivided into smaller subunits, such as macroblocks. A macroblock is a data structure having a 16×16 array of luminance samples and two 8×8 arrays of the corresponding chrominance samples. The macroblock contains a header portion having motion compensation information and 6 block data structures. A block is the basic unit for DCT based transform coding and is a data structure encoding an 8×8 sub-array of pixels. A macroblock represents four luminance blocks and two chrominance blocks.
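The macroblock layout described above can be sketched as a simple data structure. This is a hypothetical illustration for the 4:2:0 case; the class and field names are assumptions, and an actual bitstream macroblock header carries more fields than the single motion vector shown here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Block = List[List[int]]  # 8x8 array of samples or DCT coefficients


def empty_block() -> Block:
    return [[0] * 8 for _ in range(8)]


def four_blocks() -> List[Block]:
    return [empty_block() for _ in range(4)]


@dataclass
class Macroblock:
    # Header portion: motion compensation information (simplified).
    motion_vector: Tuple[int, int] = (0, 0)
    # Four 8x8 luminance blocks cover the 16x16 luminance area.
    luma: List[Block] = field(default_factory=four_blocks)
    # One 8x8 block for each of the two color difference components.
    cb: Block = field(default_factory=empty_block)
    cr: Block = field(default_factory=empty_block)

    def blocks(self) -> List[Block]:
        """The 6 block data structures of the macroblock."""
        return self.luma + [self.cb, self.cr]


def macroblocks_per_frame(width: int, height: int) -> int:
    """Number of 16x16 macroblocks needed to cover a frame."""
    return ((width + 15) // 16) * ((height + 15) // 16)


print(len(Macroblock().blocks()), macroblocks_per_frame(720, 480))
```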
Both MPEG-1 and MPEG-2 support multiple types of coded frames: Intra (I) frames, Forward Predicted (P) frames, and Bidirectionally Predicted (B) frames. I frames contain only intrapicture coding. P and B frames may contain both intrapicture and interpicture coding. I and P frames are used as reference frames for interpicture coding.
In interpicture coding, the redundancy between two frames, the frame being decoded and a prediction frame, is eliminated as much as possible, and the residual differences between them, i.e. the interpicture prediction errors, are transmitted. Motion vectors are also transmitted in interpicture coding that uses motion compensation. The motion vectors describe how far, and in what direction, a macroblock has moved compared to a prediction macroblock. Interpicture coding requires the decoder 10 to have access to the previous and/or future images, i.e. the I and/or P frames, that contain information needed to decode or encode the current image. These previous and/or future images need to be stored and then used to decode the current image.
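The relationship between a prediction, the transmitted prediction errors, and the decoded pixels can be sketched as a sample-by-sample sum. The following is a minimal sketch, assuming 8-bit samples clamped to the range 0 to 255; the function name reconstruct is hypothetical.

```python
def reconstruct(prediction, residual):
    """Add the interpicture prediction errors to the prediction.

    Both arguments are 2D lists of equal size. The result is clamped
    to the assumed 8-bit sample range.
    """
    return [
        [max(0, min(255, p + e)) for p, e in zip(pred_row, err_row)]
        for pred_row, err_row in zip(prediction, residual)
    ]


prediction = [[100, 250], [5, 0]]
residual = [[-3, 10], [-10, 7]]
print(reconstruct(prediction, residual))  # [[97, 255], [0, 7]]
```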
Intrapicture coding for I frames involves the reduction of redundancy between the original pixels in the frame using block based DCT techniques, although other coding techniques can be used. For P and B frames, intrapicture coding involves using the same DCT based techniques to remove redundancy between the interpicture prediction error pixels.
The output of the FIFO 30 is coupled to a macroblock header parser 36. The header parser 36 parses the information into macroblocks, and then parses the macroblocks and sends the header portion of each macroblock to an address calculation circuit 96. The address calculation circuit 96 determines the type of prediction to be performed to determine which prediction frames the motion compensation engine 90 will need to access. Using the motion vector information, the address calculation circuit 96 also determines the address in memory 160 where the prediction frame, and the prediction macroblock within the frame, that is needed to decode the motion compensated prediction for the given macroblock to be decoded, is located.
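The address determination described above can be sketched as follows. This is an illustrative sketch only, not the circuit itself: it assumes that motion vectors are expressed in half-pixel units, that the prediction frame is stored row by row with one memory location per luminance sample, and that no bounds checking is needed. The function and parameter names are hypothetical.

```python
def prediction_address(mb_x, mb_y, mv_x, mv_y, frame_width, frame_base=0):
    """Locate the prediction macroblock for the macroblock at (mb_x, mb_y).

    mv_x and mv_y are assumed to be in half-pixel units. Returns the
    address of the top-left sample of the prediction macroblock and
    two flags saying whether half-pixel interpolation is needed.
    """
    # Integer part of the displacement (floor division by 2).
    x = mb_x * 16 + (mv_x >> 1)
    y = mb_y * 16 + (mv_y >> 1)
    # The low bit is the half-pixel part of the displacement.
    half_x = bool(mv_x & 1)
    half_y = bool(mv_y & 1)
    address = frame_base + y * frame_width + x
    return address, half_x, half_y


# Macroblock in column 2, row 1, displaced by (+2.5, -1.5) pixels.
print(prediction_address(2, 1, 5, -3, frame_width=720))
```

The two flags would then tell a half-pel filter whether horizontal and/or vertical interpolation is required.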
The prediction macroblock is obtained from memory 160 and input into the half-pel filter 78, which is coupled to the address calculation circuit 96. Typically there is a DMA engine 162 in the decoder that controls all of the interfaces with the memory 160. The half-pel filter 78 performs vertical and horizontal half-pixel interpolation on the fetched prediction macroblock as dictated by the motion vectors. This obtains the prediction macroblocks.
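Half-pixel interpolation of the fetched prediction can be sketched as an average of neighboring samples. This is a minimal sketch under stated assumptions: the reference frame is a 2D list of samples, averages are rounded upward at the half, and the reference holds at least one extra row and column beyond the block. The function name half_pel_predict is hypothetical.

```python
def half_pel_predict(ref, x, y, half_x, half_y, size=16):
    """Return a size x size prediction block starting at ref[y][x].

    half_x / half_y select horizontal / vertical half-pixel
    interpolation. Rounding to nearest is assumed.
    """
    out = []
    for r in range(size):
        row = []
        for c in range(size):
            a = ref[y + r][x + c]
            if half_x and half_y:
                total = (a + ref[y + r][x + c + 1]
                         + ref[y + r + 1][x + c] + ref[y + r + 1][x + c + 1])
                row.append((total + 2) >> 2)
            elif half_x:
                row.append((a + ref[y + r][x + c + 1] + 1) >> 1)
            elif half_y:
                row.append((a + ref[y + r + 1][x + c] + 1) >> 1)
            else:
                row.append(a)
        out.append(row)
    return out


# Small 4x4 reference area used only for illustration.
ref = [[r * 10 + c for c in range(4)] for r in range(4)]
print(half_pel_predict(ref, 0, 0, True, False, size=2))  # [[1, 2], [11, 12]]
print(half_pel_predict(ref, 0, 0, True, True, size=2))   # [[6, 7], [16, 17]]
```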
As explained earlier, pixel blocks in I frames and prediction error pixel blocks in P or B frames are encoded using DCT based techniques. In this approach, the pixels are transformed using the DCT into DCT coefficients. These coefficients are then quantized in accordance with quantization tables. The quantized DCT coefficients are then further encoded as variable length Huffman codes to maximize efficiency, with the most frequently repeated values given the shortest codes and the length of the codes increasing as the frequency of the values decreases. Codes other than the Huffman codes can be used, depending on the decompression protocol. The coefficients are ordered in a rectangular array format, with the largest value typically in the top left of the array and the values typically decreasing toward the right and bottom of the array. To produce a serial data bitstream the array is re-ordered. The order of the serialization of the coefficients is in a zig-zag format starting in the top left corner of the array, i.e. if the array is thought of in a matrix format, the order of the elements in zig-zag format is 11, 12, 21, 31, 22, 13, 14, etc., as shown in FIG. 1B. The quantization can be performed either before or after the zig-zag scan.
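The zig-zag ordering can be generated by walking the anti-diagonals of the array and alternating the direction of travel. The following sketch is illustrative; it uses zero-based (row, column) indices, so the matrix elements 11, 12, 21, 31 correspond to (0, 0), (0, 1), (1, 0), (2, 0). The function names are hypothetical.

```python
def zigzag_order(n=8):
    """Return the (row, column) positions of an n x n array in zig-zag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + column of one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)  # even diagonals run toward the top right
        order.extend((r, s - r) for r in rows)
    return order


def zigzag_scan(block):
    """Serialize a square 2D array into a list in zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


# Label each element by its one-based matrix index, e.g. 11, 12, 21.
labels = [[(r + 1) * 10 + (c + 1) for c in range(8)] for r in range(8)]
print(zigzag_scan(labels)[:7])  # [11, 12, 21, 31, 22, 13, 14]
```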
Referring again to FIG. 1A, the header parser 36 sends the encoded block data structures to a variable length code (VLC) decoder 42. The VLC decoder 42 decodes variable length codes representing the encoded blocks and converts them into fixed length pulse code modulation (PCM) codes. These codes represent the DCT coefficients of the encoded blocks. The PCM codes are a serial representation of the 8×8 block array obtained in a zig-zag format. The inverse zig-zag scanner 54, connected to the VLC decoder 42, converts the serial representation of the 8×8 block array obtained in a zig-zag format to a rectangular 8×8 block array, which is passed to the inverse quantizer 48. The inverse quantizer 48 performs the inverse quantization based on the appropriate quantization tables and then passes the result to the IDCT circuit 66. The IDCT circuit 66 performs the inverse DCT on its input block and produces the decompressed 8×8 block. The inventors have found that these circuits can be broken down into functional blocks. In current technology the decoder is typically integrated on one or several chips without being grouped into functional blocks.
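The block decoding path after the VLC decoder (inverse zig-zag scan, inverse quantization, inverse DCT) can be sketched in software. This is a simplified illustration and not a conforming decoder: the inverse quantization is assumed to be a plain multiplication by a table entry, the inverse DCT is a direct floating point evaluation of the textbook formula, and all function names are hypothetical.

```python
import math

N = 8


def zigzag_order(n=N):
    """(row, column) positions of an n x n array in zig-zag order."""
    order = []
    for s in range(2 * n - 1):
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)
        order.extend((r, s - r) for r in rows)
    return order


def inverse_zigzag(serial):
    """Place 64 serial coefficients back into an 8x8 array."""
    block = [[0] * N for _ in range(N)]
    for value, (r, c) in zip(serial, zigzag_order()):
        block[r][c] = value
    return block


def inverse_quantize(block, qtable):
    """Simplified inverse quantization: multiply by the table entry."""
    return [[block[r][c] * qtable[r][c] for c in range(N)] for r in range(N)]


def idct_8x8(coeffs):
    """Direct (slow) evaluation of the two-dimensional 8x8 inverse DCT."""
    def scale(k):
        return math.sqrt(0.5) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for y in range(N):
        for x in range(N):
            total = 0.0
            for u in range(N):
                for v in range(N):
                    total += (scale(u) * scale(v) * coeffs[u][v]
                              * math.cos((2 * y + 1) * u * math.pi / 16)
                              * math.cos((2 * x + 1) * v * math.pi / 16))
            out[y][x] = total / 4
    return out


# A block with only a DC coefficient decodes to a flat block.
serial = [8] + [0] * 63
flat_table = [[16] * N for _ in range(N)]  # assumed example table
pixels = idct_8x8(inverse_quantize(inverse_zigzag(serial), flat_table))
print(round(pixels[0][0], 6), round(pixels[7][7], 6))
```

A hardware IDCT circuit would normally use a fast, fixed point factorization rather than the four nested loops shown here.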
The prediction macroblock and the interpicture prediction errors are summed in the summing circuit 72 and passed to the assembly unit 102. Because in interpicture compression some frames require access to future frames to be decoded, the required frames should be sent before the frame that requires them. In the MPEG-2 standard, frames can require both past and future frames for decompression, and therefore the compressed frames are not sent in the same order that they are displayed in the video sequence. The assembly unit 102 ensures that the information is placed in the correct place in memory to correspond to the frame being decompressed. The resulting decoded macroblock now needs to be stored in the memory 160 in the place designated for it by the assembly unit 102. All frames need to be stored in memory 160 because the decoded macroblock may not be the next macroblock that is to be sent to the display due to the storing and transmission format of the decompression protocol. In MPEG-2 and other decompression protocols that use interpicture compression, the frames are encoded based on past and future frames; therefore, in order to decode the frames properly, the frames are not sent in display order and need to be stored until they are to be displayed. A typical MPEG-2 decoder 10 requires 16 Mbits of memory to operate in the main profile at main level mode (MP at ML). This means that the decoder requires a 2 Mbyte memory 160.
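The difference between transmission order and display order can be illustrated with a small reordering sketch. It assumes a simple scheme in which each B frame is displayed as soon as it is decoded, while each I or P frame is held until the next I or P frame arrives; the function name and the frame labels are hypothetical.

```python
def display_order(coded_frames):
    """Reorder frames from transmission order into display order.

    coded_frames is a sequence of (frame_type, label) pairs, where
    frame_type is "I", "P" or "B".
    """
    held_reference = None  # most recent I or P frame not yet displayed
    out = []
    for frame_type, label in coded_frames:
        if frame_type == "B":
            out.append(label)  # B frames are displayed immediately
        else:
            if held_reference is not None:
                out.append(held_reference)
            held_reference = label
    if held_reference is not None:
        out.append(held_reference)
    return out


# The number in each label is the display position of the frame.
coded = [("I", "I0"), ("P", "P3"), ("B", "B1"), ("B", "B2"),
         ("P", "P6"), ("B", "B4"), ("B", "B5")]
print(display_order(coded))
# ['I0', 'B1', 'B2', 'P3', 'B4', 'B5', 'P6']
```

In this example frame P3 is transmitted before B1 and B2 because both B frames are predicted from it, which is why decoded frames must be kept in memory until their display time.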
The decoder 10 can be designed to decode a bitstream formatted according to any one or a combination of standards. To decode a bitstream formatted according to a combination of standards, the decoder 10 needs to include circuitry for decoding bitstreams according to each decompression protocol. This circuitry is specific to the particular decompression protocol. The decoder 10 would also need separate encoding circuitry in order to encode a bitstream to comply with a particular decompression protocol. The decoder 10 is simply a combination of decoders, and possibly encoders, for each desired decompression protocol. For example, a decoder 10 that can decompress a bitstream encoded to comply with either the MPEG-2 standard or the H.261 standard contains two sets of decoding circuitry, one for each of the standards and specific to that particular standard, with each set containing its own motion compensation circuits and its own block decoding circuits. If it is also desired that the decoder 10 be able to encode an image sequence to comply with a particular decompression protocol, separate encoding circuitry that contains circuits specific to encoding a sequence to comply with that particular decompression protocol also needs to be added.
This need for separate sets of circuitry is a problem because it greatly increases the die area of the decoder. A long-standing goal in the semiconductor industry has been to reduce the die area of an integrated circuit device for a given functionality. Some advantages of reducing the die area are the increase in the number of dice that can be manufactured on the same size silicon wafer, and the reduction in price per die resulting therefrom. This results in both an increase in volume and a reduction in price of the device. Increasing the die area presents a problem because it drastically increases the cost of the device.
This is an incentive to keep the number of decompression standards added to the device to a minimum in order to contain the increase in the die area. However, it is advantageous for the decoder 10 to be able to decode and encode sequences formatted to comply with several well accepted standards. This allows the decoder 10 to be able to decode a large number of video and/or audio sequences. Additionally, for video telephony the decoder 10 must be able to decode and encode sequences, therefore needing both a decoder and encoder.
There is now a wealth of images available, many of which comply with different standards. There is also a desire to be able to both receive transmitted or stored images, which are typically encoded to comply with the MPEG-1 or MPEG-2 standards, and to be able to communicate using video telephony, in which the images of the participants are typically encoded to comply with the H.261 or H.263 standards. This makes it advantageous to put a decoder capable of doing both into a computer, or another similar device. However, this flexibility, which is increasingly demanded by the consumer, is coming at the price of a much higher die area for the device and a greatly increased cost of building such a decoder.