1. Field of the Invention
The present invention pertains to the field of decoding variable length encoded information in a computer system. More particularly, this invention pertains to the field of accelerating software-based variable length decode.
2. Background of the Related Art
Full-motion video applications such as Digital Versatile Disc (DVD) playback, video conferencing, video telephony, and Digital Television (DTV) as defined by the Advanced Television Systems Committee (ATSC) are placing greater burdens on computer system processing resources. The above-mentioned applications utilize data compression algorithms in order to reduce the amount of information that must be transmitted using today's communication technologies. For example, audio and video information is compressed before being stored on a DVD. The information is read off of the DVD and transmitted in compressed form to a decoding device, which expands the information to reconstruct the original audio and video information. Popular compression algorithms include the Moving Picture Experts Group (MPEG) standard (ISO/IEC 11172), the MPEG2 standard (ISO/IEC 13818), and the Joint Photographic Experts Group (JPEG) standard (ISO/IEC 10918), among others. These compression algorithms use differential pulse code modulation (DPCM), a discrete cosine transform (DCT), and variable length encoding (VLE).
Variable length encoding is a technique wherein fixed length data are converted into variable length codewords according to the statistics of the data. In general, the codewords are chosen so that shorter codewords are used to represent the more frequently occurring data and longer codewords are chosen to represent less frequently occurring data. In assigning codewords in this fashion, the average codeword length of the variable length code is shorter than the original code, and compression is therefore achieved. VLE decreases the redundancy in the serial bitstream.
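By way of illustration only, the compression gain described above can be sketched with a small prefix code. The four symbols, their probabilities, and the codewords below are hypothetical and are not taken from any standard; they are chosen so that the more frequent symbols receive the shorter codewords.

```python
# Hypothetical symbol statistics and prefix code (assumptions for illustration).
PROBABILITY = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
CODEWORD = {"A": "0", "B": "10", "C": "110", "D": "111"}

# Four symbols need 2 bits each when a fixed length code is used.
FIXED_LENGTH_BITS = 2


def average_length(probability, codeword):
    """Expected number of bits per symbol under the variable length code."""
    return sum(p * len(codeword[symbol]) for symbol, p in probability.items())


def encode(symbols, codeword):
    """Concatenate the codewords for a sequence of symbols into one bit string."""
    return "".join(codeword[symbol] for symbol in symbols)


average_bits = average_length(PROBABILITY, CODEWORD)
encoded = encode("AABAC", CODEWORD)
```

With these assumed statistics the average codeword length is 1.75 bits per symbol, against 2 bits per symbol for the fixed length code.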
The variable length decoding (VLD) process for variable length encoded data is complicated by the variable length nature of the codewords. The decoding device has no knowledge of the length of the current codeword as it receives bits of the codeword stream. Further, the meaning and boundary of the next codeword cannot be known until the current codeword is understood. The decoding process consists primarily of a shift and compare operation. The information to be decoded is fed serially to the decoding device. One or more bits at a time are compared with stored codewords. This compare procedure is repeated until a valid codeword is found. Once the length of the current codeword is known, the decoding device can find the beginning of the next codeword.
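The shift and compare operation described above may be sketched as follows. This is a minimal sketch under stated assumptions: the codebook is a hypothetical four-entry prefix code rather than a table from any standard, the bitstream is modeled as a string of "0" and "1" characters, and the function name is illustrative.

```python
# Hypothetical stored codewords (a prefix code); not taken from any standard.
CODEBOOK = {"0": "A", "10": "B", "110": "C", "111": "D"}


def shift_and_compare_decode(bits, codebook):
    """Decode a bit string one bit at a time.

    Returns the decoded symbols and the number of compare steps performed.
    """
    symbols = []
    candidate = ""
    comparisons = 0
    for bit in bits:
        candidate += bit              # shift the next bit into the candidate
        comparisons += 1
        if candidate in codebook:     # compare against the stored codewords
            symbols.append(codebook[candidate])
            candidate = ""            # boundary found; next codeword starts here
    if candidate:
        raise ValueError("bitstream ended in the middle of a codeword")
    return symbols, comparisons


decoded, steps = shift_and_compare_decode("00100110", CODEBOOK)
```

Each bit costs one compare step in this sketch, and the start of a codeword is known only after the previous codeword has been matched, which is the serial dependency noted above.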
FIG. 1 shows a typical MPEG2 decode process. A data stream is received at step 110. For this example, the data stream is from a DVD player. The DVD player outputs the data stream at a maximum rate of 1.4 MBytes per second (MBps). At step 120, the data stream is split into an audio stream and a video stream. The video stream is output from this step at a maximum rate of 1.2 MBps. The next step is a VLD step 130. The VLD process is briefly described above. The video stream exits the VLD step 130 at a maximum rate of approximately 16 MBps. Following the VLD step 130 is an inverse discrete cosine transform (IDCT) step 140, followed in turn by a motion compensation step 150. The final step in the process is step 160 where the data stream is sent to a frame buffer for display.
Prior implementations of the process discussed above in connection with FIG. 1 have been either essentially completely hardware based or implemented in software with a general purpose processor performing the various steps. The motion compensation step is sometimes accelerated through mechanisms in a graphics controller. When motion compensation is hardware accelerated by the graphics controller, the data from the IDCT step would be written into a local frame buffer, or memory location accessible by the graphics controller. The graphics controller would then do the final render into the frame buffer. The hardware based implementations have the advantage of not burdening the computer system's processor with the decode process. The disadvantage of the hardware based implementation is the extra cost associated with providing extra hardware to perform the various decode functions. The software based implementations have the advantage of lower cost, but also have the disadvantage of utilizing a great deal of the processor's computing resources. In many cases the processor is not able to perform the decode tasks quickly enough to provide high quality images.
A large drain on processor computing resources is the VLD step. A typical prior processor based VLD operation that is part of an MPEG2 process is shown in FIG. 2. A bit stream 210 containing variable length encoded information is read by a processor 220. The processor then must perform the shift and compare process discussed above to find the code boundaries. The processor 220 then compares the code values with a run/level table 230. The run/level table 230 contains run-of-zeros values and level coefficients that are to be written to the 8×8 block table 240 in memory. The block table 240 contains coefficients that will be used by the IDCT. The values in the block table 240 are typically 10 bits wide and are stored in 16 bit cells. The runs of zeros and the coefficients are written to the block table 240 in a zigzag fashion as represented by the arrows and cell numbering. An example value from the run/level table might be 5, 7, which indicates to the processor that a run of 5 zeros should be written to the block table beginning at the current cell (cell 1 if just starting to fill the block table) followed by a coefficient of 7 stored at the 6th cell from the current cell. For the MPEG2 standard, the run/level table value of 5, 7 corresponds to the variable length code of 010010011.
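The writing of a run/level pair into the block table may be sketched as follows. The sketch assumes the classic zigzag scan generated by `zigzag_order` (MPEG2 also defines an alternate scan, which is not modeled), uses 0-based scan positions where the text above numbers cells from 1, and models the block table as a list of lists; all function names are hypothetical.

```python
def zigzag_order(n=8):
    """Return (row, column) pairs of an n x n block in classic zigzag order."""
    order = []
    for diagonal in range(2 * n - 1):
        cells = [(row, diagonal - row) for row in range(n)
                 if 0 <= diagonal - row < n]
        if diagonal % 2 == 0:
            cells.reverse()           # even diagonals run bottom-left to top-right
        order.extend(cells)
    return order


def new_block(n=8, fill=None):
    """Create an n x n block table; None marks a cell that was never written."""
    return [[fill] * n for _ in range(n)]


def write_run_level(block, order, position, run, level):
    """Write `run` zeros, then `level`, starting at scan index `position`.

    Returns the scan index of the next unwritten cell.
    """
    if position + run >= len(order):
        raise ValueError("run/level pair overruns the block table")
    for _ in range(run):
        row, col = order[position]
        block[row][col] = 0
        position += 1
    row, col = order[position]
    block[row][col] = level
    return position + 1


ORDER = zigzag_order()
block = new_block()
next_position = write_run_level(block, ORDER, 0, run=5, level=7)
```

Starting from cell 1 (scan index 0), the pair 5, 7 places zeros in the first five cells of the scan and the coefficient 7 in the sixth, which is row 0, column 2 under this scan order.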
A significant contributor to the processor's inability to efficiently perform the VLD operation is that, in general, processors are optimized to operate on byte, word, or double word aligned data. General purpose processors are generally not at peak efficiency when operating on bit aligned data.
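The cost of bit aligned access can be illustrated with a short sketch. The helper `read_bits` is hypothetical; it assumes the most significant bit of each byte comes first in the stream. The sample buffer holds three padding bits, then the nine-bit pattern 010010011 quoted above, then four padding bits, so the codeword straddles a byte boundary.

```python
def read_bits(data, bit_offset, count):
    """Extract `count` bits starting at `bit_offset`, most significant bit first."""
    value = 0
    for i in range(count):
        byte_index, bit_index = divmod(bit_offset + i, 8)
        bit = (data[byte_index] >> (7 - bit_index)) & 1   # shift and mask per bit
        value = (value << 1) | bit
    return value


# 111 | 010010011 | 0000 packed into two bytes (padding bits are arbitrary).
SAMPLE = bytes([0b11101001, 0b00110000])
codeword = read_bits(SAMPLE, bit_offset=3, count=9)
codeword_bits = format(codeword, "09b")
```

A single byte aligned load would return either byte of `SAMPLE` directly, whereas recovering the nine-bit field in this sketch takes a shift, a mask, and an accumulate for every bit.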
A data structure initialization apparatus is disclosed. The initialization apparatus includes a start address storage region to receive a start address from a processor and a memory access engine coupled to the start address storage region. The memory access engine writes a predetermined pattern to a data structure located in a memory device. The memory access engine writes the predetermined pattern to the data structure without intervention from the processor. The data structure is defined by the start address and is further defined by a predetermined data structure size.
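As a rough software model only, the behavior summarized above might be pictured as follows. The class and method names are hypothetical, memory is modeled as a `bytearray`, the predetermined size of 128 bytes assumes an 8×8 table of 16 bit cells, the all-zero pattern is an assumption, and treating the write of the start address as the trigger for the fill is likewise an assumption rather than something stated above.

```python
class InitializationEngine:
    """Toy model of a start address register plus a memory access engine."""

    def __init__(self, memory, structure_size=128, pattern=0x00):
        self.memory = memory                  # models the memory device
        self.structure_size = structure_size  # predetermined size (assumed 128 bytes)
        self.pattern = pattern                # predetermined byte pattern (assumed zero)
        self.start_address = None             # start address storage region

    def write_start_address(self, address):
        """Model the processor's only action: supplying the start address."""
        if address < 0 or address + self.structure_size > len(self.memory):
            raise ValueError("data structure does not fit in memory")
        self.start_address = address
        self._fill()                          # assumed trigger; no further processor work

    def _fill(self):
        """Write the pattern over the whole data structure."""
        end = self.start_address + self.structure_size
        self.memory[self.start_address:end] = bytes([self.pattern]) * self.structure_size


memory = bytearray(b"\xff" * 512)
engine = InitializationEngine(memory)
engine.write_start_address(64)
```

In this model the caller supplies only the start address; the extent of the region written follows from the predetermined size held by the engine.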