The present invention relates to compressed bitstream decoding. More specifically, the present invention relates to methods and apparatuses for the dynamic decoding of high bandwidth bitstreams at high decoding speeds.
Because of the advantages digital video has to offer, analog video technology has largely given way to digital video technology over the past few decades. For example, digital video can be stored and distributed more cheaply than analog video because it can be stored on randomly accessible media such as magnetic disc drives (hard disks) and optical disc media known as compact discs (CDs). In addition, once stored on a randomly accessible medium, digital video can be interactive, allowing it to be used in games, catalogs, training, education, and other applications.
One of the newest products based on digital video technology is the digital video disc, sometimes called "digital versatile disc" or simply "DVD." These discs are the size of an audio CD, yet hold up to 17 billion bytes of data, 26 times the data on an audio CD. Moreover, DVD storage capacity (17 Gbytes) is much higher than that of CD-ROM (600 Mbytes), and DVD data can be delivered at a higher rate than CD-ROM data. Therefore, DVD technology represents a tremendous improvement in video and audio quality over traditional systems such as televisions, VCRs, and CD-ROM.
DVDs generally contain video data in compressed MPEG format. To decompress the video and audio signals, DVD players use decoding hardware to decode the incoming bitstream. FIG. 1 is a block diagram showing a prior art digital video system 100. The digital video system 100 includes a digital source 102, a digital processor 104, and a digital output 106. The digital source 102 includes DVD drives and other digital source providers, such as an Internet streaming video connection. The digital processor 104 is typically an application specific integrated circuit (ASIC), while the digital output 106 generally includes display devices such as television sets and monitors, as well as audio devices such as speakers.
Referring next to prior art FIG. 2, a conventional digital processor 104 is shown. The digital processor 104 includes a decompression engine 200, a controller 202, and DRAM 204. Essentially, the bitstream is decompressed by the decompression engine 200, which utilizes the DRAM 204 and the controller 202 during the decompression process. The decompressed data is then sent to a display controller 206, which displays decompressed images on a display device, such as a television or monitor.
As stated previously, digital processors are generally embodied on ASICs. These ASICs typically map key functional operations such as variable length decoding (VLD), run-length decoding (RLD), inverse zigzag scan (IZZ), inverse quantization (IQ), inverse discrete cosine transformation (IDCT), motion compensation (MC), and merge and store (MS) to dedicated hardware. To increase processing speed, these modules are commonly implemented as a pipeline so that their computations fit within the available cycle time.
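To illustrate two of the stages listed above, the following Python sketch expands (run, level) pairs into a 64-entry coefficient list (RLD) and then moves those coefficients from zigzag scan order back into an 8x8 block (IZZ). This is a minimal sketch under stated assumptions: the function names are hypothetical, the (run, level) pairs are assumed to come from a VLD stage that is not shown, and the alternate scan used for some interlaced MPEG-2 material is ignored.

```python
def zigzag_order(n: int = 8) -> list[int]:
    """Return the raster index of each coefficient, listed in zigzag scan order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col identifies one anti-diagonal
        lo, hi = max(0, s - n + 1), min(s, n - 1)
        # Even diagonals run bottom-left to top-right; odd ones run top-right to bottom-left.
        rows = range(hi, lo - 1, -1) if s % 2 == 0 else range(lo, hi + 1)
        order.extend(r * n + (s - r) for r in rows)
    return order


def run_length_decode(pairs: list[tuple[int, int]], size: int = 64) -> list[int]:
    """RLD: each (run, level) pair means `run` zeros followed by one nonzero `level`."""
    coeffs = [0] * size
    pos = 0
    for run, level in pairs:
        pos += run
        coeffs[pos] = level
        pos += 1
    return coeffs


def inverse_zigzag(scan_coeffs: list[int], n: int = 8) -> list[list[int]]:
    """IZZ: move coefficients from scan order back to row-major block order."""
    block = [0] * (n * n)
    for k, raster in enumerate(zigzag_order(n)):
        block[raster] = scan_coeffs[k]
    return [block[r * n:(r + 1) * n] for r in range(n)]
```

For example, `inverse_zigzag(run_length_decode([(0, 12), (1, -3), (0, 5)]))` places 12 at row 0, column 0, -3 at row 1, column 0, and 5 at row 2, column 0, because scan positions 0, 2, and 3 map to raster positions 0, 8, and 16. In hardware the zigzag order would normally be a fixed lookup table rather than computed on the fly.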
Generally, an MPEG bitstream is provided to a DRAM interface (i/f) by a memory controller and is thereafter made available to the VLD, RLD/IZZ, IQ, and IDCT modules for data reconstruction. Simultaneously, the MC module executes if motion compensation exists for the current data. When the MC and IDCT modules have finished their operations, the MS module adds their output data together, the result being the reconstructed data. Finally, the MS module stores the reconstructed data in DRAM.
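The add-and-store step performed by the MS module can be sketched as follows. This is a minimal Python model under stated assumptions: samples are 8-bit, the sum of the IDCT residual and the MC prediction is saturated to the range 0 to 255, and a list of lists stands in for the frame buffer in DRAM. The function name and its arguments are hypothetical.

```python
def merge_and_store(idct_residual, mc_prediction, frame, x, y):
    """Add IDCT output to MC prediction, clip to 8 bits, and write the block at (x, y).

    `frame` is a list of rows standing in for the DRAM frame buffer (assumption).
    For a block with no motion compensation, pass an all-zero prediction.
    """
    for r, (res_row, pred_row) in enumerate(zip(idct_residual, mc_prediction)):
        for c, (res, pred) in enumerate(zip(res_row, pred_row)):
            # Saturate to the valid 8-bit sample range before storing.
            frame[y + r][x + c] = min(255, max(0, res + pred))
```

The write into `frame` is the step whose duration varies in a real decoder, since it depends on DRAM access; in this sketch it is just a list assignment.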
Unlike the execution times of the VLD, RLD/IZZ, IQ, and IDCT modules, which are fixed, the execution time of the MS module is variable. Hence, to avoid memory access conflicts, the MS module in a conventional decompression engine must wait for the IDCT and MC modules to finish processing for the current macroblock. Thereafter, the other modules in the decompression engine must wait for the MS to finish processing in order to ensure that the IDCT memory is free. Only after the MS is finished processing is the next macroblock begun.
FIG. 3 is an implementation timing diagram 300 illustrating module execution timing for a conventional decompression engine. The implementation timing diagram 300 includes a VLD/IQ/IZZ operational period 302a, an IDCT operational period 304a, an IDCT memory store operational period 306a, and an MS operational period 308a.
In operation, the VLD/IQ/IZZ 302a modules are started at time t0. After a certain number of cycles, the VLD/IQ/IZZ 302a generates a data block and stores it in a double buffer. Then, at time t1, the IDCT 304a reads the data block from the double buffer and generates a data block, which is stored in the IDCT memory buffer 306a at time t2. Then, at time t3, after all the IDCT data has been written to the IDCT memory buffer 306a, the MS 308a begins reading the IDCT memory buffer. During this time the MS 308a reads both the IDCT and MC data, adds them together, and stores the result in the DRAM. Finally, after the MS 308a has finished reading all the data and storing it in the DRAM, the process starts again with the next macroblock, M2.
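The serialized schedule of FIG. 3 can be approximated with a simple timing model. The Python sketch below is hypothetical and simplified: it assumes a fixed block latency for each front-end stage, assumes the IDCT writes one block per block latency, and takes the variable MS durations as input. It does not model DRAM arbitration or the double buffer, and the function name and parameters are illustrative only.

```python
def conventional_schedule(ms_cycles, block_latency, blocks_per_mb=6):
    """Return (t0, ms_end) for each macroblock under the serialized scheme of FIG. 3.

    Simplified, hypothetical timing model:
      t0     : front end (VLD/IQ/IZZ) starts the macroblock
      t2     : t0 + 2 * block_latency, the IDCT writes its first block
      t3     : t2 + blocks_per_mb * block_latency, all IDCT data written, MS starts
      ms_end : t3 + ms_cycles[k], variable because the MS accesses DRAM
    The next macroblock cannot start until the current MS has finished.
    """
    schedule = []
    t0 = 0
    for ms in ms_cycles:
        t2 = t0 + 2 * block_latency
        t3 = t2 + blocks_per_mb * block_latency
        ms_end = t3 + ms
        schedule.append((t0, ms_end))
        t0 = ms_end  # serialized: wait for MS before decoding the next macroblock
    return schedule
```

For example, with `block_latency=10` and `ms_cycles=[30, 80, 30]`, the three macroblocks start at cycles 0, 110, and 270, and the last store finishes at cycle 380. Every macroblock pays the full front-end latency again after the previous MS finishes.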
The critical issue is the relationship between the IDCT and the MS. The IDCT uses a coded block pattern (CBP) for memory storage, so the configuration of the data in memory is unknown until the bitstream is decoded. The MS, on the other hand, reads the corresponding data sequentially. Hence, conflicts may occur if the MS and IDCT access the IDCT memory at the same time, since the IDCT may overwrite data that the MS is still attempting to read.
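The following Python sketch illustrates why the IDCT memory layout depends on the coded block pattern. It assumes a 4:2:0 macroblock of six blocks (Y0 to Y3, Cb, Cr), uses the MPEG-2 convention that block i is coded when bit (5 - i) of the 6-bit CBP is set, and assumes, hypothetically, that the IDCT packs only coded blocks into consecutive memory slots. The actual storage scheme of the decoder is not specified here.

```python
BLOCK_NAMES = ("Y0", "Y1", "Y2", "Y3", "Cb", "Cr")  # 4:2:0 macroblock (assumed)


def coded_blocks(cbp: int) -> list[int]:
    """Indices of coded blocks; block i is coded when bit (5 - i) of cbp is set."""
    return [i for i in range(6) if cbp & (1 << (5 - i))]


def idct_memory_layout(cbp: int) -> dict[str, int]:
    """Map each coded block to the IDCT memory slot it would occupy.

    Hypothetical packing: coded blocks are written to consecutive slots, so a
    block's slot is only known once the CBP has been decoded from the bitstream.
    """
    return {BLOCK_NAMES[i]: slot for slot, i in enumerate(coded_blocks(cbp))}
```

With every block coded (`cbp=0b111111`), Cb occupies slot 4, whereas `idct_memory_layout(0b001011)` gives `{"Y2": 0, "Cb": 1, "Cr": 2}`, so the same Cb block lands in slot 1. Under this assumption, a reader that walks the slots sequentially sees a layout that changes from macroblock to macroblock.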
To avoid these conflicts, conventional decoders delay the start of the next VLD/IQ/IZZ 302b operational period until after the MS operational period 308a is completed. In this manner, a buffer of time Δt is created between the time the MS 308a finishes reading the IDCT memory and the time the IDCT next writes to the IDCT memory 306a. This buffer ensures that no memory conflicts occur in the IDCT memory during a conventional decoding process. Thus, if t0 to t1 is one block time latency and t1 to t2 is one block time latency, then Δt (Δt = t2 - t0) is a two-block latency.
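Writing T_b for one block time latency (a symbol introduced here only for illustration), the size of the buffer follows directly from the timing above:

$$
\Delta t = t_2 - t_0 = (t_1 - t_0) + (t_2 - t_1) = T_b + T_b = 2T_b
$$

Because the next VLD/IQ/IZZ period is delayed in this way for every macroblock, decoding N macroblocks adds roughly 2NT_b of idle time under this simplified view.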
However, the time used to create the buffer Δt is wasted, since the MS and IDCT memory are idle during this period. Ideally, the IDCT memory would be active during this time, receiving data from the IDCT. However, because the MS must access the DRAM, the duration of the MS operational period is uncertain, as illustrated in FIG. 3 by MS operational periods 308a and 308b. Thus, prior art decompression engines generally must use a time buffer Δt to avoid memory conflicts between the IDCT and MS.
In view of the foregoing, what is needed are improved methods and apparatuses for decoding an incoming bitstream that increase the bandwidth of the system. The system should be robust and capable of operating without the read/write time buffers that reduce bandwidth.
The present invention addresses these needs by providing a method for halting the decoding process during potential memory access conflicts between the IDCT and the MS. First, a portion of an incoming bitstream is decoded, during which the various decoding modules generate uncompressed video data. Then, a determination is made as to whether the memory operation that stores the uncompressed video data to memory is complete. The decoding of the incoming bitstream is halted during a specific time period, which includes a time period during which the memory operation is incomplete. Finally, the decoding of the incoming bitstream is restarted when the memory operation is complete.
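The halt-and-restart method can be sketched with a simplified timing model. This Python sketch is hypothetical: it assumes the front end of the next macroblock may start as soon as the IDCT has finished writing the current one, that decoding halts only at the point where the IDCT would write into IDCT memory that the MS is still reading, and that decoding restarts as soon as the MS completes. DRAM contention between the MS and the front end is ignored, and the function name and parameters are illustrative only.

```python
def halting_schedule(ms_cycles, block_latency, blocks_per_mb=6):
    """Return (t0, ms_end) per macroblock when decoding overlaps MS and halts on conflict.

    Simplified, hypothetical timing model:
      * front end of macroblock k+1 starts when the IDCT finishes writing macroblock k (t3)
      * its first IDCT write is ready 2 * block_latency later
      * if MS of macroblock k is still running, decoding halts until it completes
    """
    schedule = []
    t0 = 0
    prev_ms_end = 0
    for ms in ms_cycles:
        ready_to_write = t0 + 2 * block_latency
        t2 = max(ready_to_write, prev_ms_end)  # halt until the memory operation is complete
        t3 = t2 + blocks_per_mb * block_latency
        ms_end = t3 + ms
        schedule.append((t0, ms_end))
        t0, prev_ms_end = t3, ms_end  # next front end starts without waiting for MS
    return schedule
```

With `block_latency=10` and `ms_cycles=[30, 80, 30]`, the last store finishes at cycle 340, whereas a schedule in which each macroblock waits for the previous MS before starting finishes at cycle 380 for the same inputs. Under these assumptions the saving at each macroblock boundary is min(MS duration, 2 × block latency), so it never exceeds the two-block buffer Δt of the conventional decoder.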
In another embodiment, a decoder is disclosed that provides dynamic pipelining of an incoming compressed bitstream. The decoder includes decoding logic modules capable of decoding an incoming compressed bitstream, and memory storing logic in communication with at least one of the decoding modules. Preferably, the memory storing logic is capable of determining whether the memory operation that stores the uncompressed video data to memory is complete. In addition, the decoder includes halting logic in communication with the decoding logic and the memory storing logic. The halting logic halts the decoding of the incoming bitstream during a specific time period, which includes a time period during which the memory operation is incomplete. Finally, the decoder includes initiating logic, in communication with the decoding logic and the memory storing logic, that restarts the decoding when the memory operation is complete.
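As a software analogy only (the embodiment above describes hardware logic), the handshake among the decoding logic, memory storing logic, halting logic, and initiating logic can be sketched with Python's `threading.Condition`. The class and function names are hypothetical. A condition variable stands in for the halt and restart signals, and a short sleep stands in for a DRAM access of variable length.

```python
import queue
import threading
import time


class IdctMemoryGate:
    """Hypothetical analogy of the halting and initiating logic for IDCT memory."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._store_pending = False

    def begin_store(self) -> None:
        # Memory storing logic: a macroblock in IDCT memory now awaits merge-and-store.
        with self._cond:
            self._store_pending = True

    def store_complete(self) -> None:
        # Initiating logic: the memory operation finished, so wake a halted decoder.
        with self._cond:
            self._store_pending = False
            self._cond.notify_all()

    def wait_until_free(self) -> None:
        # Halting logic: block while the previous memory operation is incomplete.
        with self._cond:
            self._cond.wait_for(lambda: not self._store_pending)


def run_decoder(num_macroblocks: int) -> list[tuple[str, int]]:
    gate = IdctMemoryGate()
    handoff = queue.Queue()  # macroblocks handed from the IDCT side to the MS side
    events: list[tuple[str, int]] = []
    events_lock = threading.Lock()

    def record(kind: str, mb: int) -> None:
        with events_lock:
            events.append((kind, mb))

    def decoding_logic() -> None:
        for mb in range(num_macroblocks):
            # Front-end work (VLD/RLD/IZZ/IQ) for mb could run here, overlapping the store.
            gate.wait_until_free()
            record("idct_write", mb)
            gate.begin_store()
            handoff.put(mb)

    def memory_storing_logic() -> None:
        for _ in range(num_macroblocks):
            mb = handoff.get()
            time.sleep(0.001)  # stand-in for a variable-length DRAM access
            record("ms_done", mb)
            gate.store_complete()

    threads = [threading.Thread(target=decoding_logic),
               threading.Thread(target=memory_storing_logic)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events
```

Calling `run_decoder(3)` records `idct_write 0, ms_done 0, idct_write 1, ms_done 1, idct_write 2, ms_done 2`: the decoder never writes macroblock k+1 into IDCT memory before the store of macroblock k has completed, while any decoding work placed before `wait_until_free()` is free to overlap with the store.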
In a further embodiment, an application specific integrated circuit (ASIC) that includes a decoder that provides dynamic pipelining of an incoming compressed bitstream is disclosed. The ASIC includes a memory controller, decoding logic modules that are capable of decoding an incoming compressed bitstream, and memory storing logic in communication with at least one of the decoding modules, capable of determining whether a memory operation is complete. Preferably, the memory operation includes storing the uncompressed video data to memory. In addition, halting logic is included in the ASIC. The halting logic is generally in communication with the decoding logic and the memory storing logic, and is capable of halting the decoding of the incoming bitstream during a specific time period, which includes a time period wherein the memory operation is incomplete. Finally, the ASIC includes initiating logic in communication with the decoding logic and the memory storing logic, that is capable of restarting the decoding of the incoming bitstream when the memory operation is complete.
Advantageously, the present invention allows for greater decoding efficiency by providing a mechanism that synchronizes memory writes by different decoding modules. By halting decoding and memory write operations when write conflicts occur, the present invention avoids the need for the large time buffers conventionally used to prevent memory write conflicts.