This invention relates to a video encoder-decoder ("codec") system used for processing video data streams. Preferably the system is incorporated on a single silicon chip.
The emergence of multimedia computing is driving a need for digitally transmitting and receiving high quality motion video. The high quality motion video consists of a plurality of high resolution images, each of which requires a large amount of space in a system memory or on a data storage device. Additionally, about 30 of these high resolution images need to be processed and displayed per second in order for a viewer to experience an illusion of motion. As a transfer of large, uncompressed streams of video data is time consuming and costly, data compression is typically used to reduce the amount of data transferred per image.
In motion video, much of the image data remains constant from one frame to another frame. Therefore, video data may be compressed by first describing a reference frame and then describing subsequent frames in terms of changes from the reference frame. Standards from an organization called Motion Pictures Experts Group (MPEG) have evolved to support high quality, full motion video. A first standard (MPEG-1) has been used mainly for video coding at rates of about 1.5 megabit per second. To meet more demanding application, a second standard (MPEG-2) provides for a high quality video compression, typically at coding rates of about 3-10 megabits per second.
The codecs of this invention are used, for example, to MPEG encode video information to be recorded onto a digital video disc (DVD). DVDs are becoming popular as a lower cost, higher picture quality medium to store movies. MPEG encoding allows digital video and audio information to be placed on a DVD the size of a conventional audio CD. These 5-inch DVD discs are rapidly replacing the older 12-inch laser discs for movies in the consumer marketplace because they are smaller yet hold more information.
Efficient video processing can be achieved by overlapping processing by multiple execution units. For example, three separate execution units, a DMA execution unit, a RISC core execution unit and a video digital signal processor (DSP) execution unit all will execute the same instruction set. The RISC core execution unit and the video DSP execution unit are used to carry out the processing on the codec chip, while the DMA execution unit is used to access external memory. One instruction is fetched each clock cycle by the RISC core execution unit from an instruction cache, and that instruction is then dispatched to another appropriate execution unit according to the instruction's opcode. DSP and DMA instruction operands and results are stored in a shared memory located between the DMA and DSP execution units.
The integer data path of the RISC core execution unit has an unshared memory, typically a register file. This integer data path directly processes simple instructions such as add, branch and other RISC integer instructions. Accordingly, these instructions are typically executed at the rate they are dispatched to the execution unit, eliminating the need for an instruction queue.
The remainder of the instructions are passed either to the DMA execution unit or to the video DSP execution unit. Both of these execution units often take many computer cycles to execute an instruction. Therefore instructions for these execution units are placed into one of two instruction queues for execution by one or the other of the two execution units so that delays in one execution unit do not affect instructions dispatched to the other execution unit or to the RISC core execution unit. These queues hold pending instructions for the video DSP and for the DMA execution unit. This architecture causes delayed instructions to be executed out of order with respect to the original instruction stream fetched by the RISC core execution unit.
Prior art multiple execution unit systems using out-of-order execution and shared memory generally have synchronized instruction execution using a hardware detection system to catch read-after-write, write-after-read and write-after-write hazards which occur when a shared memory structure is used for two execution units. A read-after-write hazard occurs when an instruction tries to read a register while a previous instruction is still in the process of being completed and uses the same register as its destination. Therefore the data which will be read isn't the correct data.
A write-after-write hazard is where there is an instruction in process that completes and its result needs to be written into a register. However, there is an earlier instruction, the results of which are intended to be written to the same register, but the instruction execution isn't yet completed. If the result of the completed instruction were written into the register, then when the earlier instruction finally is completed, it will overwrite the later instruction's result. If each execution unit had its own register set, this couldn't happen. But it is not practical to use separate registers for each execution unit because data must be shared between the two execution units.
When a read-after-write hazard is detected, the execution of the later instruction must be blocked until the earlier instruction updates the shared memory. When a write-after-write hazard is detected, the later instruction's shared memory update must be blocked until the earlier instruction first updates the shared memory.
To accomplish these blocking corrections, one example of prior art hardware records the identities of: (1) all the destination registers of the outstanding instructions and (2) all the instructions that are waiting to be dispatched including the source registers for the instructions. Each time an instruction is completed by one of the execution units, the prior art hardware checks if there are any other outstanding instructions having the same destination register as the completed instruction. If not, all the instructions waiting to be dispatched are checked to see if all necessary earlier instructions which are required in order to issue the new instruction have been completed.
A compare must be carried out against all the instructions waiting to be dispatched, and the oldest instruction must be selected, which is then dispatched to the execution unit during that cycle. As instructions complete, there must be no older instructions in other execution units which write to the same destination register. Detecting the existence of such instructions is a complicated operation which must be done in parallel for all execution units.
This complicated prior art hazard handling procedure represents a timing bottleneck, as it must be done in a single clock cycle. Moreover, such hazard detection requires extensive hardware to compare the source and destination addresses as well as the ages of all outstanding instructions. Moreover, even more complicated hazard handling schemes, such as register renaming and result reorder buffering, also have been used in the prior art.