In computer architecture, a data hazard is a problem that can occur in a pipelined processor. Instructions in a pipelined processor are executed in several stages so that, at any given time, several instructions are in progress. However, the instructions may not complete in the desired order. A data hazard occurs, and can cause an error, when two or more of these simultaneous and possibly out-of-order instructions conflict over the same data.
Data hazards occur when data is modified. A data hazard can occur in the following situations: 1) Read after Write (RAW): an operand is modified and read soon thereafter. Because the first instruction may not have finished writing to the operand, the second instruction may use incorrect data; 2) Write after Read (WAR): an operand is read and written to soon thereafter. Because the write may finish before the read, the read instruction may incorrectly get the newly written value; and 3) Write after Write (WAW): two instructions write to the same operand. The first may finish after the second and therefore leave the operand with an incorrect value. The operands involved in data hazards can reside in memory or in a register.
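The three situations above can be illustrated with a minimal sketch that compares the operands read and written by two instructions issued in program order. The `Instr` class, the `hazards` function, and the register and memory names are hypothetical and introduced only for illustration; they are not part of any particular processor or of the method described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: the operands it reads and writes."""
    name: str
    reads: frozenset
    writes: frozenset


def hazards(first, second):
    """Return the hazard types between `first` (issued earlier) and `second`."""
    found = set()
    if first.writes & second.reads:
        found.add("RAW")  # second reads what first writes
    if first.reads & second.writes:
        found.add("WAR")  # second writes what first reads
    if first.writes & second.writes:
        found.add("WAW")  # both write the same operand
    return found


# Assumed example instructions; operands may be registers or memory locations.
load = Instr("load r1, [a]", frozenset({"a"}), frozenset({"r1"}))
add = Instr("add r2, r1, r3", frozenset({"r1", "r3"}), frozenset({"r2"}))
mov = Instr("mov r1, r4", frozenset({"r4"}), frozenset({"r1"}))

print(sorted(hazards(load, add)))  # ['RAW']
print(sorted(hazards(add, mov)))   # ['WAR']
print(sorted(hazards(load, mov)))  # ['WAW']
```

The sketch only reports that a potential conflict exists between a pair of instructions; whether it leads to an error depends on the order in which the instructions actually complete.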
The instruction set of the pipelined processor may contain special instructions which have exceptionally high latencies relative to standard instructions. A primary example is an instruction which fetches data from memory. The problem of data hazards is relatively easy to avoid for low latency instructions, i.e. instructions that can be completed in a small number of clock ticks, because it is relatively easy to ensure that the instructions within a particular thread are completed in the order in which they were issued. However, when high latency instructions are included in a thread, the problem of data hazards is more significant because there is a greater likelihood that the instructions in the thread will not complete in the issued order.
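The effect of latency on completion order can be shown with a small sketch. It assumes, purely for illustration, that one instruction is issued per clock tick, that nothing stalls, and that each instruction completes a fixed number of ticks after issue; the instruction names and latency values are hypothetical.

```python
def completion_order(instrs):
    """Return instruction names ordered by completion tick.

    `instrs` is a list of (name, latency) pairs in issue order. Assumes one
    issue per clock tick and no stalling, so an instruction issued at tick t
    completes at tick t + latency. Ties are broken by issue order.
    """
    finish = [
        (issue_tick + latency, issue_tick, name)
        for issue_tick, (name, latency) in enumerate(instrs)
    ]
    return [name for _, _, name in sorted(finish)]


# Low latency instructions only: completion order matches issue order.
print(completion_order([("add", 1), ("sub", 1), ("mov", 1)]))
# ['add', 'sub', 'mov']

# A high latency load issued first completes last.
print(completion_order([("load", 20), ("add", 1), ("mul", 3)]))
# ['add', 'mul', 'load']
```

In the second case, any later instruction that reads or writes the destination of the load would be exposed to one of the hazards described above unless some mechanism holds it back.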
These problems arise in many circumstances, e.g. in 3D graphics processors, in Central Processing Units (CPUs) including dedicated media CPUs in which real time inputs are being received, and in communication with multi-processor systems.
To deal with the high latency instructions, the processor should ideally provide a mechanism to swap out a thread which is waiting for instructions to complete. However, certain requirements also have to be fulfilled.
First, in a multi-threaded processor, many threads might have potential data hazards, i.e. instructions which depend upon preceding instructions that must be completed before the dependent instructions are processed.
Second, each thread might have a large number of long latency instructions, which could be adjacent in the stream. It must be possible for the return data from the long latency instructions to come back in an order different from that in which the instructions were dispatched. Given that a number of long latency instructions could be in progress at one time, processor stalling caused by data hazards from the long latency instructions should be reduced as much as possible.
Third, where there is a branch in the thread, it has to be possible to skip over any instructions in the thread, especially instructions which might cause a data hazard because they depend upon preceding instructions that must be completed before they are processed.
Fourth, it must be possible to read results in a different order than they were written.
Fifth, there should be no penalty for multiple read accesses of destinations.
Sixth, it must also be permitted for the same destination to be written to and then re-used as the destination of another long latency instruction.
Finally, it is preferable that no dedicated or mass storage is needed in processing the long latency instructions and the potential data hazard instructions. It is also preferable that gate costs are kept to a minimum.
It is an object of the invention to provide a method and an apparatus for processing threads which mitigates or overcomes the problem of the data hazards in the long latency operations.