The primary function of most computer processors is to execute a stream of computer instructions that are retrieved from a storage device. Many processors are designed to fetch an instruction and execute that instruction before fetching the next instruction. With these processors, therefore, there is an assurance that any register or memory value that is modified or retrieved by a given instruction will be available to the instructions following it. For example, consider the following set of instructions:
    1) Load memory-1→register-X;
    2) Add1 register-X register-Y→register-Z;
    3) Add2 register-Y register-Z→register-W.
The first instruction loads the content of memory-1 into register-X. The second instruction adds the content of register-X to the content of register-Y and stores the result in register-Z. The third instruction adds the content of register-Y to the content of register-Z and stores the result in register-W. In this set of instructions, instructions 2 and 3 are considered “dependent” instructions because they depend on instruction 1: instruction 2 reads register-X directly, and instruction 3 reads register-Z, which instruction 2 computes from register-X. In other words, if register-X is not loaded with valid data by instruction 1 before instructions 2 and 3 are executed, instructions 2 and 3 will generate improper results.
With traditional “fetch and execute” processors, the second instruction will not be executed until the first instruction has properly executed. For example, the second instruction may not be dispatched to the processor until a cache hit/miss signal is received as a result of the first instruction. Further, the third instruction will not be dispatched until an indication has been received that the second instruction has properly executed. It can therefore be seen that this short program cannot be executed in less time than T=L1+L2+L3, where L1, L2 and L3 represent the latencies of the three instructions. Hence, to execute the program faster, it will be necessary to reduce the latencies of the instructions.
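The lower bound T=L1+L2+L3 can be illustrated with a minimal sketch of a strictly serial “fetch and execute” model. The `Instruction` class, the `run_serial` function, and the cycle counts (3, 1 and 1) are hypothetical and chosen only for illustration; they do not describe any particular processor.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class Instruction:
    name: str
    sources: Tuple[str, ...]  # operands that must hold valid data
    dest: str                 # operand written by the instruction
    latency: int              # hypothetical latency in cycles


def run_serial(program: Iterable[Instruction], initially_valid: Set[str]) -> int:
    """Execute instructions one at a time and return the total elapsed cycles.

    Each instruction starts only after the previous one has completed,
    so the total time is the sum of the individual latencies.
    """
    valid = set(initially_valid)
    elapsed = 0
    for ins in program:
        missing = [s for s in ins.sources if s not in valid]
        if missing:
            # A dependent instruction would read invalid data here.
            raise ValueError(f"{ins.name} reads invalid operand(s): {missing}")
        elapsed += ins.latency
        valid.add(ins.dest)
    return elapsed


# The three-instruction example, with assumed latencies L1=3, L2=1, L3=1.
PROGRAM = [
    Instruction("Load", ("memory-1",), "register-X", 3),
    Instruction("Add1", ("register-X", "register-Y"), "register-Z", 1),
    Instruction("Add2", ("register-Y", "register-Z"), "register-W", 1),
]

total = run_serial(PROGRAM, {"memory-1", "register-Y"})
print(total)  # 5, i.e. L1 + L2 + L3 under the assumed latencies
```

Running the two add instructions without the load raises an error in this model, because register-X is never marked valid. This mirrors the dependency of instructions 2 and 3 on instruction 1.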
Therefore, there is a need for a computer processor that can schedule and execute instructions with improved speed and reduced latency.