Conventionally, a software developer designs programs that are compiled and executed by a processor. A program consists of a number of instructions written in sequential order. The programming model typically assumes that the instructions are executed in the same order in which they appear in the program. However, various computer architectures may enable certain program instructions to be executed out of order. For example, a first instruction may specify an arithmetic operation such as C=A+B and a second instruction, sequentially ordered after the first instruction, may specify an arithmetic operation such as E=D*D. The compiler may analyze the program and reorder the first and second instructions such that the second instruction is executed before the first instruction. Reordering instructions enables certain efficiencies to be realized, such as by moving long-latency operations closer to the beginning of the program.
The instructions included in the above example can be reordered because there are no dependencies between the instructions. In other words, the second instruction does not include an operand that is affected by the execution of the first instruction. However, other instructions may have dependencies that do not allow such reordering of the instructions. For example, a first instruction may load a value from an address in RAM (Random-Access Memory) into a register, and a second instruction may perform an arithmetic operation that utilizes the value in the register as an operand of the arithmetic operation. In this case, the second instruction cannot be executed before the first instruction because the second instruction has a dependency on the first instruction.
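The dependency test described above can be sketched in a few lines. This is only an illustrative model, not a description of any particular compiler: the `Instr` class and the `can_reorder` function are hypothetical names, and each instruction is reduced to one destination and a tuple of source operands. Besides the read-after-write case discussed in the text, the sketch also rejects write-after-read and write-after-write conflicts, which is an assumption added here for completeness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction model: one destination, several sources."""
    dest: str
    srcs: tuple


def can_reorder(first: Instr, second: Instr) -> bool:
    """Return True if `second` may be moved ahead of `first`."""
    # Read-after-write: second reads what first writes (the case in the text).
    raw = first.dest in second.srcs
    # Write-after-read: second overwrites an operand first still needs
    # (assumption added for completeness; not discussed in the text).
    war = second.dest in first.srcs
    # Write-after-write: both write the same location (also an added assumption).
    waw = first.dest == second.dest
    return not (raw or war or waw)


# C = A + B followed by E = D * D: no shared operands, so reordering is safe.
add = Instr(dest="C", srcs=("A", "B"))
mul = Instr(dest="E", srcs=("D", "D"))

# Load into R1 followed by arithmetic that reads R1: reordering is not safe.
load = Instr(dest="R1", srcs=("MEM",))
use = Instr(dest="R2", srcs=("R1", "R3"))

print(can_reorder(add, mul))
print(can_reorder(load, use))
```

In this model the first pair can be swapped and the second pair cannot, matching the two examples given in the text.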
In addition, some program instructions may be associated with a long latency to generate a result. In other words, the processor may require a large number of clock cycles to execute the instruction. For example, while accessing a value in a memory sub-system, the processor may wait for hundreds or thousands of clock cycles for the memory sub-system to return the value stored at a particular memory address. During operations associated with a long latency, the registers associated with the operations may be reserved for a large number of clock cycles even though the registers do not store useful data for the large majority of those clock cycles.
One technique to hide latency enables other operations to be executed while one or more long-latency operations are in flight. In order to allow additional operations to be executed while long-latency operations are waiting to be completed, the compiler may use different registers in the register file to perform the additional operations. The size of a register file associated with an execution thread may be increased to enable other operations to be executed substantially in parallel with the long-latency operation. However, the amount of latency that can be hidden using this technique is bounded by the number of available registers and the number of independent instructions in the program that can be executed out of order. It will be appreciated that increasing the size of the register file consumes a vital resource (i.e., surface area of the silicon chip), which increases the cost of the design. Furthermore, increasing the size of the register file also requires more power to access data stored in the register file, making the processor less power efficient. Consequently, increasing the size of the register file is an expensive solution to this problem. Thus, there is a need for managing the execution order of program instructions that addresses this issue and/or other issues associated with the prior art.
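The bound on hidden latency noted above can be illustrated with a deliberately simplified model. The sketch below is an assumption-laden toy, not a measurement of any real processor: the function names are hypothetical, and it assumes that every independent instruction occupies a fixed number of cycles (`cycles_per_instr`) and needs a fixed number of free registers (`regs_per_instr`) that are not released while the long-latency operation is in flight.

```python
def hidden_latency_cycles(latency_cycles, independent_instrs, free_registers,
                          cycles_per_instr=1, regs_per_instr=1):
    """Toy estimate of how many cycles of a long-latency operation are hidden."""
    # Work that can overlap the operation is limited both by the supply of
    # independent instructions and by the registers available to hold results.
    issuable = min(independent_instrs, free_registers // regs_per_instr)
    # Overlapping work cannot hide more cycles than the operation takes.
    return min(latency_cycles, issuable * cycles_per_instr)


def exposed_latency_cycles(latency_cycles, independent_instrs, free_registers,
                           cycles_per_instr=1, regs_per_instr=1):
    """Cycles during which the thread still stalls under the same toy model."""
    hidden = hidden_latency_cycles(latency_cycles, independent_instrs,
                                   free_registers, cycles_per_instr,
                                   regs_per_instr)
    return latency_cycles - hidden
```

Under these assumptions, a 400-cycle access with 50 independent instructions but only 16 free registers hides 16 cycles and leaves 384 exposed. Hiding more would require either more registers or more independent work, which is the cost trade-off the paragraph describes.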