Field of the Invention
The present invention generally relates to computer processing and, more specifically, to techniques for determining instruction dependencies.
Description of the Related Art
In conventional computer processing systems, to execute a software application within a particular processing device, such as a central processing unit (CPU) or a graphics processing unit (GPU), a compiler first translates an associated software application text file into an optimized sequence of machine instructions. Typically, the software application text file is written in a general purpose programming language (e.g., C++), and the machine instructions are targeted to a specific processor architecture associated with the selected processing device.
As part of the process of optimizing instructions, the compiler conducts instruction scheduling. The purpose of instruction scheduling is to schedule the instructions in a more efficient order while preserving the semantics of the software application. In instruction scheduling, the compiler first determines how the instructions interact in the initial ordering. In particular, the compiler evaluates how the instructions access various memory resources (i.e., any element that holds state that an instruction may read or write). For example, consider the following initial sequence of instructions (where a, b, c, d, e, f, and g correspond to different memory resources):
a=b+c; //instruction 1
d=b+e; //instruction 2
f=a+g; //instruction 3
To optimize the ordering of these three instructions, a typical compiler would first construct a dependency graph in which instructions that could access a common memory resource were linked. For the example shown, such a dependency graph would capture that instruction 1 and instruction 2 would both read but would not write the same memory resource (i.e., “b”); instruction 1 would write the same memory resource (i.e., “a”) that instruction 3 would, subsequently, read; and instruction 2 and instruction 3 would neither read nor write the same memory resources. Using this dependency graph, the compiler would evaluate various sequences of the three instructions to determine which sequence best optimized the overall execution efficiency of the software application, while preserving the results that would be obtained had the instructions been executed in the original order (i.e., a “valid” reordering). Referring again to the above example, the compiler would not have the freedom to reorder the instructions such that instruction 3 occurred before instruction 1, because that reordering could change the results. By contrast, the compiler would have the freedom to reorder the instructions such that instruction 2 preceded instruction 1, since that reordering would not change the results.
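The pairwise analysis described above may be sketched as follows. This is a minimal illustration only, not the implementation of any particular compiler: the tuple representation of an instruction and the names depends and build_dependency_edges are hypothetical. The sketch records only the pairs whose relative order must be preserved, so the shared read of “b” by instruction 1 and instruction 2 produces no ordering constraint.

```python
from itertools import combinations

# Hypothetical model: each instruction is (name, resources written, resources read).
instructions = [
    ("instruction 1", {"a"}, {"b", "c"}),  # a = b + c
    ("instruction 2", {"d"}, {"b", "e"}),  # d = b + e
    ("instruction 3", {"f"}, {"a", "g"}),  # f = a + g
]

def depends(earlier, later):
    """Return True if `later` must remain after `earlier` to preserve results."""
    _, written_first, read_first = earlier
    _, written_second, read_second = later
    read_after_write = written_first & read_second
    write_after_read = read_first & written_second
    write_after_write = written_first & written_second
    return bool(read_after_write or write_after_read or write_after_write)

def build_dependency_edges(instrs):
    """Compare every (earlier, later) pair and collect the ordering constraints."""
    edges = []
    for earlier, later in combinations(instrs, 2):
        if depends(earlier, later):
            edges.append((earlier[0], later[0]))
    return edges

edges = build_dependency_edges(instructions)
print(edges)  # [('instruction 1', 'instruction 3')]
```

Under this model, the only constraint is that instruction 3 follows instruction 1, so instruction 2 may be placed anywhere in the sequence.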
Compilers typically treat a memory resource as a single entity. This representation is usually adequate for explicit hardware such as a single register. However, as the complexity and specialization of hardware architectures have increased, the concept of a memory resource has evolved. Increasingly, a memory resource may be artificial, used for convenient modeling of the architecture-specific parts of the compiler. For example, the architecture may define register sets as memory resources. Each register set may include any number of different, mostly implicit, register banks, where each register bank may include any number of registers, and each register may include any number of bits.
Further, instructions in such architectures may access only one or more scattered subsets within a defined memory resource. For example, referring back to the above example, if memory resource “a” represents a register that includes 256 bits, RA[0:255], then instruction 1 may actually access only four scattered bits within RA: {RA[5], RA[56], RA[121], RA[255]}. Further, instruction 3 may actually access only two scattered bits within RA: {RA[50], RA[97]}.
Compilers may be configured to perform a dependency analysis in different ways. In one approach, the compiler conducts the dependency analysis of instructions conservatively. More specifically, the compiler considers each instruction to affect the entirety of each memory resource associated with the instruction. Referring back again to the above example, the compiler would consider instruction 1 to potentially write all 256 bits included in RA and instruction 3 to potentially read all 256 bits included in RA. Therefore, the compiler would not have the freedom to reschedule instruction 3 to precede instruction 1, even though the bits actually written by instruction 1 and the bits actually read by instruction 3 do not overlap and such a reordering would therefore not change the results (i.e., such a reordering would be valid). Thus, one drawback to this approach is that the compiler is unable to consider all valid reorderings and, therefore, may not be able to determine the optimal reordering. Consequently, the speed at which the processor executes the software application may not be fully optimized.
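The difference between the conservative view and the actual bit-level accesses in the RA example may be illustrated with the following sketch. The function name conflicts and the set-based representation of accessed bits are assumptions made for illustration only.

```python
# Bits of the 256-bit register RA touched by each instruction in the example.
WRITTEN_BY_INSTRUCTION_1 = {5, 56, 121, 255}
READ_BY_INSTRUCTION_3 = {50, 97}

# Conservative view: each instruction is assumed to touch all of RA.
ALL_BITS_OF_RA = set(range(256))

def conflicts(written_bits, read_bits):
    """Return True if any written bit is also read (hypothetical helper)."""
    return bool(written_bits & read_bits)

# Treating RA as a single entity forces a dependency between the instructions.
conservative_dependency = conflicts(ALL_BITS_OF_RA, ALL_BITS_OF_RA)

# The bits actually accessed are disjoint, so no dependency exists.
actual_dependency = conflicts(WRITTEN_BY_INSTRUCTION_1, READ_BY_INSTRUCTION_3)

print(conservative_dependency, actual_dependency)  # True False
```

The conservative result reports a dependency that the bit-level comparison shows to be spurious, which is why the valid reordering is excluded from consideration.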
In an alternate approach, the compiler splits each memory resource into separate memory resources, each of which represents a single element (e.g., bit, register, etc.) included in the initial memory resource. Referring back again to the above example, the compiler would split RA into 256 separate bits before constructing the dependency graph. Unfortunately, constructing a dependency graph involves comparing each instruction with each of the other instructions to determine access to common memory resources. This evaluation is typically implemented using an N-square algorithm. And, as persons skilled in the art will understand, as the problem size increases (e.g., the number of objects increases or the number of memory resources increases), the performance of N-square algorithms quickly degrades. Thus, although this second approach may reduce the conservatism of the first approach, the subsequent dependency analysis may result in an unacceptable increase in the time required to compile the code.
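The growth in work caused by splitting memory resources may be illustrated with a rough count. The cost model below is an assumption made for illustration: it supposes a naive all-pairs comparison that performs one check per memory resource for every pair of instructions, and the function name count_resource_checks and the figure of 100 instructions are hypothetical.

```python
from itertools import combinations

def count_resource_checks(num_instructions, num_resources):
    """Count checks made by a naive all-pairs comparison (assumed cost model)."""
    checks = 0
    for _ in combinations(range(num_instructions), 2):
        checks += num_resources  # one check per resource for each instruction pair
    return checks

# RA treated as a single memory resource.
checks_whole = count_resource_checks(100, 1)

# RA split into 256 separate single-bit memory resources.
checks_split = count_resource_checks(100, 256)

print(checks_whole, checks_split)  # 4950 1267200
```

Under these assumptions, the number of instruction pairs grows quadratically with the number of instructions, and splitting RA multiplies the work for every pair by the number of resulting memory resources.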
As the foregoing illustrates, what is needed in the art is a more effective approach for determining memory resource dependencies between instructions when compiling software applications.