1. Field of the Invention
The present invention generally relates to computer processing, and, more specifically, to an algorithm for 64-bit address mode optimization.
2. Description of the Related Art
Developers use compilers to generate executable programs from high-level source code. Typically, a compiler is configured to receive high-level source code of a program (e.g., written in C++ or Java), determine a target hardware platform on which the program will execute (e.g., an x86 processor), and then translate the high-level source code into assembly-level code that can be executed on the target hardware platform. This configuration provides the benefit of enabling the developers to write a single high-level source code program and then target that program for execution across a variety of hardware platforms, such as mobile devices, personal computers, or servers.
In general, a compiler includes three components: a front-end, a middle-end, and a back-end. The front-end is configured to ensure that the high-level source code satisfies programming language syntax and semantics, whereupon the front-end generates a first intermediate representation (IR) of the high-level source code. The middle-end is configured to receive and optimize the first IR, which may involve, for example, removing any unreachable code included in the first IR. After optimizing the first IR, the middle-end generates a second IR for the back-end to process. The back-end then receives the second IR and translates the second IR into assembly-level code.
To promote the generation of efficient assembly-level code, reducing both the number of registers referenced in the assembly-level code and the amount of address-computation code included therein is desirable. One approach used to effect these reductions is referred to herein as the “register plus offset” approach, which involves generating assembly-level instructions that reference a base address register and a constant offset. For instance, in the CUDA™ architecture, one may load a value from global memory to a register “f12” via the assembly-level instruction “ld.global.f32 f12, [rd12+64]”, where “rd12” is the base address register and “64” is the constant offset. According to this approach, multiple memory addresses are beneficially able to share the same base address register “rd12”. For example, when “rd12” stores the value “16”, the expression [rd12+64] causes the value stored in memory address [80] to be loaded into the register “f12”. Similarly, when “rd12” stores the value “20”, the expression [rd12+64] causes the value stored in memory address [84] to be loaded into the register “f12”. Accordingly, it is desirable for compilers to be capable of identifying high-level instructions that can be reduced to assembly-level instructions that implement the “register plus offset” approach.
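By way of illustration only, the address arithmetic of the “register plus offset” approach can be modeled in C++ as follows. This is a minimal sketch rather than generated compiler output: the names load_f32 and effective_address are hypothetical, a byte array stands in for global memory, and the parameter base plays the role of the value held in “rd12”.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: address formed by a base register value plus a
// constant offset, as in the operand [rd12+64].
std::uint64_t effective_address(std::uint64_t base, std::uint64_t offset) {
    return base + offset;
}

// Hypothetical model of "ld.global.f32 f12, [rd12+64]".
// `memory` stands in for global memory; the returned float models f12.
float load_f32(const unsigned char* memory, std::uint64_t base,
               std::uint64_t offset) {
    assert(memory != nullptr);
    float value;
    // memcpy avoids alignment and aliasing problems when reading raw bytes.
    std::memcpy(&value, memory + effective_address(base, offset), sizeof value);
    return value;
}
```

With a base of 16 and an offset of 64 the helper reads the four bytes starting at address 80, and with a base of 20 it reads those starting at address 84, so both accesses reuse the same constant offset.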
Common types of high-level instructions that can, in some cases, be reduced according to the above approach include high-level instructions that reference a 64-bit base memory address that is offset by a 32-bit expression. An example of this format is “64-bit-base-address+(uint64_t) (32-bit expression)”, where the 32-bit expression is type-converted to 64 bits (via the “uint64_t” typecast notation) so that the resultant value of the 32-bit expression is 64 bits wide. An example of a high-level instruction that implements the above format is “&p+(uint64_t) (−20*x+30*y+1100)”. In view of the “register plus offset” approach described above, it is desirable to determine if a constant offset can be extracted from the expression “(−20*x+30*y+1100)” in the high-level instruction. Unfortunately, the 64-bit type conversion introduces several complex issues in making such a determination, especially when the expression includes unsigned integer arithmetic. In particular, several programming language standards, such as the standards for C/C++, define unsigned computations to produce wrap-around values when overflows occur. Consequently, when the 32-bit expression can wrap around, moving a constant out of the expression and adding it after the 64-bit type conversion may yield a different address than the original instruction. As a result, conventional compilers are oftentimes incapable of effectively translating eligible high-level instructions into assembly-level instructions that implement the “register plus offset” approach.
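The wrap-around issue can be illustrated with a short C++ sketch. It is not the technique of the present invention; it only contrasts an address computed as written in the source program with one computed after a constant has been naively moved outside the 64-bit type conversion. All function names are hypothetical, and a simplified expression of the form x+c is assumed in place of the longer example above.

```cpp
#include <cassert>
#include <cstdint>

// Address as written in the source: the offset x + c is evaluated in
// 32 bits (wrapping modulo 2^32) and only then widened to 64 bits.
std::uint64_t address_widen_after(std::uint64_t base, std::uint32_t x,
                                  std::uint32_t c) {
    std::uint32_t offset32 = static_cast<std::uint32_t>(x + c);
    return base + static_cast<std::uint64_t>(offset32);
}

// Address after naively extracting the constant c: x is widened first
// and c is added in 64 bits, as a "register plus offset" form would do.
std::uint64_t address_constant_extracted(std::uint64_t base, std::uint32_t x,
                                         std::uint32_t c) {
    return (base + static_cast<std::uint64_t>(x)) + c;
}

// The two forms agree exactly when x + c does not wrap in 32 bits.
bool extraction_is_safe(std::uint32_t x, std::uint32_t c) {
    return x <= UINT32_MAX - c;
}

// Extracted form guarded by the safety condition above.
std::uint64_t address_checked(std::uint64_t base, std::uint32_t x,
                              std::uint32_t c) {
    assert(extraction_is_safe(x, c));
    return address_constant_extracted(base, x, c);
}
```

For small operands the two forms coincide, whereas for a value of x near the top of the 32-bit range the source form wraps around and the extracted form does not, so a compiler must establish that no wrap-around can occur before extracting the constant.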
Accordingly, what is needed in the art is a technique that allows constant offsets to be extracted from high-level instructions and used in generated assembly-level instructions that implement the “register plus offset” approach.