1. Field of the Invention
The present invention relates to compilers for computer systems. More particularly, the present invention relates to a method and an apparatus that allows a programmer to specify constraints on memory references that the programmer has followed in writing code so that a compiler is able to more effectively disambiguate memory references in the code.
2. Related Art
Compilers perform many optimizations during the process of translating computer programs from human-readable source code form into machine-readable executable code form. Some of these optimizations improve the performance of a computer program by reorganizing instructions within the computer program so that the instructions execute more efficiently. For example, it is often advantageous to initiate a read operation in advance of where the data returned by the read operation is used in the program so that other instructions can be executed while the read operation is taking place.
Unfortunately, the problem of "aliasing" greatly restricts the freedom of a compiler to reorganize instructions to improve the performance of a computer program. The problem of aliasing arises when two memory references can potentially access the same location in memory. If this is the case, one of the memory references must be completed before the other memory reference takes place in order to ensure that the program executes correctly. For example, an instruction that writes a new value into a memory location cannot be moved so that it occurs before a preceding instruction that reads from the memory location without changing the value that is read from the memory location.
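The reordering hazard described above can be made concrete with a small model. This is an illustrative sketch only: memory is assumed to be a Python list, and the hypothetical "pointers" `p` and `q` are indices into it. Hoisting the write above the read is harmless when `p` and `q` differ, but changes the result when they refer to the same location.

```python
# Hypothetical model: memory is a list and the "pointers" p and q are indices into it.

def load_then_store(mem, p, q):
    """Original program order: read through p, then write through q."""
    value = mem[p]
    mem[q] = 99
    return value


def store_then_load(mem, p, q):
    """Reordered version an optimizer might want: the write is hoisted above the read."""
    mem[q] = 99
    value = mem[p]
    return value


if __name__ == "__main__":
    # p and q refer to different locations: both orders return the same value.
    print(load_then_store([1, 2], 0, 1), store_then_load([1, 2], 0, 1))  # 1 1
    # p and q alias: hoisting the write changes the value that is read.
    print(load_then_store([1, 2], 0, 0), store_then_load([1, 2], 0, 0))  # 1 99
```

An optimizer may perform the second ordering only if it can prove that `p` and `q` never refer to the same location.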
The problem of aliasing is particularly acute for programs that make extensive use of memory references through pointers, because pointers can be easily modified during program execution to point to other memory locations. Hence, an optimizer must typically assume that a pointer can reference any memory location. This assumption greatly limits the performance improvements that can be achieved by a code optimizer.
One solution to this problem is to use a strongly typed computer programming language, such as Pascal, that restricts the way in which pointers can be manipulated. For example, in a strongly typed language, a pointer to a floating point number cannot be modified to point to an integer. Hence, an optimizer is able to assume that pointers to floating point numbers cannot be modified to point to integers, and vice versa. The drawback of using strongly typed languages is that strong type restrictions can greatly reduce the freedom of the programmer.
An alternative solution is to construct a code optimizer that detects all of the aliasing conditions that can arise during program execution. Unfortunately, the task of detecting all of the aliasing conditions that can arise is computationally intractable and/or undecidable for all but the most trivial computer programs.
Another solution is to use programming standards. The C programming language standard imposes type-based restrictions on the way pointers may be used in standard-conforming programs. Unfortunately, these programming standards are flagrantly ignored in programs of enormous economic importance, such as major database applications. Consequently, compilers do not use the restrictions imposed by programming standards to achieve better performance.
What is needed is a method and an apparatus that allows a compiler to selectively use restrictions on the way pointers are used in a program to more effectively detect aliasing problems in a computer program. (Note that the process of determining whether two memory references alias is known as alias "disambiguation.")
One embodiment of the present invention provides a system that allows a programmer to specify a set of constraints that the programmer has adhered to in writing code so that a compiler is able to assume the set of constraints in disambiguating memory references within the code. The system operates by receiving an identifier for a set of constraints on memory references that the programmer has adhered to in writing the code. The system uses the identifier to select a disambiguation technique from a set of disambiguation techniques. Note that each disambiguation technique is associated with a different set of constraints on memory references. The system uses the selected disambiguation technique to identify memory references within the code that can alias with each other.
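The selection step can be sketched as a lookup from the identifier to a function that answers "may these two references alias?". This is a minimal sketch under stated assumptions: the identifiers `"any"` and `"basic"`, the `MemRef` type, and the two example techniques (presume everything aliases; separate references to differing basic types) are hypothetical stand-ins, not names or rules defined by the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class MemRef:
    name: str    # source spelling of the reference, e.g. "*p" (hypothetical)
    c_type: str  # type of the object accessed, e.g. "int"


# A disambiguation technique answers: may these two references alias?
Technique = Callable[[MemRef, MemRef], bool]


def assume_everything_aliases(a: MemRef, b: MemRef) -> bool:
    # Weakest constraint set: the programmer promised nothing.
    return True


# Illustrative subset of basic types (character types left out of this sketch).
BASIC_TYPES = {"short", "int", "long", "float", "double"}


def distinct_basic_types_do_not_alias(a: MemRef, b: MemRef) -> bool:
    if a.c_type in BASIC_TYPES and b.c_type in BASIC_TYPES:
        return a.c_type == b.c_type
    return True  # outside this rule: stay conservative


# Hypothetical identifiers for the constraint sets.
TECHNIQUES: Dict[str, Technique] = {
    "any": assume_everything_aliases,
    "basic": distinct_basic_types_do_not_alias,
}


def aliasing_pairs(identifier: str, refs: List[MemRef]) -> List[Tuple[str, str]]:
    """Select a technique by identifier and report every pair that may alias."""
    may_alias = TECHNIQUES[identifier]
    return [
        (a.name, b.name)
        for i, a in enumerate(refs)
        for b in refs[i + 1:]
        if may_alias(a, b)
    ]
```

With references `*ip` (int), `*fp` (float) and `*jp` (int), the `"any"` identifier reports all three pairs, while `"basic"` reports only `("*ip", "*jp")`, leaving the optimizer free to reorder accesses through `fp` relative to the others.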
In one embodiment of the present invention, the system initially receives the code in source code form, and processes the code into an intermediate form.
In one embodiment of the present invention, the system optimizes the code based upon the identified memory references to produce executable code.
In one embodiment of the present invention, the system allows the programmer to identify the set of constraints adhered to for each variable in the code.
In one embodiment of the present invention, the disambiguation technique operates by presuming any two memory references alias.
In one embodiment of the present invention, the disambiguation technique operates by presuming that a first memory reference and a second memory reference alias if either reference is to a character type and an originating reference for the character type is a de-reference.
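The character-type rule reflects the fact that C permits any object to be accessed through a character-type lvalue, which byte-oriented code relies on. A minimal sketch, assuming a hypothetical `MemRef` that records the accessed type and whether the reference originates from a de-reference such as `*p` or `p[i]`; the function only reports whether this rule forces an alias, leaving all other cases to whatever technique is otherwise in force.

```python
from dataclasses import dataclass

CHARACTER_TYPES = {"char", "signed char", "unsigned char"}


@dataclass(frozen=True)
class MemRef:
    c_type: str      # type of the object accessed
    via_deref: bool  # True if the originating reference is a de-reference (*p, p[i])


def char_rule_forces_alias(a: MemRef, b: MemRef) -> bool:
    """True if the character-type rule requires a and b to be treated as aliases."""

    def is_char_deref(r: MemRef) -> bool:
        return r.c_type in CHARACTER_TYPES and r.via_deref

    return is_char_deref(a) or is_char_deref(b)
```

A byte read through a `char *` is presumed to alias an `int` accessed through a pointer, whereas a named local `char` variable, which is not reached through a de-reference, does not trigger the rule.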
In one embodiment of the present invention, the disambiguation technique operates by presuming that any two memory references alias unless both are references to a basic type, an enumerated type, or a pointer type.
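A sketch of this rule follows. Read literally, the rule would also separate two references to the same type, which cannot be intended; the sketch therefore assumes that references of identical type still alias and that only references to differing basic, enumerated, or pointer types are disambiguated. Types are spelled as plain strings, a hypothetical simplification of a compiler's type representation.

```python
# Illustrative set of C basic types; character types are deliberately left out of this sketch.
BASIC_TYPES = {"short", "int", "long", "long long", "float", "double", "long double"}


def is_simple(t: str) -> bool:
    """Basic type, enumerated type ("enum ...") or pointer type ("... *")."""
    return t in BASIC_TYPES or t.startswith("enum ") or t.endswith("*")


def may_alias(t1: str, t2: str) -> bool:
    if is_simple(t1) and is_simple(t2):
        # Assumption: identical types still alias; only differing simple types are separated.
        return t1 == t2
    # Structures, unions, arrays, etc. are conservatively presumed to alias.
    return True
```

Under this rule an `int` store cannot disturb a `float` load, but any access involving a structure type is still treated conservatively.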
In one embodiment of the present invention, the disambiguation technique operates by presuming that a first memory reference and a second memory reference alias if there is an overlapping match between common initial portions of a first type tree for the first memory reference and a second type tree for the second memory reference.
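The common-initial-portion test can be illustrated with a deliberately simplified sketch. Assumptions not taken from the text: a type tree is either a basic-type name or a tuple of child trees (a structure); each tree is flattened to its ordered list of basic-type leaves; and a reference is a hypothetical `(tree, leaf_index)` pair. Flattening discards the nesting that a real tree-based matcher would use, and unions are not modeled.

```python
from typing import List, Tuple, Union

# A type tree: a basic-type name, or a tuple of child trees (a structure).
TypeTree = Union[str, Tuple["TypeTree", ...]]


def leaves(tree: TypeTree) -> List[str]:
    """Basic-type leaves of a type tree, in declaration order."""
    if isinstance(tree, str):
        return [tree]
    out: List[str] = []
    for child in tree:
        out.extend(leaves(child))
    return out


def common_initial_portion(t1: TypeTree, t2: TypeTree) -> int:
    """Number of leading leaves the two type trees have in common."""
    n = 0
    for a, b in zip(leaves(t1), leaves(t2)):
        if a != b:
            break
        n += 1
    return n


def may_alias(ref1: Tuple[TypeTree, int], ref2: Tuple[TypeTree, int]) -> bool:
    """Each ref is (type tree of the enclosing structure, index of the accessed leaf)."""
    (t1, i1), (t2, i2) = ref1, ref2
    # Presumed to alias only if both accesses fall at the same position in the shared prefix.
    return i1 == i2 and i1 < common_initial_portion(t1, t2)
```

For `("int", "int")` and `("int", "int", "int")` the common initial portion has length 2, so accesses to the second member of each are presumed to alias; against `("int", ("float", "int"))` the portion stops after the first leaf.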
In one embodiment of the present invention, the disambiguation technique operates by presuming that a first memory reference and a second memory reference alias if there is an overlapping match between a first type tree for the first memory reference and a second type tree for the second memory reference. In a variation on this embodiment, a root node of the first type tree corresponds to a root node of the second type tree within the overlapping match.
In one embodiment of the present invention, determining if there is an overlapping match involves ensuring that for marked nodes in a type tree that are descendants of a union field, corresponding nodes for associated union fields are marked. In a variation on this embodiment, the corresponding nodes for associated union fields are determined by considering a common initial portion of the union field with respect to other union fields.
In one embodiment of the present invention, the system indicates an error condition if pointers of different structure types are cast to each other.
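A sketch of such a check, assuming (hypothetically) that types are available as source spellings such as `"struct A *"`; a real compiler would inspect its internal type representation instead. The function returns an error message for a cast between pointers to differently tagged structures and `None` otherwise; the message text is invented for illustration.

```python
import re
from typing import Optional

# Matches spellings like "struct A *" and captures the structure tag.
STRUCT_PTR = re.compile(r"^struct\s+(\w+)\s*\*$")


def check_cast(from_type: str, to_type: str) -> Optional[str]:
    """Return an error message if the cast converts between pointers to different structure types."""
    src = STRUCT_PTR.match(from_type)
    dst = STRUCT_PTR.match(to_type)
    if src and dst and src.group(1) != dst.group(1):
        return f"error: cast from '{from_type}' to '{to_type}' violates the declared constraints"
    return None
```

Reporting the cast, rather than silently assuming the constraint, lets the programmer discover code that does not actually follow the constraint set that was declared.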
In one embodiment of the present invention, the disambiguation technique operates by presuming that a first memory reference and a second memory reference alias if: the first memory reference is directed to a structure element of the same basic type as the second memory reference; and the first memory reference and the second memory reference have the same structure offset.
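A minimal sketch of this rule, assuming a hypothetical `MemRef` that records the basic type accessed and its byte offset within the enclosing structure (`None` for a reference that is not a structure element). The sketch treats the two listed conditions as the complete test for structure elements and, as a labeled assumption, stays conservative for any reference outside the rule's scope.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MemRef:
    basic_type: str               # basic type of the accessed object, e.g. "int"
    struct_offset: Optional[int]  # byte offset within a structure; None if not a structure element


def may_alias(a: MemRef, b: MemRef) -> bool:
    if a.struct_offset is None or b.struct_offset is None:
        return True  # assumption: outside this rule's scope, be conservative
    return a.basic_type == b.basic_type and a.struct_offset == b.struct_offset
```

Given `struct A { int x; int y; }` and `struct B { int p; float q; }` with 4-byte `int`, the accesses `a->x` and `b->p` (both `int` at offset 0) are presumed to alias, while `a->y` and `b->q` (offset 4, but `int` versus `float`) are not.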
In one embodiment of the present invention, the disambiguation technique operates by presuming that a first memory reference and a second memory reference having the same structure offset alias only if explicit program instructions specify that the first memory reference and the second memory reference alias.
In one embodiment of the present invention, the disambiguation technique operates by presuming that a first memory reference into a first structure and a second memory reference into a second structure alias if: the first structure and the second structure include the same basic types in the same order; and the first memory reference and the second memory reference have the same structure offset.
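This layout-based rule can be sketched with Python's `ctypes` module, which computes C-compatible member offsets. The structures `Point`, `Pair` and `Mixed` and the helper names are hypothetical examples; comparing the ordered member types stands in for "the same basic types in the same order", and nested structures are not handled.

```python
import ctypes


class Point(ctypes.Structure):
    _fields_ = [("x", ctypes.c_int), ("y", ctypes.c_double)]


class Pair(ctypes.Structure):
    _fields_ = [("first", ctypes.c_int), ("second", ctypes.c_double)]


class Mixed(ctypes.Structure):
    _fields_ = [("a", ctypes.c_int), ("b", ctypes.c_float)]


def member_types(s):
    """Ordered list of member types of a (flat) ctypes structure."""
    return [ftype for _name, ftype in s._fields_]


def may_alias(s1, f1, s2, f2) -> bool:
    same_layout = member_types(s1) == member_types(s2)
    same_offset = getattr(s1, f1).offset == getattr(s2, f2).offset
    return same_layout and same_offset
```

`Point.y` and `Pair.second` are presumed to alias because both structures hold an `int` followed by a `double` and the members sit at the same offset; `Point.x` and `Mixed.a` are separated because the two layouts differ even though both members are `int` at offset 0.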
In one embodiment of the present invention, the disambiguation technique operates by presuming that a first memory reference and a second memory reference alias if the first memory reference and the second memory reference have the same structure offset.
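The offset-only rule ignores member types entirely. A sketch using `ctypes` to obtain member offsets, with hypothetical structures `A` and `B`:

```python
import ctypes


class A(ctypes.Structure):
    _fields_ = [("x", ctypes.c_int), ("y", ctypes.c_int)]


class B(ctypes.Structure):
    _fields_ = [("p", ctypes.c_int), ("q", ctypes.c_float)]


def offset_of(struct_type, field: str) -> int:
    """Byte offset of a member, as laid out by ctypes."""
    return getattr(struct_type, field).offset


def may_alias(s1, f1, s2, f2) -> bool:
    # Offset-only rule: same offset means presumed alias, whatever the member types.
    return offset_of(s1, f1) == offset_of(s2, f2)
```

Here `A.y` (an `int`) and `B.q` (a `float`) are presumed to alias because both follow a single `int` member, whereas `A.x` and `B.q` are separated by their differing offsets.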
In one embodiment of the present invention, the disambiguation technique operates by using a tree-based matching scheme for alias analysis.
In one embodiment of the present invention, the disambiguation technique operates by using a de-referenced type and an accessed type in a type-based alias analysis.
In one embodiment of the present invention, if a first memory reference is associated with a first disambiguation technique and a second memory reference is associated with a second disambiguation technique, the system uses both the first disambiguation technique and the second disambiguation technique to determine whether the first memory reference and the second memory reference alias.
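One way to use both techniques is to disambiguate a pair only when both agree that the references cannot alias. The sketch below assumes that conservative combination (the text does not specify how the two results are combined) and attaches a hypothetical technique to each reference, mirroring per-variable constraint sets.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class MemRef:
    name: str
    c_type: str
    # Technique in force where this reference was written (hypothetical per-variable setting).
    technique: Callable[["MemRef", "MemRef"], bool]


def always_alias(a: MemRef, b: MemRef) -> bool:
    return True


def type_based(a: MemRef, b: MemRef) -> bool:
    return a.c_type == b.c_type


def may_alias(a: MemRef, b: MemRef) -> bool:
    # Assumed conservative combination: the pair is disambiguated only if
    # both techniques agree that the references cannot alias.
    return a.technique(a, b) or b.technique(a, b)
```

An `int` reference and a `float` reference both governed by `type_based` are separated, but if the `float` reference comes from code declared under `always_alias`, the pair must still be presumed to alias.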
Hence, the present invention provides the user with a mechanism to express type-based information about the way pointers are used in a program. This information allows a compiler to do a significantly better job of alias disambiguation for pointer-based memory references in the program.
One embodiment of the present invention provides a plurality of alias levels, wherein each level specifies a certain set of properties about the way pointers are used in a program. These levels vary in "strength." "Weaker" levels give the programmer more freedom, but result in lower run-time performance under optimization. "Stronger" levels give the programmer less freedom, but result in higher run-time performance under optimization.