1. Field of the Invention
This invention pertains generally to software compilation and optimization. More particularly this invention is directed to a system land method for using a unique graphical representation of programs which enables new optimizations in a link-time or xe2x80x9cpost compilationxe2x80x9d optimizer.
2. The Prior Art
Current computer systems include, among other things, a memory system and a processing unit (or processor or central processing unit (CPU)). A memory system serves as a repository of information, while the CPU accesses information from the memory system, operates on it, and stores it back.
Complexity exists within the CPU as well as at the CPU/memory system interface. Looking at the CPU/memory interface, it is well known that CPU clock speeds are increasing at a faster rate than memory speeds. The creates a time gap, typically measured in clock cycles of the CPU, between, the request for information in memory to when the information is available inside the CPU. If a CPU is executing instructions in a linear manner, when a currently executing instruction needs to read a memory location from the memory system, the request is xe2x80x9cvery urgentxe2x80x9d. The processor must wait, or stalls, while the memory system provides the data requested to the CPU. The number of CPU clock cycles between the clock cycle when the memory request was made to the cycle where the data is available to the instruction that needed it in the CPU is called the latency of the memory.
Caches are used to help alleviate the latency problem when reading from main memory. A cache is specially configured, high-speed, expensive memory in addition to the conventional memory (or main memory). FIG. 1 depicts a conventional hierarchical memory system, where a CPU is operatively coupled to a cache, and the cache is operatively coupled to the main memory. By placing the cache (small, relatively fast, expensive memory) between main memory (large, relatively slow memory) and the CPU, the memory system as a whole system is able to satisfy a substantial number of requests from the CPU at the speed of the cache, thereby reducing the overall latency of the system.
When the data requested by the CPU is in the cache (known as a xe2x80x9chitxe2x80x9d), the request is satisfied at the speed of the cache. However, when the data requested by the CPU is not in the cache (known as a xe2x80x9cmissxe2x80x9d), the CPU must wait until the data is provided from the slower main memory to the cache and then to the CPU, resulting in greater latency. To help address the problem of latency and to increase the hit to miss ratio associated with cache memory, computer systems have included instructions for prefetching data from memory to cache. For example, instructions set architectures (ISA""s), such as SPARC(trademark) V9, support software data prefetch operations. The instruction""s use, however, is left entirely to the program executing in the CPU. It may not be used all, or it may be used with little or no intelligence, adding little in the way added performance. Because the level of knowledge needed about the CPU and its memory is extremely detailed in order to effectively use prefetch instructions, their use is generally left to compilers.
In addition to the need for optimizations to handle the CPU/memory interface, optimizations are always needed within the CPU. Any optimization that maximizes the use of local registers (internal CPU registers) and internal execution units will not only speed up execution because all data and instructions are local or internal, such optimizations will minimize processor stalls by minimizing the number of times data or instructions need to be read from main memory into cache, and then into the CPU.
Additionally, there is the problem of effectively optimizing code bases (program modules) over an entire program. For anything other than small applications, the code is spread through many files. Traditional optimizers work on one of these files at a time, leaving any optimizations that cross file boundaries untouched.
Accordingly, there is an urgent need for a method and apparatus which can optimize code to minimize cache misses and maximize the internal efficiency of code using the CPU. The present invention satisfies this need and other deficiencies found in the background art.
The present invention includes a new method and apparatus for optimizing compiled programs. In the prior art, most optimizations are carried out during compilation, before linking. Recently some optimizers have been designed to work on the code after compilation. These types of optimizers are called link-time optimizers or post compilation optimizers. An advantage of such optimizers is that they can look at the entire compiled code base, rather than individual files before linking. The present invention builds on the advantage of post compilation optimization with the use of a new graphical representation of a program""s code.
The new graphical representation is called an operands graph, which is a unique representation of code that embodies the better known properties of flow graphs and value-expression graphs. An operands graph uniquely represents instructions using a three-operand form combined with value and leaf pairs for each operand. The leaf includes the ability to represent an immediate, register, or memory location value. The values may be definite or value ranges. The operands graph also has the property of allowing an operand to be a Ø function, where the definition of a Ø function is the same as that used in SSA graphs.
The combination of SSA-like properties and value range properties allows the operands graph to enable optimizations on previously unreachable code. The best example is mutliway branch instructions. Prior art optimizers couldn""t access the various branches and simply skipped the code associated with them, with the result being potentially large areas of non-optimized in use. The present invention allows for static simulation of multiway branch instructions, providing visibility into sections of code for optimization that was previously not possible.