1. Field of the Invention. This invention relates to methods for optimizing operation of a computer system, and more particularly, to methods for optimizing compiling of programs to minimize times when data is not available in a cache memory.
2. Prior Art. A cache memory is a fast memory used to hold data and instructions fetched from slower main memory storage units. Computer architectures use cache memory storage units for intermediate storage of data and instructions. Portions of the contents of a main memory unit can be transferred, or mapped, from the main memory unit to a cache memory unit by several mapping techniques. For direct mapping, designated portions of the contents of the main memory unit are transferred directly to corresponding cache memory locations. For associative mapping, designated portions of the contents of the main memory unit are transferred to any location in the cache memory. For set-associative mapping, each modulo (n) group of (m) blocks in the main memory is mapped into a corresponding row or block of the cache memory.
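The three mapping techniques described above can be illustrated with a minimal sketch. The function names, the 8-line cache, and the 2-way organization are assumptions chosen for illustration only; they do not come from the specification, and the modulo rule shown is the conventional textbook placement rule.

```python
# Illustrative sketch only: names and sizes below are hypothetical.

def direct_mapped_line(block: int, num_lines: int) -> int:
    """Direct mapping: each main memory block has exactly one possible cache line."""
    return block % num_lines


def set_associative_set(block: int, num_sets: int) -> int:
    """Set-associative mapping: a block maps to one set and may occupy any way in it."""
    return block % num_sets


def fully_associative_candidates(num_lines: int) -> range:
    """Associative mapping: a block may be placed in any cache line."""
    return range(num_lines)


NUM_LINES = 8                    # assumed cache with 8 lines
WAYS = 2                         # assumed 2-way set-associative organization
NUM_SETS = NUM_LINES // WAYS     # 4 sets of 2 lines each

for block in (3, 11, 19):
    print(block,
          direct_mapped_line(block, NUM_LINES),
          set_associative_set(block, NUM_SETS))
```

In this sketch, blocks 3, 11, and 19 all compete for the same line under direct mapping, whereas a set-associative cache can hold as many of them at once as it has ways in the set.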
It is a well-known characteristic of computer programs that a segment of a computer program, which spans several instruction cycles, refers to and requires access to only certain memory locations. These memory locations tend to be clustered in particular, relatively small areas of memory. Therefore, relatively small, very fast memories may be advantageously used to handle memory references and accesses. This suggests the use of a cache memory unit to permit information to be stored in a relatively small memory unit, which has faster access time than the main memory units, so that programs can be executed faster. A cache memory can contain instructions and data. A computer system with a cache memory unit examines the current address and the next address. If the required information is contained in the cache memory unit, execution is fast. If the required information is not within the segment of main memory currently held by the cache memory unit, the control logic for the system automatically finds and loads the information into the cache memory unit for execution of the program instruction.
A performance parameter which is of particular concern is the ratio of cache misses to cache hits, where a cache miss is defined as a reference to memory which cannot be satisfied by the contents of the cache memory. A cache hit is defined as a reference to memory which can be satisfied by the contents of the cache memory. Since the size of the cache memory unit is only a fraction of the size of a main memory unit, sometimes it is necessary to fill the cache memory unit with new information from the main memory unit. In that case, data which has been previously stored in the cache memory must be replaced, or overwritten, to accommodate new information. Therefore, a cache miss is the result of a first-time requirement for memory information or a subsequent requirement for information which has been overwritten in the cache memory.
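The two causes of a cache miss named above (a first-time reference, and a reference to information that has since been overwritten) can be seen in a small simulation. This is a minimal sketch assuming a direct-mapped cache that holds one address per line; the function name and the address trace are hypothetical.

```python
def simulate_direct_mapped(addresses, num_lines):
    """Return (hits, misses) for a direct-mapped cache holding one address per line."""
    lines = [None] * num_lines      # each entry records the address currently cached
    hits = misses = 0
    for addr in addresses:
        index = addr % num_lines
        if lines[index] == addr:
            hits += 1
        else:
            misses += 1             # first-time reference, or entry was overwritten
            lines[index] = addr     # replace whatever was stored in this line
    return hits, misses


trace = [0, 1, 0, 4, 0, 1]          # assumed reference trace
hits, misses = simulate_direct_mapped(trace, num_lines=4)
print("hits:", hits, "misses:", misses, "miss-to-hit ratio:", misses / hits)
```

In this trace, addresses 0 and 4 share a cache line, so the reference to 4 overwrites 0 and the following reference to 0 misses even though 0 had been loaded earlier.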
One particular technique to increase cache hits is to load forward memory information, that is, to load the information from a number of consecutive memory locations of a main memory unit into a cache memory unit when a first cache miss is encountered. The number of consecutive main memory locations whose contents are loaded into the cache memory depends on the cache block size, which is set to be a power of 2. The limit on block size is set by the traffic ratio, which is defined as the ratio of the bus traffic, in bits per second, of a system with a cache memory unit to the bus traffic of a system without a cache memory unit. The traffic ratio measures the effectiveness of a cache memory unit in reducing the demand placed on main memory bandwidth. System bandwidth is defined as the word length of a memory multiplied by the number of words that can be referenced by the system in one second. The load-forward technique is particularly useful for an instruction cache memory, where a high percentage of memory access operations are sequential.
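The trade-off between block size and bus traffic can be sketched as follows. This is an illustration under stated assumptions, not a measurement: the cache is direct-mapped, traffic is counted in words moved over a fixed trace rather than in bits per second, a system without a cache is assumed to move one word per reference, and all function names and sizes are hypothetical.

```python
def simulate_block_cache(addresses, num_blocks, block_size):
    """Direct-mapped cache of whole blocks; returns (misses, words_transferred)."""
    tags = [None] * num_blocks
    misses = 0
    for addr in addresses:
        block = addr // block_size      # block_size is assumed to be a power of 2
        index = block % num_blocks
        if tags[index] != block:
            misses += 1
            tags[index] = block         # load all block_size consecutive words
    return misses, misses * block_size


def traffic_ratio(addresses, num_blocks, block_size):
    """Words moved with the cache divided by words moved without it."""
    _, words = simulate_block_cache(addresses, num_blocks, block_size)
    return words / len(addresses)       # no cache: one word per reference (assumption)


def system_bandwidth(word_length_bits, words_per_second):
    """Word length multiplied by the number of words referenced per second."""
    return word_length_bits * words_per_second


sequential = list(range(64)) * 2        # assumed sequential fetch pattern, run twice
for block_size in (1, 4, 16):
    misses, _ = simulate_block_cache(sequential, num_blocks=4, block_size=block_size)
    print(block_size, misses, traffic_ratio(sequential, 4, block_size))
```

On this trace, larger blocks reduce the number of misses, but each miss moves more words, so the traffic ratio improves only once the block size is large enough for the cache to retain the whole sequence.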
A compiler is a computer program which processes a program written in a source language by translating the program into an equivalent program in another, target language, which is often a machine instruction set. Compilation includes four basic steps: an analysis step, an intermediate code generation step, a code optimization step, and a final code generation step.
The code optimization step for a compiler is a process whereby a translated program is made to perform as efficiently as possible. Optimization of computer code involving loop functions is extremely important because the instructions within a loop are executed repeatedly: an instruction sequence in a loop routine that iterates N times is executed N times. A main technique for optimizing loop performance is loop-invariant code motion, which takes any expression that yields the same result regardless of the number of times a loop routine is executed and places that expression outside the loop routine, in a position to be executed once before the loop itself is executed.
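Loop-invariant code motion can be illustrated by a before-and-after pair. This is a hand-written sketch of the transformation a compiler might perform; the function names and the particular invariant expression are hypothetical.

```python
import math


def scale_naive(values, a, b):
    """Before: the invariant expression is recomputed on every iteration."""
    out = []
    for v in values:
        factor = math.sqrt(a * a + b * b)   # same result on every pass through the loop
        out.append(v * factor)
    return out


def scale_hoisted(values, a, b):
    """After: the invariant expression is moved ahead of the loop."""
    factor = math.sqrt(a * a + b * b)       # computed once, before the loop executes
    out = []
    for v in values:
        out.append(v * factor)
    return out


print(scale_naive([1, 2, 3], 3, 4) == scale_hoisted([1, 2, 3], 3, 4))
```

Both functions return the same values; the hoisted form evaluates the square root once instead of once per loop iteration.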