1. Field of the Invention
The present invention relates generally to software compilers and, more particularly, to a compiler that restructures program loops to reduce cache thrashing.
2. Description of the Related Art
In modern computer systems, a significant factor in determining the overall performance of the computer system is the speed with which it accesses memory. Generally, faster memory accesses result in higher performance. Unfortunately, however, high-speed memory is expensive. Thus, it is generally economically infeasible to construct a computer system that uses high-speed memory components as its main memory.
Many modern computer systems employ a memory system that consists of a hierarchy of several different levels. That is, the computer system has a relatively large and inexpensive main memory, which may be composed of relatively slow dynamic RAM, or the like, and at least one relatively small high-speed cache. The computer system attempts to maximize its speed of operation by utilizing the high-speed cache as much as possible, as opposed to the slow main memory. In fact, many computer systems have prefetch and cache management instructions that are highly successful when used with software that can predict the portions of main memory that are likely to be needed. The prefetch and cache management instructions can optimize the movement of data between the main memory and the caches. Thus, as long as the predictions are accurate, each request for memory should result in a hit in the cache, and thus in faster overall operation.
The process of predicting the portions of memory that will be needed is, of course, dynamic and continually changing. That is, the prefetch and cache management instructions may predict that a portion A of memory is needed, prefetch the portion A, and load the portion A into the high-speed cache. However, before the portion A of memory is actually used, or while it is still needed, the prefetch and cache management instructions may predict that a portion B of memory will be needed shortly, and load the portion B into the high-speed cache. Owing to the relatively small size and/or organization of the high-speed cache, storing the portion B in the high-speed cache may overwrite or otherwise remove the portion A from the high-speed cache. Accordingly, the portion A will not be available in the high-speed cache when needed by the computer system. This process of loading the cache with memory and then removing it while it is still needed or before it can be used by the computer system is an example of “cache thrashing.”
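By way of illustration only, the eviction of portion A by portion B can be sketched with a small simulation. The sketch below assumes a direct-mapped cache with hypothetical parameters (4 lines of 16 bytes each); the function name count_misses and the chosen addresses are illustrative assumptions and do not describe any particular hardware.

```python
def count_misses(addresses, num_lines=4, line_size=16):
    """Count misses in a simulated direct-mapped cache.

    num_lines and line_size are hypothetical values chosen for illustration.
    """
    tags = [None] * num_lines          # one stored tag per cache line
    misses = 0
    for addr in addresses:
        block = addr // line_size      # memory block containing addr
        index = block % num_lines      # the only line this block may occupy
        tag = block // num_lines       # distinguishes blocks sharing a line
        if tags[index] != tag:
            misses += 1                # miss: load the block, evicting the old one
            tags[index] = tag
    return misses


cache_bytes = 4 * 16

# Portions A and B lie exactly one cache size apart, so both map to line 0
# and each access evicts the portion that is needed next.
A, B = 0, cache_bytes
thrash_misses = count_misses([A, B] * 8)

# Moving B by one line gives it its own cache line; only the first access
# to each portion misses.
friendly_misses = count_misses([A, B + 16] * 8)

print(thrash_misses, friendly_misses)  # prints "16 2"
```

Under these assumptions, every one of the 16 alternating accesses misses when A and B share a cache line, whereas only the 2 initial accesses miss when they do not.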
Cache thrashing is, of course, undesirable, as it reduces the performance gains generated by prefetch and cache management instructions, and greatly reduces computer system performance. In fact, once cache thrashing begins, prefetch and cache management instructions may actually exacerbate the problem.
Historically, programmers have attempted to eliminate or reduce cache thrashing by restructuring the data used by a program so as to reduce or eliminate conflicts in the cache. That is, programmers have attempted to organize the data so that it is unlikely that the program will need access to two different sets of data that cannot exist simultaneously in the cache. The process of restructuring data has proven difficult to automate, chiefly because the program as a whole must be analyzed to determine whether restructuring the data affects other data accesses. During compilation, however, the entire program may not be available, as the compilation process may be applied at separate times to separate pieces of the program. Also, restructuring the data to eliminate thrashing in one portion of the program may create thrashing in another portion of the program. Further, the sheer complexity of this analysis increases the likelihood that the restructuring will not optimize the program as a whole with respect to reducing cache thrashing.
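The difficulty of data restructuring can be illustrated with a minimal sketch, again assuming a hypothetical direct-mapped cache of 4 lines of 16 bytes. The three arrays a, b, and c, their sizes, the 48 bytes of unrelated data between b and c, and the helper names layout and loop_misses are all assumptions made for illustration. Padding inserted before b removes the conflict in a loop that reads a and b, but it also shifts c so that a second loop reading a and c begins to thrash.

```python
LINE_SIZE = 16   # bytes per cache line (hypothetical)
NUM_LINES = 4    # lines in the direct-mapped cache (hypothetical)
ELEM_SIZE = 8    # bytes per array element (hypothetical)
N = 8            # elements per array, so each array spans the whole cache


def count_misses(addresses):
    """Count misses in a simulated direct-mapped cache."""
    resident = [None] * NUM_LINES      # block currently held by each line
    misses = 0
    for addr in addresses:
        block = addr // LINE_SIZE
        index = block % NUM_LINES
        if resident[index] != block:
            misses += 1
            resident[index] = block
    return misses


def loop_misses(base_x, base_y):
    """Misses for a loop that reads x[i] and then y[i] for each i."""
    trace = []
    for i in range(N):
        trace.append(base_x + i * ELEM_SIZE)
        trace.append(base_y + i * ELEM_SIZE)
    return count_misses(trace)


def layout(pad):
    """Base addresses of a, b, c with `pad` bytes inserted before b."""
    a = 0
    b = a + N * ELEM_SIZE + pad
    c = b + N * ELEM_SIZE + 48         # 48 bytes of unrelated data precede c
    return a, b, c


a, b, c = layout(pad=0)
before = (loop_misses(a, b), loop_misses(a, c))

a, b, c = layout(pad=16)
after = (loop_misses(a, b), loop_misses(a, c))

print(before, after)  # prints "(16, 8) (8, 16)"
```

In this sketch the padding halves the misses of the first loop and doubles those of the second, which is why a restructuring chosen from one portion of a program cannot be assumed safe for the program as a whole.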
The present invention is directed to overcoming or at least reducing the effects of one or more of the problems mentioned above.