1. The Field of the Invention
The present invention relates to the fields of computing systems and the design of compilers for computing systems. More specifically, the present invention provides a method and apparatus for determining the amount of padding to be provided between arrays of data stored in the memory of a computing system.
2. The Relevant Art
The combination of increased microprocessor speeds and memory capacity has been a tremendous benefit for scientific and engineering computing applications. Using today's fastest microprocessors, large amounts of data can be processed in relatively short periods of time. This allows for greater accuracy in areas such as image processing and in computer simulation of natural phenomena such as fluid dynamics and weather prediction.
Although computing speeds have increased dramatically at a steadily decreasing cost, computer memory capable of supplying data at a rate comparable to that at which the microprocessor can process it is still relatively expensive. Thus, many high performance computing systems still rely on cheaper, slower memory to store the large data sets being processed. In order to increase system performance, computer designers have developed a compromise solution in which a small amount of relatively expensive but very fast memory, such as static random access memory (SRAM), is used to store data that is being accessed frequently by the microprocessor. The bulk of the data resides in the cheaper, slower random access memory (RAM). Data is swapped between the faster memory and the slower memory as needed by the microprocessor. Such a design is referred to generally as "caching".
Typically, a cache is divided into lines, each consisting of a fixed number of sequential memory locations. Each of the lines in the cache has an associated tag register that holds the main memory address of the contents of the cache line. When the microprocessor performs a data access, it maps the main memory address onto a fixed number of cache lines. A check is then performed to determine whether the address matches any of the tags for those lines. If a match is found, then that line is accessed by the microprocessor and the data is processed. Otherwise, the data access operation is said to "miss" (Ralston 1993).
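The mapping and tag check described above can be sketched as follows. This is an illustrative model of a direct-mapped cache, not a description of any particular system; the line size and line count are assumed values chosen for illustration.

```python
# Sketch of a direct-mapped cache lookup: an address is mapped onto a
# fixed cache line, and the line's tag register is checked for a match.
LINE_SIZE = 32         # bytes per cache line (assumed for illustration)
NUM_LINES = 256        # number of lines in the cache (assumed)

tags = [None] * NUM_LINES  # one tag register per line

def access(address):
    """Return True on a cache hit, False on a miss."""
    line_index = (address // LINE_SIZE) % NUM_LINES   # which line the address maps onto
    tag = address // (LINE_SIZE * NUM_LINES)          # remaining high-order address bits
    if tags[line_index] == tag:
        return True                 # tag matches: hit
    tags[line_index] = tag          # miss: the line is replaced
    return False
```

Note that two addresses that differ by exactly `LINE_SIZE * NUM_LINES` bytes map onto the same line index, which is the source of the conflicts discussed below.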
Typically, when a miss occurs one of the lines in the set is replaced with data from another section of the main memory. Several strategies are available for determining which of the cache lines is to be replaced when a miss occurs. Two commonly used strategies include the least recently used (LRU) strategy, in which the contents of the least recently used line are replaced, and the random strategy, in which one cache line is replaced at random. Although these strategies are effective in most situations, problems can arise when the access patterns of different arrays in the software being executed cause those arrays to be mapped onto the same cache lines. In this instance, the performance of the system can become severely degraded as the same cache lines are continually remapped while the processor attempts to access the data from the conflicting arrays. Such an occurrence is referred to as a "cache conflict" and can severely degrade processor performance as repeated data swaps are made during the data processing operations. Because the conflicts arise from conflicting patterns of data access from arrays, cache conflicts tend to occur most frequently in scientific and engineering computing applications, as these applications often rely on data stored in large arrays. However, any software which processes data in arrays is prone to cache conflicts.
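The degradation described above can be made concrete with a small model. In this sketch (cache geometry values are assumed, not taken from any cited system), two arrays whose base addresses differ by exactly the cache size map onto the same lines, so alternating accesses to them miss on every reference:

```python
# Model of conflict thrashing in a direct-mapped cache: alternating
# accesses to two arrays that map onto the same line never hit.
LINE_SIZE = 32
NUM_LINES = 256
CACHE_BYTES = LINE_SIZE * NUM_LINES   # 8192 bytes (assumed geometry)

tag_of_line = {}

def hit(address):
    """Return True on a hit; on a miss, replace the line and return False."""
    line = (address // LINE_SIZE) % NUM_LINES
    tag = address // CACHE_BYTES
    if tag_of_line.get(line) == tag:
        return True
    tag_of_line[line] = tag           # the conflicting line is remapped
    return False

# Array A at address 0; array B exactly CACHE_BYTES later, so both
# first elements map onto line 0. Alternate accesses between them.
misses = 0
for k in range(100):
    base = 0 if k % 2 == 0 else CACHE_BYTES
    if not hit(base):
        misses += 1
# Every access evicts the other array's line, so all 100 accesses miss.
```

With any nonzero padding between the arrays that is not a multiple of the cache size, the two base addresses would map onto different lines and the repeated evictions would disappear.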
In some cases the use of an associative cache can alleviate cache conflicts, but associative caches are expensive and difficult to implement, and they still do not provide a general solution to the problem of cache conflicts. Generally, however, when cache conflicts occur a trial-and-error approach is used to determine how much spacing between the conflicting arrays is required to eliminate the conflicting data access patterns. This spacing is also referred to as "padding". A number of attempts have been made to solve the cache conflict problem by determining the necessary padding analytically. One such attempt (Bacon 1993) describes a method for determining array padding to address several causes of cache conflicts. However, this strategy relies heavily on the use of memoization and employs a relatively brute-force strategy for detecting conflicts and determining the padding, in which the array references for the arrays are compared pairwise to determine whether the arrays map to the same cache lines. If the difference between the first dimensions of any two arrays is an integer multiple of the cache size, a conflict will occur between those arrays and padding between the arrays must be provided. Unfortunately, this scalar approach to detecting and resolving cache conflicts is computationally inefficient.
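The pairwise, brute-force check described above can be sketched as follows. This is a simplified illustration, not the cited method: it compares array base addresses (rather than full array references) and applies a uniform one-element pad, and the cache size is an assumed value. The quadratic number of pairwise comparisons is visible directly in the nested loops, which is the inefficiency the passage refers to.

```python
# Hedged sketch of pairwise conflict detection and padding between arrays.
CACHE_SIZE = 8192   # cache size in bytes (assumed for illustration)

def detect_conflicts(base_addresses):
    """Compare arrays pairwise; two arrays conflict when the distance
    between their base addresses is a multiple of the cache size."""
    conflicts = []
    n = len(base_addresses)
    for i in range(n):                # O(n^2) pairwise comparisons:
        for j in range(i + 1, n):     # the source of the inefficiency
            if (base_addresses[j] - base_addresses[i]) % CACHE_SIZE == 0:
                conflicts.append((i, j))
    return conflicts

def pad(base_addresses, padding=1):
    """Shift each successive array by a cumulative pad so that formerly
    conflicting pairs no longer map onto the same cache lines."""
    padded, offset = [], 0
    for addr in base_addresses:
        padded.append(addr + offset)
        offset += padding             # insert `padding` bytes between arrays
    return padded
```

For example, arrays at base addresses 0 and 8192 conflict (their distance is exactly the assumed cache size); after padding, no pair's distance is a multiple of the cache size.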
The use of high performance computing systems to process large quantities of data, especially data stored in large arrays, as is commonly done in engineering and scientific computing applications, requires the highest levels of system performance. Currently, achieving such high levels is extremely difficult because of the existence of cache conflicts. Although some methods are available for resolving these conflicts, they remain inefficient to employ. Thus, it would be advantageous to provide a method and apparatus by which cache conflicts can be predicted and appropriate array paddings determined and implemented in an efficient manner.