This invention relates generally to storing pinned programs and data structures in the cache of a data processing system. More particularly, it relates to avoiding cache collisions between high frequency pinned routines and data by loading these routines and data at noncompetitive memory locations.
It is becoming increasingly true that the performance, i.e. speed of execution, of programs running on modern digital computers is dominated by several effects which are largely unrelated to the actual instructions or sequences of instructions executed. Rather, performance has become dependent on the physical positions at which the routines and data are stored within the memory hierarchies of the machines at the time when the routines and data are required for processing by the Central Processing Unit (CPU).
In stored program digital computers, memory typically forms a hierarchy with multiple levels, e.g., the hard disk, the main memory, and the registers of the CPU, along with one or more levels of intermediate cache between the main memory and the CPU. The discussion below assumes at least one level of cache exists between main memory and the CPU. There is a direct relationship between the speed of such memory and its cost; faster memory is more expensive. Of course, programs executing instructions on machines with fast memory take less time than those executing on machines with slow memory and, as a result, users of computers are desirous of running their programs on machines with the fastest memories that they can afford to use for their particular application. There is strong motivation on the part of computer designers to arrange their machines so as to achieve the best possible trade-off of cost for speed. It is precisely this cost versus speed compromise which has led computer designers to a hierarchical structure for the memory component of the stored program digital computer.
It is typically a characteristic of the main memory component that it will be large, slow and inexpensive in comparison to the cache memory component. An order of magnitude difference in cost and speed between the main memory and the cache is not uncommon; in size, there are ordinarily several orders of magnitude difference between the main memory and cache, with the cache being the smaller of the two. Again, as noted above, this size difference is driven by the cost of the higher speed cache memory as compared to the cost of the main memory.
The Central Processing Unit (CPU) will typically operate at speeds which are significantly faster than the main memory. The speed of the memory determines the rate at which instructions and data can be fetched from the memory and delivered to the CPU for processing, so a CPU served only by main memory would spend much of its time waiting. The faster cache memory narrows this gap, but, given its cost relative to the main memory, the cache memory will be much smaller.
As main memory is a limited resource, only a fraction of the total set of instructions and data can be loaded into memory at any given time. Similarly, only a fraction of main memory can be stored in any one of the caches. In addition, the caches may have restrictions on where in the cache the contents of particular main memory locations can be stored. Given that main memory is much larger than the cache, and given that an algorithm exists that maps each block from the main memory into one or more specific locations within the cache, each location in the cache either holds one of the allowable blocks from main memory, as specified by the mapping algorithm, or is marked as invalid.
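The mapping from main memory blocks to cache locations can be illustrated with a minimal sketch. It assumes the common modulo-indexed scheme used by direct-mapped and set-associative caches; the text above does not specify a particular mapping algorithm, and the geometry constants and the function names `cache_set` and `cache_tag` are hypothetical, chosen only for illustration.

```python
# Hypothetical cache geometry (an assumption, not taken from the text above).
LINE_SIZE = 64    # bytes per cache block
NUM_SETS = 256    # number of sets; with one block per set the cache is direct-mapped

# Total capacity of a direct-mapped cache with this geometry: 16 KiB.
CACHE_BYTES = LINE_SIZE * NUM_SETS


def cache_set(address: int) -> int:
    """Return the index of the only cache set allowed to hold this address."""
    block_number = address // LINE_SIZE
    return block_number % NUM_SETS


def cache_tag(address: int) -> int:
    """Return the tag that distinguishes the blocks competing for one set."""
    block_number = address // LINE_SIZE
    return block_number // NUM_SETS


# Two addresses exactly one cache size apart map to the same set and
# differ only in their tags, so they compete for the same location.
first = 0x0000
second = first + CACHE_BYTES
same_set = cache_set(first) == cache_set(second)
different_tag = cache_tag(first) != cache_tag(second)
```

Under these assumptions, any two main memory addresses that differ by a multiple of `CACHE_BYTES` are restricted to the same cache location, which is the restriction described above.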
Consider a computer that has reached a steady state: the CPU is fetching instructions and data from the cache, the majority of the cache locations contain valid instructions and data, and the CPU requires one more block from main memory to continue execution of the program. At this point, the control hardware for the cache selects from the allowable cache locations one block of data to be replaced or overwritten by the new block from main memory. The specific implementation of the cache determines which locations are allowed to contain the new block of instructions or data, based on the location of the block in main memory, i.e. the block's address in main memory. The system then fetches the block from main memory and loads it into the chosen location of the cache. It is at this point that the problem addressed by this invention arises.
Since each of the cache locations typically maps multiple addresses within main memory, the system may need to overwrite some of the instructions or data already in the cache to get the new instructions or data into the cache. When frequently accessed instructions or data overwrite infrequently accessed instructions or data, the impact of a "re-fetch" on the performance of the system is small, because the overwritten instructions or data are seldom needed again. However, when frequently used instructions or data are overwritten, the effect on system or application performance can be large. When the particular block of frequently accessed instructions or data is next needed, it will have to be re-fetched from main memory and will in turn overwrite something else in the cache. If the mapping of main memory blocks to cache locations does not permit certain frequently accessed routines and data to reside concurrently in the cache, the cache will begin to "thrash," which is as bad as it sounds. Cache thrashing occurs when the system, due to the placement of frequently accessed instructions or data in main memory, repeatedly invalidates and overwrites those instructions or data within the cache.
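The effect of placement on thrashing can be illustrated with a small simulation. This is a sketch only, assuming a direct-mapped cache with modulo indexing and a hypothetical geometry; the addresses `routine_a`, `colliding_b`, and `non_colliding_b` and the function `count_misses` are invented for illustration and do not come from the text above.

```python
# Hypothetical direct-mapped cache geometry (an assumption for illustration).
LINE_SIZE = 64
NUM_SETS = 256
CACHE_BYTES = LINE_SIZE * NUM_SETS


def count_misses(addresses):
    """Simulate a direct-mapped cache and count misses over an access trace."""
    lines = [None] * NUM_SETS          # None marks an invalid cache location
    misses = 0
    for address in addresses:
        block = address // LINE_SIZE
        index = block % NUM_SETS       # the single location allowed for this block
        if lines[index] != block:
            misses += 1
            lines[index] = block       # overwrite whatever was stored there
    return misses


# Two frequently accessed blocks, accessed alternately 1000 times each.
routine_a = 0x10000
colliding_b = routine_a + CACHE_BYTES      # maps to the same cache location as routine_a
non_colliding_b = routine_a + LINE_SIZE    # maps to the neighbouring cache location

thrash_misses = count_misses([routine_a, colliding_b] * 1000)
friendly_misses = count_misses([routine_a, non_colliding_b] * 1000)

print(thrash_misses)    # 2000: every access evicts the other block
print(friendly_misses)  # 2: only the two initial loads miss
```

In this sketch the access pattern is identical in both traces; only the main memory address of the second block differs, yet the miss count changes from two to two thousand.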
Thus, it would be advantageous to develop a scheme for minimizing the probability that frequently accessed routines and data structures are repeatedly overwritten. This invention comprises one such scheme for routines and data structures that are pinned to particular addresses in main memory, e.g., during the building of an operating system kernel.