The development of the EDVAC computer system of 1948 is often cited as the beginning of the computer era. While there have been many advances in technology since 1948, modern day computer systems still use much of the same basic componentry that was used in the EDVAC device. Two basic components that are still found in almost every system are the computer system processor and its memory. The processor is the active part of the computer system; it reads and processes information stored in the computer system's memory to perform the task assigned to it by the computer system user. While the speed at which a computer system can respond to its user's requests has always been a factor in consumer purchase decisions, computer system speed has never been as important as it is in today's marketplace. Consumers want computer systems that are fast enough to easily handle work-intensive computer programs that leverage modern day advanced technologies (e.g., multimedia and object-oriented technology). Therefore, computer system manufacturers are constantly striving to make their computer systems faster and faster.
One well-known way to make a computer system faster is through the use of special memory called data cache memory. Cache memory is special because a processor can retrieve information from cache memory much faster than it can from standard memory (called main memory). However, this speed is not without cost. Cache memory is significantly more expensive than main memory. Consequently, computer system designers balance the need for speed against the cost of cache memory by keeping the size of cache memory relatively small when compared to that of main memory.
The key, then, is to make sure that small but fast cache memory always contains the information needed by the processor. However, since cache memory is typically much smaller than main memory, the computer system must be able to move information from the slower main memory into the faster cache memory before the information is needed by the processor. A "cache miss" is said to occur when the processor is forced to wait because the correct information was not present in the cache memory when it was needed by the processor. Of course, the value of any given cache management mechanism is measured by how successful the mechanism is at preventing cache misses. Cache misses are increasingly becoming a major performance impediment because processor speed is increasing much more rapidly than that of memory, which means that in most cases it is the slowness of memory that stands in the way of better performance. In other words, it does not matter how fast a processor can process information if the processor has to wait to get the information it needs. It is no surprise, then, that the mechanisms used to reduce the frequency of cache misses, and their associated speed penalty, have become extremely important to the computer industry.
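The cost of cache misses described above can be made concrete with the commonly used average access time relation: every access pays the cache hit time, and the fraction of accesses that miss additionally pays the main memory penalty. The following is only a minimal sketch; the function name and all cycle counts are hypothetical values chosen for illustration and are not taken from any particular computer system.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average cycles per memory access.

    Every access pays hit_time; a fraction miss_rate of accesses
    additionally pays miss_penalty to reach main memory.
    """
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time + miss_rate * miss_penalty


# Hypothetical numbers: 1-cycle cache hit, 5% of accesses miss.
# Only the main memory penalty (in processor cycles) differs.
narrow_gap = average_access_time(1, 0.05, 20)   # processor and memory speeds are close
wide_gap = average_access_time(1, 0.05, 200)    # processor has become much faster than memory
```

With the miss rate held constant, the tenfold larger penalty raises the average cost of every access several times over, which is why a widening gap between processor speed and memory speed makes miss reduction more valuable.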
Many advanced computer system architectures include an instruction (sometimes called a touch or preload instruction) that can be placed in the instruction stream of a computer system to move information from main memory into data cache memory. When a preload instruction executes, it takes a previously generated main memory address and causes the movement of the associated information. However, the problem is not the actual ability to move the information into data cache memory, but is instead knowing what information to move and when to move it. Indeed, unintelligent use of preload instructions may even hinder rather than improve computer system performance. For example, an unintelligent mechanism that simply inserts a preload instruction next to every instruction that actually references or loads information is, in most cases, ineffective, because a preload placed that close to the reference rarely leaves adequate time to load the needed information into data cache memory before the reference takes place. A crude mechanism of this sort would add a tremendous number of preload instructions, most of which would be of minimal usefulness. In fact, it has been shown that the resulting increase in code size (called "code bloat") actually worsens computer system performance instead of improving it, because of its negative impact on instruction cache and main memory paging performance.
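The timing problem can be illustrated with a deliberately simplified model. The sketch below assumes that a preload starts a main memory fetch of fixed latency and that the processor otherwise does useful work every cycle; the function name and the latency value are hypothetical.

```python
def stall_cycles(memory_latency, lead_cycles):
    """Cycles the processor waits at the actual load.

    lead_cycles is how many cycles before the load the preload was issued.
    Any part of the memory latency not covered by that lead shows up as a stall.
    """
    if memory_latency < 0 or lead_cycles < 0:
        raise ValueError("cycle counts must be non-negative")
    return max(0, memory_latency - lead_cycles)


MEMORY_LATENCY = 100  # hypothetical main memory latency, in cycles

adjacent = stall_cycles(MEMORY_LATENCY, 1)      # preload placed right before the load
far_ahead = stall_cycles(MEMORY_LATENCY, 120)   # preload placed well ahead of the load
```

Under these assumptions, a preload placed immediately before the load hides only one cycle of the latency, so it adds an instruction to the stream while leaving nearly the entire stall in place.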
One intuitive solution to this timing problem might be a mechanism that attempted to ensure adequate time by merely inserting preload instructions a certain number of instructions up the stream of instructions from the instructions that actually reference/load the information. However, an unintelligent mechanism of this sort is likewise of limited value because the address needed by the preload instruction (i.e., the address of the information) may not yet have been generated when the preload instruction is due to execute.
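The address availability constraint can be sketched as follows. The model numbers instructions by their position in the stream and assumes that a preload can be placed no earlier than the slot immediately after the instruction that generates its address. The function name, parameter names, and positions are hypothetical.

```python
def effective_lead(load_index, desired_distance, address_ready_index):
    """Instructions between the preload and the load it serves.

    The preload would ideally sit desired_distance instructions before the
    load, but it cannot be placed before its address has been generated.
    """
    if address_ready_index >= load_index:
        raise ValueError("the address must be generated before the load")
    earliest_slot = address_ready_index + 1        # first slot where the address exists
    desired_slot = load_index - desired_distance   # slot the fixed-distance scheme wants
    preload_slot = max(earliest_slot, desired_slot)
    return load_index - preload_slot


# Address generated long before the load: the full desired lead is available.
array_like = effective_lead(load_index=500, desired_distance=100, address_ready_index=10)

# Address generated just before the load (for example, a pointer that was
# itself just loaded): almost no lead remains, whatever distance is requested.
pointer_like = effective_lead(load_index=500, desired_distance=100, address_ready_index=497)
```

In the second case the requested distance of 100 instructions is irrelevant; the lead is bounded by when the address comes into existence, which a fixed-distance insertion scheme does not take into account.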
An additional problem with mechanisms of this sort is the potential overuse of preload instructions. Overuse of preload instructions can be problematic not only because of code bloat but also because, data cache memory being of limited capacity, every unnecessary preload increases the likelihood that useful information will be cast out of data cache memory before it is needed by the processor. This problem is known as "cache thrashing".
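Cache thrashing caused by overuse of preloads can be reproduced with a toy simulation. The sketch below assumes a small fully associative cache with least-recently-used replacement, which is a simplification of real data cache designs; the class, the function, the cache capacity, and the line numbers are all hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Toy fully associative cache with least-recently-used replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # oldest entry first
        self.misses = 0

    def _install(self, line):
        if line in self.lines:
            self.lines.move_to_end(line)      # mark as most recently used
            return
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)    # cast out the least recently used line
        self.lines[line] = True

    def preload(self, line):
        """Bring a line into the cache without counting a miss."""
        self._install(line)

    def load(self, line):
        """Actual reference by the processor; counts a miss if the line is absent."""
        if line not in self.lines:
            self.misses += 1
        self._install(line)


def count_misses(preloads_per_load, passes=10):
    """Repeatedly reference a working set that exactly fits in the cache,
    issuing preloads_per_load preloads of never-used lines after each load."""
    cache = LRUCache(capacity=4)
    working_set = [0, 1, 2, 3]
    useless_line = 1000
    for _pass in range(passes):
        for line in working_set:
            cache.load(line)
            for _k in range(preloads_per_load):
                cache.preload(useless_line)
                useless_line += 1
    return cache.misses


without_preloads = count_misses(0)   # 4: only the first pass misses
with_overuse = count_misses(2)       # 40: every reference misses
```

In this model the working set fits in the cache, so without preloads only the initial references miss. Two needless preloads per load are enough to cast out every useful line before it is referenced again.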
While somewhat helpful, existing mechanisms for preventing cache misses tend to be crude and unintelligent. Indeed, some mechanisms may even cause more harm than good by introducing unwanted side effects such as code bloat and cache thrashing. Without an intelligent mechanism that reduces both the frequency and cost of cache misses, the computer industry will never be able to fully realize the benefits of today's faster processors.