1. Field of the Invention
The present invention generally relates to cache prefetching, and more particularly to cache prefetching in a computer system.
2. Description of the Related Art
Users of data processing systems continue to demand greater performance for handling increasingly complex and difficult tasks. Greater performance from the processors that operate such systems may be obtained through faster clock speeds so the individual instructions are processed more quickly. However, processing speed has increased much more quickly than the speed of main memory. Despite the speed of a processor, a bottleneck on computer performance is that of transferring information between the processor and memory. Therefore, cache memories, or caches, are often used in many data processing systems to increase performance in a relatively cost-effective manner.
A cache is typically a relatively faster memory that is intermediately coupled between one or more processors and a bank of slower main memory. A cache speeds processing by maintaining a copy of repetitively used information in its faster memory. Whenever an access request is received for information not stored in the cache, the cache typically retrieves the information from main memory and forwards the information to the processor. If the cache is full, typically the least recently used information is discarded or returned to main memory to make room for more recently accessed information.
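By way of a minimal sketch only, the behavior described above (serve hits from the faster memory, fetch from main memory on a miss, and discard the least recently used entry when full) may be modeled as follows. The class name `LRUCache`, the dictionary standing in for main memory, and the capacity of two entries are assumptions made for illustration and do not describe any particular hardware.

```python
from collections import OrderedDict


class LRUCache:
    """Toy model of a cache placed in front of a slower main memory (illustrative only)."""

    def __init__(self, capacity, main_memory):
        self.capacity = capacity        # number of entries the cache can hold
        self.main_memory = main_memory  # dict standing in for main memory
        self.lines = OrderedDict()      # address -> data, least recently used first
        self.hits = 0
        self.misses = 0

    def read(self, address):
        if address in self.lines:
            self.hits += 1
            self.lines.move_to_end(address)  # mark as most recently used
            return self.lines[address]
        self.misses += 1
        data = self.main_memory[address]     # slow path: fetch from main memory
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)   # discard the least recently used entry
        self.lines[address] = data
        return data


# Hypothetical usage: a two-entry cache over an eight-word memory.
memory = {address: address * 10 for address in range(8)}
cache = LRUCache(2, memory)
for address in (0, 1, 0, 2):
    cache.read(address)
print(cache.hits, cache.misses, list(cache.lines))
```

In this sketch the read of address 2 finds the cache full, so address 1, the least recently used entry, is discarded while address 0 is retained.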
The benefits of a cache are realized whenever the number of requests to address locations of cached information (known as “cache hits”) is maximized relative to the number of requests to memory locations containing non-cached information (known as “cache misses”). Despite the added overhead that occurs as a result of a cache miss, as long as the percentage of cache hits (known as the “hit rate”) is high, the overall processing speed of the system is increased.
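The trade-off between miss overhead and hit rate can be illustrated with a simple latency model. The sketch below assumes that every access pays the cache access time and that a miss additionally pays the main memory access time; the function name and the timing values are hypothetical and are not taken from any measured system.

```python
def average_access_time(hit_rate, cache_ns, memory_ns):
    """Average read latency when every access checks the cache first.

    Assumed model: each access costs cache_ns; a miss costs memory_ns on top.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return cache_ns + (1.0 - hit_rate) * memory_ns


# Hypothetical timings: 2 ns cache access, 100 ns main memory access.
for rate in (0.0, 0.5, 0.95):
    print(rate, average_access_time(rate, 2.0, 100.0))
```

Under this model the cache reduces the average latency below that of main memory alone whenever the hit rate exceeds the ratio of the cache access time to the memory access time, and a cache with a lower hit rate makes the average access slower than having no cache at all.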
By way of illustration, assume that a processor performing a first task initiates a read access to main memory for data (or an instruction). A segment unit in the processor generates an effective address (also called a linear address, which is an address from the programmer's perspective) which is applied to a paging unit of the processor. The paging unit can be implemented as a separate entity from the processor. Memory address space is divided into blocks called pages. The main memory is also divided into blocks called page frames. When a page is brought from disk into the main memory, the page occupies a page frame in the main memory. The paging unit receives the effective address and determines whether the page containing the requested data currently resides in the main memory.
If the paging unit determines that the page containing the requested data currently resides in the main memory, the paging unit generates a real address (also called a physical address) of the requested data to the processor's address pins. In response, a cache subsystem associated with the processor receives the real address and performs a lookup (search) to determine whether the cache subsystem contains a valid copy of the requested data. Assume that the cache subsystem is a look-through cache, i.e., a type of cache which intercepts all of the processor's read accesses to main memory. If the look-through cache has a valid copy of the requested data, the look-through cache provides the processor with the requested data and no read bus cycle is initiated on the system bus. If the look-through cache does not have a valid copy of the requested data, the look-through cache initiates a read bus cycle on the system bus to obtain the requested data from the main memory for the processor.
If the paging unit determines that the page containing the requested data does not currently reside in the main memory, a page fault occurs. In response to the page fault, the processor stops the read access to main memory, switches to and executes a page fault handler of the operating system to set up for the transfer of the page containing the requested data from the mass storage device into the main memory, and switches to a second task. The page containing the requested data will occupy a page frame in the main memory.
When the page containing the requested data has been brought into the main memory, the processor (or perhaps another processor, in symmetric multiprocessor systems) switches back to the first task and reinitiates the read access to main memory. Again, the paging unit receives the generated effective address from the segment unit and determines whether the page containing the requested data currently resides in the main memory. Because the page has previously been brought into the main memory, the paging unit generates the real address of the requested data to the processor's address pins. In response, the cache subsystem receives the real address and performs a lookup to determine whether the cache subsystem contains a valid copy of the requested data. Because the page containing the requested data has just been brought into the main memory, the cache subsystem does not have a valid copy of the requested data. This is because the change in content of the page frame which receives the page has invalidated the cache line, if any, corresponding to the real address of the requested data. As a result, a cache miss occurs and a read bus cycle is initiated to access the main memory for the requested data. The requested data is forwarded from the main memory to the processor. The cache subsystem may also get a copy of the requested data for itself.
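The sequence described above (page fault, page load, reinitiated read access, cache miss) may be traced with a small simulation. This is a sketch under stated assumptions rather than a model of any actual processor: the class `ToySystem`, the 4096-byte page size, the oldest-first choice of a page to replace, and the omission of task switching and of the mass storage transfer are all simplifications introduced here for illustration.

```python
PAGE_SIZE = 4096  # assumed page and page frame size in bytes


class ToySystem:
    """Toy paging unit plus look-through cache (illustrative only)."""

    def __init__(self, num_frames):
        self.page_table = {}                       # resident page -> page frame
        self.free_frames = list(range(num_frames))
        self.cache = {}                            # real address -> data (valid lines)
        self.events = []                           # trace of what happened

    def handle_page_fault(self, page):
        if self.free_frames:
            frame = self.free_frames.pop()
        else:
            # Assumed policy: replace the oldest resident page
            # (Python dicts preserve insertion order).
            victim = next(iter(self.page_table))
            frame = self.page_table.pop(victim)
        # New content in the page frame invalidates every cache line of that frame.
        base = frame * PAGE_SIZE
        for real in [a for a in self.cache if base <= a < base + PAGE_SIZE]:
            del self.cache[real]
        self.page_table[page] = frame

    def read(self, effective_address):
        page, offset = divmod(effective_address, PAGE_SIZE)
        if page not in self.page_table:
            self.events.append("page fault")
            self.handle_page_fault(page)
            return self.read(effective_address)    # reinitiate the read access
        real = self.page_table[page] * PAGE_SIZE + offset
        if real in self.cache:
            self.events.append("cache hit")
        else:
            self.events.append("cache miss")       # read bus cycle to main memory
            self.cache[real] = (page, offset)      # cache keeps a copy
        return self.cache[real]


# Hypothetical usage with a single page frame.
system = ToySystem(num_frames=1)
system.read(0x1234)   # page 1 is not resident
system.read(0x1234)   # same data again
system.read(0x5234)   # page 5 replaces page 1 in the same page frame
print(system.events)
```

In this sketch every page fault is immediately followed by a cache miss on the reinitiated access, including the last read, whose real address matches a line that was cached for the replaced page and then invalidated.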
Cache misses are undesirable, especially in computer systems using look-through cache subsystems. First, it takes time for a look-through cache to perform a lookup. Only after the look-through cache finds that it does not have a valid copy of the requested data does the cache initiate a read bus cycle on the system bus to obtain the requested data from the main memory. The time it takes a look-through cache to perform a lookup is called the lookup penalty. Second, in any cache type, as a result of a cache miss, a read bus cycle must be performed to obtain the requested data from the slow main memory. Moreover, the read bus cycle uses the system bus to transmit the requested data from the main memory to the requesting processor. As a result, less system bus bandwidth is left for use by other bus masters in the system.
In the above description of the operation of the conventional computer system, the cache miss caused by the reinitiation of the read access after the page fault is of particular interest. In database transaction applications, it is likely that a read access to memory for data will cause such a cache miss. This is because the databases in these applications are usually many times larger than the main memory and, therefore, only a small portion of these databases can reside in the main memory at any given time (i.e., most of these databases reside on disk). Consequently, it often occurs that the page containing the requested data does not reside in the main memory and, therefore, the cache does not have a valid copy of the requested data. The read access will then likely cause a page fault, followed by a cache miss when the read access is reinitiated after the page containing the requested data is brought into the main memory. Performance analysis has shown that with a cache size of 1 Gbyte, the number of such cache misses approaches 20% of the total number of cache misses. These cache misses are considered compulsory misses because no modifications to the cache structure (capacity, line size, associativity, replacement policy, etc.) will result in a reduction in the miss rate. These compulsory cache misses are undesirable and are a disadvantage of conventional computer systems.
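The statement that compulsory misses do not respond to changes in the cache structure can be illustrated with a short experiment. The sketch below uses first references to an address as a stand-in for the misses that follow a page load, and it assumes a fully associative cache with least recently used replacement; the function names and the access trace are hypothetical.

```python
from collections import OrderedDict


def count_misses(trace, capacity):
    """Count misses of a fully associative LRU cache holding `capacity` entries."""
    lines = OrderedDict()
    misses = 0
    for address in trace:
        if address in lines:
            lines.move_to_end(address)      # hit: mark as most recently used
            continue
        misses += 1
        if len(lines) >= capacity:
            lines.popitem(last=False)       # discard the least recently used entry
        lines[address] = True
    return misses


def compulsory_misses(trace):
    """First references miss in any cache that starts without the data."""
    return len(set(trace))


# Hypothetical access trace.
trace = [0, 1, 2, 0, 1, 2, 3, 0]
for capacity in (2, 4, 100):
    print(capacity, count_misses(trace, capacity), compulsory_misses(trace))
```

Enlarging the cache in this sketch removes the misses caused by limited capacity, but the miss count never falls below the number of first references, however large the capacity becomes.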
Accordingly, there is a need for an apparatus and method for performing read accesses to main memory which overcomes shortcomings existing in the prior art.