Cache memory is a critical element of computer processors for achieving good performance. Generally, a cache is a smaller, faster memory used by a central processing unit to reduce the average time to access main memory. The cache typically stores copies of the data from the most frequently used main memory locations. The fundamental idea of cache organization is that by keeping the most frequently accessed instructions and data in the fast cache memory, the average memory access time approaches the access time of the cache. A cache miss is costly because the data must then be fetched from a higher-level cache, from main memory, or, on a multiprocessor, from another processor's cache, all of which are slower to access than the cache itself. Thus, maximizing the cache's hit rate is an important factor in achieving good performance.
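The relationship between hit rate and average memory access time can be illustrated with a small calculation. The function name and the latencies below are illustrative assumptions, not figures from the text:

```python
def amat(hit_rate, cache_latency, miss_penalty):
    """Average memory access time: every access pays cache_latency
    cycles; misses additionally pay miss_penalty to reach the
    slower level of the hierarchy."""
    return cache_latency + (1.0 - hit_rate) * miss_penalty

# Assumed latencies: 2-cycle cache, 200-cycle penalty to main memory.
print(amat(0.99, 2, 200))  # roughly 4 cycles at a 99% hit rate
print(amat(0.90, 2, 200))  # roughly 22 cycles at a 90% hit rate
```

Even a modest drop in hit rate multiplies the average access time, which is why the hit rate dominates cache performance.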
An important mechanism for enhancing cache performance is data prefetching. Data prefetching refers to moving data from memory into the cache in anticipation of future accesses by the processor, so as to hide memory latency. That is, data prefetching requests data from the memory subsystem before the data is needed. If the correct data can be prefetched early enough, the high latency of main memory can be hidden. Because microprocessors tend to be much faster than the memory in which programs and data are kept, instructions and data cannot be read fast enough to keep the microprocessor busy. With prefetching, the processor obtains the data from memory before it needs it, and therefore does not have to wait for the memory to satisfy its request.
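The benefit of prefetching on a sequential access pattern can be sketched with a toy cache model. This is a minimal illustration, not any particular hardware design; the class names and the next-line prefetch policy are assumptions chosen for clarity:

```python
from collections import OrderedDict

class Cache:
    """Toy fully associative cache of `size` blocks with LRU eviction."""
    def __init__(self, size):
        self.size, self.blocks = size, OrderedDict()
        self.hits = self.misses = 0

    def access(self, block, prefetch=False):
        if block in self.blocks:
            self.blocks.move_to_end(block)      # mark most recently used
            if not prefetch:
                self.hits += 1
            return
        if not prefetch:
            self.misses += 1                    # demand miss: CPU stalls
        self.blocks[block] = True               # fill the block
        if len(self.blocks) > self.size:
            self.blocks.popitem(last=False)     # evict LRU block

def run(addresses, prefetch_next):
    cache = Cache(8)
    for a in addresses:
        cache.access(a)
        if prefetch_next:
            cache.access(a + 1, prefetch=True)  # next-line prefetch
    return cache.hits, cache.misses

stream = list(range(100))                       # sequential access pattern
print(run(stream, prefetch_next=False))         # (0, 100): every access misses
print(run(stream, prefetch_next=True))          # (99, 1): prefetches hide the misses
```

On this idealized sequential stream, fetching the next block ahead of the processor converts nearly all demand misses into hits.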
There are several difficulties encountered in designing a successful prefetch strategy. Many of them involve determining which data should be prefetched, and previous work in both hardware and software has investigated how to identify the correct data to prefetch. A related question is when, or how early, to prefetch the data: even if the correct data is prefetched, prefetching it too early may cause it to be evicted before it ever gets a chance to be used. A further question is how much data to prefetch, since prefetching too much data places contention on the memory system.
There are, however, other problems compounded in part by the above-described difficulties. One problem is the amount of time taken away from the main processor in determining what to prefetch. Another is requesting data that is not mapped in the TLB (translation lookaside buffer), SLB (segment lookaside buffer), ERAT (effective-to-real address translation cache), or the like. Briefly, the TLB is a cache in a CPU (central processing unit) that contains parts of the page table, which translates virtual addresses into real addresses. The TLB improves the speed of virtual address translation because it caches recently translated virtual-to-physical address mappings. Typically, the search key is the virtual address and the search result is the corresponding physical address. If the search yields a match, the virtual-to-physical address translation is known and the result is used. If there is no match, the translation must be performed using the page table, which typically takes additional cycles to complete. Similarly, an SLB contains segment translations. Likewise, an ERAT is used to handle instruction-address translation and typically contains entries that map the effective address of a page to its corresponding real address in memory.
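The TLB lookup described above can be modeled in a few lines. This is a simplification (real TLBs have limited capacity, associativity, and replacement policies); the class name, page size, and page-table contents are assumed for the example:

```python
PAGE_SIZE = 4096

class TLB:
    """Toy TLB: caches virtual-page -> physical-frame translations."""
    def __init__(self, page_table):
        self.page_table = page_table   # full virtual -> physical page map
        self.entries = {}              # the cached subset of translations
        self.hits = self.walks = 0

    def translate(self, vaddr):
        vpage, offset = divmod(vaddr, PAGE_SIZE)
        if vpage in self.entries:
            self.hits += 1             # fast path: translation is cached
        else:
            self.walks += 1            # slow path: walk the page table
            self.entries[vpage] = self.page_table[vpage]
        return self.entries[vpage] * PAGE_SIZE + offset

tlb = TLB({0: 7, 1: 3})
print(hex(tlb.translate(0x10)))       # page 0 -> frame 7: 0x7010
print(hex(tlb.translate(0x1010)))     # page 1 -> frame 3: 0x3010
print(hex(tlb.translate(0x20)))       # page 0 again: TLB hit, 0x7020
print(tlb.hits, tlb.walks)            # 1 hit, 2 page-table walks
```

The `walks` counter corresponds to the extra cycles mentioned above: each miss forces the slower page-table translation before the access can proceed.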
Generally, tables such as the TLB, SLB, and ERAT described above are caches of recent virtual-to-physical mappings used to accelerate address translation. In existing prefetch methods, a prefetch is dropped if its virtual address does not match an entry in a translation cache such as the TLB, because a fault handler would otherwise have to be run, which is an expensive operation. Thus, conventional prefetching methods have not addressed the problem of unmapped data access. Accordingly, what is needed is an efficient and reasonably accurate method for prefetching data that reduces the processing load of the main processor. A prefetching method that is able to handle unmapped data is also desirable.
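The conventional drop-on-unmapped policy described above can be sketched as a filter in front of the prefetch queue. This is an abstract model of the behavior, not any specific hardware; the function name and page size are assumptions:

```python
def filter_prefetches(candidates, tlb_pages, page_size=4096):
    """Model the conventional policy: issue a prefetch only if its
    virtual page is already mapped in the TLB; otherwise drop it,
    since running a fault handler for a speculative request is
    considered too expensive."""
    issued, dropped = [], []
    for vaddr in candidates:
        if vaddr // page_size in tlb_pages:
            issued.append(vaddr)       # translation known: safe to prefetch
        else:
            dropped.append(vaddr)      # unmapped page: prefetch discarded
    return issued, dropped

# Pages 0 and 1 are mapped in the TLB; page 5 is not.
issued, dropped = filter_prefetches([0x10, 0x1010, 0x5010], tlb_pages={0, 1})
print([hex(a) for a in issued])       # prefetches to mapped pages proceed
print([hex(a) for a in dropped])      # the unmapped prefetch is dropped
```

The dropped list is precisely the unmapped-data case that, per the passage, conventional prefetching leaves unaddressed.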