In general, the number of clock cycles needed to access memory has been increasing. Solutions to this problem have emerged over the years, but none has yet proven adequate. For instance, different caching techniques have been contemplated that store a subset of memory in smaller structures and use the data address to decide where a specific datum must be stored (see Smith, A. J., “Cache Memories”, Computing Surveys, September 1982). However, it has also been noted that recent technological trends of increasing wire delays will lead to either increasing access times or decreasing capacities for caches (see Agarwal et al., “Clock Rate Versus IPC: The End of the Road for Conventional Microarchitectures”, Proceedings of the International Symposium on Computer Architecture [ISCA], 2000). Generally, the conventional technique of looking up caches by address will either cause conflict misses or require an associative search that can increase cache access time and consume excessive power. Register files tend not to present such problems.
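By way of illustration only, the conflict misses mentioned above can be sketched with a small simulation of a direct-mapped cache, in which the data address alone decides where a datum is stored. The function name `count_misses` and the default parameters (16 sets, 64-byte lines) are assumptions chosen for this example and are not taken from the cited references.

```python
def count_misses(addresses, num_sets=16, line_size=64):
    """Count misses in a direct-mapped cache for a sequence of byte addresses.

    Hypothetical model: one line per set, no prefetching, cache initially empty.
    """
    tags = [None] * num_sets          # tag currently held by each set
    misses = 0
    for addr in addresses:
        line = addr // line_size      # which memory line the byte belongs to
        index = line % num_sets       # the set is chosen by the address alone
        tag = line // num_sets        # remaining bits identify the line
        if tags[index] != tag:        # a different line occupies the set
            misses += 1
            tags[index] = tag         # evict the old line, install the new one
    return misses
```

Under these assumptions, addresses 0 and 1024 map to the same set, so alternating between them evicts the other line on every access, whereas addresses 0 and 64 map to different sets and miss only on first use.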
Also, since the register number is a field which is part of the instruction, the desired register can often be accessed as soon as the instruction is fetched, leading to a short pipeline. This contrasts with caches in the context of current instruction sets, which require that an address first be read from a base or index register, then possibly be computed through an address addition, and then possibly be translated to a real address, before the cache access can start. Shorter pipelines can offer well-known advantages, such as lower branch penalties.
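The contrast drawn above can be sketched as follows, purely as an illustrative model. The instruction encoding (a 5-bit register field at bit position 5), the 4096-byte page size, the dictionary-based page table, and the function names are all hypothetical assumptions made for this sketch and do not describe any particular instruction set.

```python
PAGE_SIZE = 4096  # assumed page size for the illustration


def read_register(regs, instruction):
    """Register access: the register number is a field of the instruction."""
    reg_num = (instruction >> 5) & 0x1F   # hypothetical 5-bit field
    return regs[reg_num]                  # usable as soon as the fetch completes


def real_address_for_load(regs, page_table, base_reg, offset):
    """Memory access: three dependent steps precede the cache lookup."""
    base = regs[base_reg]                 # step 1: read the base register
    effective = base + offset             # step 2: address addition
    vpn, page_offset = divmod(effective, PAGE_SIZE)
    frame = page_table[vpn]               # step 3: translation to a real address
    return frame * PAGE_SIZE + page_offset
```

In this model, a register read needs only the fetched instruction, while the load must complete a register read, an addition, and a translation before the cache access can begin.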
However, register use is usually scheduled by compilers, which tend to miss many opportunities for allocating memory locations to registers because of the limitations of compile-time algorithms. For example, register allocation often requires the compiler to prove that two pointers will never refer to the same memory location, which is hard to determine precisely at compile time.
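The aliasing difficulty can be illustrated with a minimal sketch in which memory is modeled as a list and pointers as indices into it. The two function names are hypothetical: `faithful` reloads the value from memory, while `register_allocated` models a compiler that kept the value in a register on the assumption that the two pointers never alias.

```python
def faithful(mem, p, q):
    """Reload from memory after the second store (always correct)."""
    mem[p] = 1
    mem[q] = 2
    return mem[p]        # sees the store through q if p and q alias


def register_allocated(mem, p, q):
    """Keep the value in a 'register', assuming p and q never alias."""
    r = 1                # value held in a register
    mem[p] = r
    mem[q] = 2
    return r             # no reload: stale if p == q
```

When `p` and `q` differ, both versions return 1. When `p == q`, the faithful version returns 2 and the register-allocated version still returns 1, which is why a compiler that cannot rule out aliasing must fall back on the memory access.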
Generally, a register file represents the first line of defense in avoiding accesses to main memory. By filtering accesses to the level-1 data cache (DL1), the register file reduces the number of accesses to the memory hierarchy. It thereby allows better utilization of the caches at all levels, thus conserving cache capacity and bandwidth for the instructions with poor temporal locality that actually need it. Modern algorithms for static register allocation succeed in servicing a large number of requests for data that would otherwise have to go to the memory hierarchy.
In spite of such sophisticated static algorithms, a large proportion of the accesses that still reach the DL1 exhibit temporal locality. While this is good for reducing the miss rate of the DL1, it suggests that there is still room for filtering out additional accesses to the DL1.
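One simple way to quantify the temporal locality referred to above is to measure the fraction of accesses in an address trace that revisit a recently used address. The following is a rough sketch only; the function name `reuse_fraction`, the `window` parameter, and the choice of metric are assumptions made for illustration and are not a measurement method described herein.

```python
def reuse_fraction(trace, window):
    """Fraction of accesses whose address occurs in the preceding `window` accesses."""
    if not trace:
        return 0.0
    reused = 0
    for i, addr in enumerate(trace):
        recent = trace[max(0, i - window):i]   # the last `window` addresses
        if addr in recent:
            reused += 1
    return reused / len(trace)
```

For example, on the trace `[10, 20, 10, 30, 20, 10]` with a window of 4, three of the six accesses revisit a recent address. A high fraction on a trace of DL1 accesses would indicate accesses that a register-like structure could, in principle, have filtered.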
In view of the foregoing, a need has been recognized in connection with improving upon the shortcomings and disadvantages of prior efforts.