Field of the Invention
The present invention generally relates to computer science and, more specifically, to selecting hash values based on matrix rank.
Description of the Related Art
A typical computer system includes a central processing unit (CPU) and one or more parallel processing units, such as graphics processing units (GPUs). The CPU usually executes the overall structure of a software application and then configures the GPUs to implement tasks that are amenable to parallel processing. As part of executing the software application, the CPU and the GPUs access physical memory locations included in the computer system. Such memory locations may be included in any memory accessible to the computer system, such as larger, relatively low speed system memory or smaller, relatively high speed memory caches.
To optimize the performance of the software application, the CPU and GPU are usually designed to store frequently accessed data in the memory caches. And since memory buses included in computer systems are architected to transfer data in discrete memory blocks, the memory caches are typically designed to store data in “cache lines,” where each cache line stores a multiple of this memory block. Accordingly, the computer system implements one or more memory addressing techniques that map each memory address associated with a memory cache to a corresponding cache line included in the memory cache and an offset within the cache line.
In one approach to memory addressing, a sequential series of bits in each memory address maps to a cache line within the cache in a direct fashion. For instance, suppose that a particular cache were to include 2^S cache lines. In some computer systems, the computer system would linearly map the S upper bits or the S lower bits included in the memory address to the cache line. While this direct memory addressing approach is relatively simple to implement, this approach may distribute memory accesses unevenly across cache lines, “hotspotting” certain cache lines in the cache. For example, if a particular software application were to address memory at an interval that was an integer multiple of the size of the cache, then the corresponding memory access operations would involve only a single cache line. In general, hotspotting bottlenecks specific computer system resources and, consequently, may degrade the overall performance of the computer system.
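The hotspotting effect of direct mapping can be illustrated with a minimal sketch. The cache geometry below (16 lines of 64 bytes) and the function names are assumptions chosen for illustration only, and the sketch uses the S address bits immediately above the line offset as the line index.

```python
from collections import Counter

S = 4                                      # assumed: the cache holds 2**S = 16 lines
LINE_BYTES = 64                            # assumed cache-line size in bytes
OFFSET_BITS = LINE_BYTES.bit_length() - 1  # 6 offset bits for a 64-byte line


def direct_line_index(address: int) -> int:
    """Direct mapping: use the S bits just above the line offset as the index."""
    return (address >> OFFSET_BITS) & ((1 << S) - 1)


def line_histogram(addresses):
    """Count how many accesses land on each cache line."""
    return Counter(direct_line_index(a) for a in addresses)


cache_bytes = LINE_BYTES << S              # total cache size: 1024 bytes

# Accesses strided by the cache size all collide on one line.
strided = [i * cache_bytes for i in range(256)]
# Sequential line-sized accesses spread evenly over every line.
sequential = [i * LINE_BYTES for i in range(256)]

hot = line_histogram(strided)
even = line_histogram(sequential)
print("strided:", dict(hot))
print("sequential:", dict(even))
```

With the stride equal to the cache size, every access selects the same line, whereas the sequential pattern touches each of the 16 lines equally often.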
Increasingly, to distribute memory accesses more uniformly across cache lines irrespective of access patterns of various software applications, computer systems incorporate hashing operations into the memory addressing process. In operation, the computer system generates a transform matrix that includes hash values. Subsequently, the computer system performs arithmetic operations between the transform matrix and the input address to create “swizzled” addresses used to access the appropriate data within the cache.
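One common way to realize such a transform is a matrix-vector product over GF(2), in which each output bit is the XOR (parity) of a selected subset of input address bits. The text above says only that arithmetic operations are performed, so the choice of GF(2), the row-bitmask encoding, and the example matrix below are all assumptions made for this sketch.

```python
from typing import Sequence


def parity(x: int) -> int:
    """Return 1 if x has an odd number of set bits, else 0."""
    return bin(x).count("1") & 1


def swizzle(address_bits: int, transform_rows: Sequence[int]) -> int:
    """Multiply an address bit-vector by a binary matrix over GF(2).

    Each row is a bitmask selecting input bits; output bit i is the XOR
    of the input bits selected by row i.
    """
    out = 0
    for i, row in enumerate(transform_rows):
        out |= parity(row & address_bits) << i
    return out


# Hypothetical 4x8 matrix: output bit i = input bit i XOR input bit i+4,
# i.e. the low nibble XORed with the high nibble of an 8-bit line number.
ROWS = [0b00010001, 0b00100010, 0b01000100, 0b10001000]

# Line numbers strided by 16 have identical low nibbles, so a direct
# low-bit mapping would send them all to line 0; the swizzle spreads them.
swizzled = [swizzle(i << 4, ROWS) for i in range(16)]
print(swizzled)
```

In this sketch the strided line numbers, which differ only in their upper bits, are spread across all 16 lines because those upper bits now contribute to the index.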
Typically, to select the hash values, the computer system randomly generates multiple sets of numbers and sets the hash values to the set of numbers that experimentally demonstrates the highest likelihood of reducing hotspotting. Notably, most hash values lead to limited reduction in hotspotting. Consequently, identifying a set of hash values that causes significant improvement in the overall performance of the computer system usually requires generating many sets of random numbers, performing experimental test runs with each set, and evaluating the results. This ambiguous and repetitive approach to hash selection is time consuming and does not necessarily lead to the desired performance improvement in memory addressing.
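The random trial-and-error selection described above can be sketched as follows. The GF(2) swizzle, the scoring metric (worst per-line load summed over a few strided access patterns), the bit widths, and all names are assumptions introduced for illustration and are not taken from any particular system.

```python
import random
from collections import Counter

IN_BITS, OUT_BITS = 8, 4   # assumed: 8-bit line numbers hashed to a 4-bit index


def swizzle(line_number, rows):
    """Output bit i is the parity of the input bits selected by rows[i]."""
    out = 0
    for i, row in enumerate(rows):
        out |= (bin(row & line_number).count("1") & 1) << i
    return out


def worst_line_load(rows, line_numbers):
    """Largest number of accesses on any single cache line (lower is better)."""
    return max(Counter(swizzle(n, rows) for n in line_numbers).values())


def random_search(patterns, trials, seed=0):
    """Try `trials` random candidate matrices and keep the best-scoring one."""
    rng = random.Random(seed)
    best_rows, best_score = None, None
    for _ in range(trials):
        rows = [rng.randrange(1, 1 << IN_BITS) for _ in range(OUT_BITS)]
        score = sum(worst_line_load(rows, p) for p in patterns)
        if best_score is None or score < best_score:
            best_rows, best_score = rows, score
    return best_rows, best_score


# Hypothetical test workloads: 64 accesses each at strides 1, 2, 4 and 16.
patterns = [[(i * stride) % (1 << IN_BITS) for i in range(64)]
            for stride in (1, 2, 4, 16)]
best_rows, best_score = random_search(patterns, trials=200)
print([bin(r) for r in best_rows], best_score)
```

Every candidate must be generated and test-run before it can be compared with the others, which is the repetitive cost the passage describes, and nothing guarantees that a given number of trials will find a well-distributing set.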
As the foregoing illustrates, what is needed in the art is a more effective approach to addressing memory caches.