In a multiprocessor system with distributed shared memory, each processor node may have a fraction of the total distributed shared memory that is local to that processor node. Because the address space of the distributed shared memory is shared, the same physical address on any processor node refers to the same location in the distributed shared memory. Distributing the memory may provide a cost-effective way to scale the memory bandwidth when most accesses are to local memory of a node and may reduce latency for accesses to the local memory. By separating local memory traffic from remote memory traffic, bandwidth demands on the distributed shared memory system and interconnect network may be reduced.
Latency generally refers to the elapsed time between issuing a request to the memory system and receiving a response or reply. Latency may be measured in units of time (seconds, microseconds, etc.) or in cycles. Memory bandwidth generally refers to the throughput of the memory system (i.e., the rate at which the memory system can satisfy requests). Memory bandwidth may be expressed as the number of requests per unit time. When each request corresponds to a fixed number of bytes of data, for example, bandwidth may be expressed as the number of bytes per unit time.
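The unit relationships described above may be illustrated with a minimal sketch. The function names, the 64-byte request size, and the 2 GHz clock are assumptions chosen only for illustration and do not come from any particular system.

```python
def bandwidth_bytes_per_second(requests_per_second, bytes_per_request):
    """Convert a request rate to a byte rate, assuming fixed-size requests."""
    return requests_per_second * bytes_per_request


def latency_seconds(latency_cycles, clock_hz):
    """Convert a latency measured in clock cycles to seconds."""
    return latency_cycles / clock_hz


# Hypothetical figures: 100 million requests per second, 64 bytes each.
bw = bandwidth_bytes_per_second(100_000_000, 64)

# Hypothetical figures: a 200-cycle access on a 2 GHz clock.
lat = latency_seconds(200, 2_000_000_000)
```

Under these assumed figures, the byte bandwidth is the request rate multiplied by the request size, and the latency in seconds is the cycle count divided by the clock frequency.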
In existing shared memory multiprocessors, communication of data between processor nodes may cost anywhere from 50 clock cycles for multicore processor chips to over 1000 clock cycles for large-scale multiprocessors, depending on the communication mechanism, the type of interconnect network, and the scale of the multiprocessor. Thus, accesses to memory that is local to a node are generally faster than accesses to memory that is remote (i.e., memory that is local to another node). Remote accesses typically incur a penalty to cross the interconnect network and return, resulting in increased latency.
Local memory for a node may be provided by interleaving the distributed shared memory among the nodes. Typically, certain fixed bits of the physical memory address may be used to identify the local node for a portion of the distributed shared memory. For example, in a four-node distributed shared memory multiprocessor system, physical address bits 30 and 31 may be used to identify the node for a memory address when the shared memory space is interleaved on one-gigabyte address boundaries.
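The four-node example above may be sketched as follows. The sketch assumes that bit 0 is the least significant address bit and that the node number is simply the two-bit value held in bits 30 and 31; the names `NODE_SHIFT`, `NODE_MASK`, and `home_node` are hypothetical.

```python
NODE_SHIFT = 30   # bits 0-29 address a byte within a one-gigabyte region
NODE_MASK = 0x3   # two bits (30 and 31) select one of four nodes


def home_node(phys_addr):
    """Return the node whose local memory holds the given physical address."""
    return (phys_addr >> NODE_SHIFT) & NODE_MASK
```

With this decode, addresses 0x00000000 through 0x3FFFFFFF may map to node 0, addresses 0x40000000 through 0x7FFFFFFF to node 1, and so on, so that each node holds one contiguous one-gigabyte region.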
However, such a coarse-grained interleave may lead to memory hotspots when shared code and/or data structures residing on a particular node are frequently accessed by other nodes (i.e., a node's local memory is frequently accessed by remote nodes). Hence, there is a need in the art for techniques to interleave distributed shared memory of a multiprocessor system that reduce memory hotspots when code and/or data structures are shared across the nodes.
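The hotspot effect may be illustrated with a minimal sketch that counts which node services each access to a shared data structure under the one-gigabyte interleave. The base address, the one-megabyte structure size, and the 64-byte access stride are assumptions chosen only for illustration, as are the function names.

```python
from collections import Counter

NODE_SHIFT = 30   # one-gigabyte interleave: node number is in bits 30 and 31
NODE_MASK = 0x3


def home_node(phys_addr):
    """Return the node whose local memory holds the given physical address."""
    return (phys_addr >> NODE_SHIFT) & NODE_MASK


def accesses_per_node(base, size_bytes, stride=64):
    """Count accesses per home node when sweeping a structure in fixed strides."""
    return Counter(home_node(addr) for addr in range(base, base + size_bytes, stride))


# Hypothetical shared structure: one megabyte starting at 0x40000000.
counts = accesses_per_node(base=0x4000_0000, size_bytes=1 << 20)
```

Because the entire structure lies within a single one-gigabyte region, every access in this sketch is serviced by the same node, regardless of which node issues it.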