Memory interfaces, such as Dual Inline Memory Module (DIMM) interfaces, are designed to provide memory capacity for processing modules and to ensure optimal memory access throughput.
Multiple memory interfaces are commonly used to increase the processing-module memory bandwidth. However, when the processing-module does not access the entire memory space, it may not be able to fully exploit the multiplied memory bandwidth. In such cases, it is said that the memory bandwidth is suboptimal.
Commercially available solutions to this long-standing problem may include receiving an initial physical address (IPA) and applying an address interleaving algorithm on the received address to produce a mapped physical address (MPA). The address interleaving algorithm may be implemented by dedicated hardware circuitry, and the mapped physical address may include a memory interface index, and a memory interface offset. The memory interface index may refer to an index of a memory device, and the memory interface offset may refer to an address offset from the start address of that device.
Commercially available implementations of an address-interleaving algorithm may include calculation of division and/or modulo of at least a portion of the initial physical address. For example:
(a) The memory interface index (e.g., an index or identification of a memory device) of the mapped physical address may be calculated as the integer remainder (modulo) of the initial physical address when divided by the number of memory interfaces (e.g., memory interface index = initial physical address % number of memory interfaces).
(b) The memory interface offset (e.g., an address offset from the start address of the memory device) of the mapped physical address may be calculated as the integer quotient of the initial physical address when divided by the number of memory interfaces (e.g., memory interface offset = initial physical address / number of memory interfaces).
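The modulo and division mapping described above can be illustrated by a minimal sketch, assuming that the whole initial physical address takes part in the calculation. The function name `interleave` and its parameter names are hypothetical and are used for illustration only; the sketch models the arithmetic in software and is not a description of any particular hardware circuit.

```python
from typing import Tuple


def interleave(ipa: int, rank: int) -> Tuple[int, int]:
    """Map an initial physical address (IPA) to (interface index, interface offset).

    Hypothetical helper: `rank` is the number of memory interfaces.
    """
    if rank <= 0:
        raise ValueError("rank must be a positive integer")
    if ipa < 0:
        raise ValueError("ipa must be non-negative")
    index = ipa % rank    # memory interface index (integer modulo)
    offset = ipa // rank  # memory interface offset (integer division)
    return index, offset


if __name__ == "__main__":
    # With three memory interfaces, consecutive addresses rotate across interfaces.
    for address in range(6):
        print(address, interleave(address, 3))
```

With an interleaving rank of 3, the addresses 0 through 5 map to the pairs (0, 0), (1, 0), (2, 0), (0, 1), (1, 1) and (2, 1), so consecutive addresses are spread over all three memory interfaces.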
The terms “rank” and “interleaving rank” are used herein to refer to the number of memory interfaces or memory devices accessible for a processing module upon which the address-interleaving algorithm is applied.
The commercially available implementation of an address-interleaving method described above exhibits several problems. For example, a series of memory access operations with a pattern of a fixed address interval (e.g., a serial access to consecutive data objects of identical size) may always map to the same memory interface and may thus not properly exploit the multiplied memory bandwidth. In particular, if the difference in address value between two consecutive memory accesses is an integer multiple of the interleaving rank (e.g., the number of memory interfaces), then the memory interface index may remain the same between the two consecutive memory access operations.
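The effect of a fixed address interval can be demonstrated with a short sketch, assuming the plain modulo mapping described above. The function name `interface_indices` and the chosen start address, intervals and rank are hypothetical values picked for illustration.

```python
from typing import List


def interface_indices(start: int, stride: int, count: int, rank: int) -> List[int]:
    """Return the memory interface index of each access in a fixed-interval series.

    Hypothetical helper: the i-th access targets address start + i * stride,
    and the index is computed as address % rank.
    """
    if rank <= 0:
        raise ValueError("rank must be a positive integer")
    return [(start + i * stride) % rank for i in range(count)]


if __name__ == "__main__":
    # Interval of 6 is an integer multiple of rank 3: every access hits interface 1.
    print(interface_indices(start=1, stride=6, count=4, rank=3))
    # Interval of 4 is not a multiple of rank 3: accesses rotate across interfaces.
    print(interface_indices(start=1, stride=4, count=4, rank=3))
```

In the first series the addresses 1, 7, 13 and 19 all yield index 1, so only one of the three memory interfaces is used. In the second series the addresses 1, 5, 9 and 13 yield indices 1, 2, 0 and 1.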
In another example, when the interleaving rank is not a power of 2 (e.g., not 2, 4, 8, etc.), or is not a constant number, the implementation of division and modulo calculation in hardware may be resource consuming and challenging in terms of timing. For example, an Application-Specific Integrated Circuit (ASIC) implementation may require additional clock cycles and/or elaborate setup and hold timing constraints to accommodate a generic division solution.
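The contrast with a power-of-2 rank can be sketched as follows. When the rank is a power of 2, the modulo reduces to a bit mask and the division reduces to a right shift, which is why that case is inexpensive; for any other rank no such shortcut applies and a generic divider is needed. The function name `interleave_pow2` is hypothetical, and the sketch models the arithmetic in software only.

```python
from typing import Tuple


def interleave_pow2(ipa: int, rank: int) -> Tuple[int, int]:
    """Map an address to (index, offset) using only a mask and a shift.

    Hypothetical helper: valid only when `rank` is a power of 2.
    """
    if rank <= 0 or rank & (rank - 1) != 0:
        raise ValueError("rank must be a power of 2")
    if ipa < 0:
        raise ValueError("ipa must be non-negative")
    shift = rank.bit_length() - 1  # log2(rank) for a power of 2
    index = ipa & (rank - 1)       # equivalent to ipa % rank
    offset = ipa >> shift          # equivalent to ipa // rank
    return index, offset


if __name__ == "__main__":
    # The shortcut agrees with modulo and division for a rank of 4.
    for address in range(8):
        assert interleave_pow2(address, 4) == (address % 4, address // 4)
    # A rank of 3 has no mask-and-shift equivalent and is rejected here.
    try:
        interleave_pow2(10, 3)
    except ValueError as error:
        print(error)
```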
State-of-the-art processors, such as Intel Xeon, support interleaving ranks which may be a power of 2, or equal to 3, or a combination thereof, as explained herein, but do not provide real interleaving for every possible value of the interleaving rank.
A system and a method for providing real interleaving for every value of interleaving rank at run time, in a manner that is hardware and time-wise efficient, and resilient to the effect of fixed address intervals is, therefore, required.