Memory systems often exhibit non-uniform access latency, in that the latency of an access to a given memory address depends on the memory regions previously accessed. In banked memory systems, such as those based on a dynamic random access memory (DRAM) architecture, accesses to recently accessed regions of memory (“active pages”) have a lower latency than accesses to other pages. To illustrate, a Double Data Rate 3 (DDR3) DRAM supports lower-latency access to up to eight pages of memory, and accesses to other pages incur a latency penalty. There are restrictions on how these eight pages are organized: in DDR3, the entire array is divided into eight “banks”, and each bank can have only one active page at a time.
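The active-page behavior described above may be illustrated by a minimal sketch. The page size, the relative latency values, and the mapping of addresses to banks (consecutive pages interleaved across the banks) are assumptions chosen only for illustration and are not taken from the DDR3 specification; the names decode and BankedMemory are likewise hypothetical.

```python
NUM_BANKS = 8        # DDR3: eight banks, each with at most one active page
PAGE_SIZE = 1024     # assumed page size in bytes (illustrative only)
HIT_LATENCY = 1      # assumed relative cost of an access to an active page
MISS_LATENCY = 3     # assumed relative cost of an access to any other page


def decode(addr):
    """Split an address into (bank, page) under the assumed interleaved mapping."""
    page_index = addr // PAGE_SIZE
    return page_index % NUM_BANKS, page_index // NUM_BANKS


class BankedMemory:
    """Tracks the single active page of each bank and reports access latency."""

    def __init__(self):
        self.active_page = [None] * NUM_BANKS

    def access(self, addr):
        bank, page = decode(addr)
        if self.active_page[bank] == page:
            return HIT_LATENCY
        # A different page becomes the active page of this bank.
        self.active_page[bank] = page
        return MISS_LATENCY


mem = BankedMemory()
first = mem.access(0x0000)   # bank 0, page 0: no active page yet, so a miss
second = mem.access(0x0010)  # same page of bank 0: a hit
third = mem.access(0x2000)   # bank 0, page 1: replaces the active page, a miss
fourth = mem.access(0x0400)  # bank 1, page 0: a miss that leaves bank 0 untouched
```

In this sketch, two addresses that fall in the same bank but in different pages evict one another's active page, whereas addresses in different banks do not interfere.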
Many processing systems implement functions using an embedded operating system (such as Linux) or firmware. In both of these environments, software functions allocate memory from a free memory heap. The implementation of these functions often creates buffers aligned to 2^n boundaries. The combination of this memory allocation process and the intrinsic nature of DRAMs and other banked memories typically causes an imbalance in which a disproportionate amount of traffic targets the lower-numbered banks, and particularly the lowest-numbered bank (that is, the “first” bank). This introduces significant inefficiencies in that, as more traffic is routed to a bank, it becomes more likely that the accesses will be to pages other than the active page of that bank. As accesses to pages other than the active page incur a higher access latency than accesses to the active page, this increased frequency of access to non-active pages of the bank introduces a significant average access latency penalty. Moreover, while a frequently accessed bank is processing accesses to non-active pages, other banks that otherwise could be servicing memory accesses are likely to be idle due to the imbalanced distribution of traffic among the banks of the memory.
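The imbalance described above may be illustrated by the following sketch, which counts accesses per bank for a hypothetical workload. The address-to-bank mapping, the 8 KiB alignment, the access stride, and the list of used buffer lengths are all assumptions chosen for illustration; an actual allocator and memory controller may behave differently.

```python
NUM_BANKS = 8
PAGE_SIZE = 1024   # assumed page size in bytes (illustrative only)


def bank_of(addr):
    """Bank selected by an address, assuming pages are interleaved across banks."""
    return (addr // PAGE_SIZE) % NUM_BANKS


def bank_histogram(buffers, stride=64):
    """Count accesses per bank when the first `used` bytes of each buffer are touched."""
    counts = [0] * NUM_BANKS
    for base, used in buffers:
        for offset in range(0, used, stride):
            counts[bank_of(base + offset)] += 1
    return counts


# Hypothetical workload: every buffer starts on an 8 KiB (2^13) boundary,
# but most buffers use only part of the space reserved for them.
ALIGN = 1 << 13
used_sizes = [256, 512, 1024, 1536, 2048, 3072, 4096, 8192]
buffers = [(i * ALIGN, used) for i, used in enumerate(used_sizes)]

counts = bank_histogram(buffers)
```

Under these assumptions, every buffer begins in the first bank, because the alignment is a multiple of the span covered by one page in each of the eight banks. The sketch yields counts of [108, 72, 48, 32, 16, 16, 16, 16], so the first bank receives 108 of the 324 accesses while each of the upper four banks receives 16.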