1. Technical Field
The present invention relates in general to data processing, and in particular, to efficient utilization of the processor-memory interface in a data processing system.
2. Description of the Related Art
With the rise of multi-core, multi-threaded data processing systems, the throughput of the processor-memory interface has become a limitation on system performance. With multiple multi-threaded processor cores typically sharing a common system memory controller, data locality is easily lost, and identifying and scheduling spatially sequential accesses is difficult. Inefficient scheduling results in performance reductions and consumes unnecessary energy.
Further, while input/output (I/O) frequencies continue to scale with processor core operating frequencies, other key parameters, such as the time to read a memory cell or turn a bus around from a write to a read operation (i.e., tWRT, the Write-to-Read Turnaround delay), are not scaling at comparable rates. At higher signaling rates, the electrical integrity of buses becomes much more difficult to maintain, both within the memory chips and across the processor-memory interface. Consequently, a complex set of timing parameters must be observed. These parameters dictate that gaps be inserted when the access stream transitions from a write to a read or vice versa, significantly degrading effective memory bandwidth even assuming perfect scheduling of memory accesses.
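By way of illustration only, the effect of such turnaround gaps on effective bandwidth can be approximated with a simple model. The following Python sketch is not part of the described system. The function name bus_utilization and the cycle counts burst_cycles and turnaround_cycles are hypothetical values chosen for the example, and the model assumes, as a simplification, that the same gap applies to write-to-read and read-to-write transitions.

```python
def bus_utilization(stream, burst_cycles=4, turnaround_cycles=6):
    """Return the fraction of bus cycles that carry data.

    stream is a sequence of 'R' and 'W' accesses in issue order.
    burst_cycles and turnaround_cycles are illustrative assumptions,
    not values taken from any particular memory device.
    """
    if not stream:
        return 0.0
    data_cycles = burst_cycles * len(stream)
    # Charge one idle gap each time the bus direction changes
    # between two consecutive accesses.
    switches = sum(1 for prev, cur in zip(stream, stream[1:]) if prev != cur)
    total_cycles = data_cycles + switches * turnaround_cycles
    return data_cycles / total_cycles


# The same 8 reads and 8 writes, issued in two different orders.
alternating = "RW" * 8          # direction changes on every access
grouped = "R" * 8 + "W" * 8     # a single direction change

print(f"alternating: {bus_utilization(alternating):.2f}")
print(f"grouped:     {bus_utilization(grouped):.2f}")
```

Under these assumed values, the alternating stream incurs fifteen turnaround gaps and the grouped stream only one, so the grouped order keeps the bus carrying data for a much larger share of cycles. Even the grouped order remains below full utilization, consistent with the observation that some bandwidth is lost regardless of scheduling quality.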