In a chip-multiprocessor (CMP) system, the DRAM system is shared among cores. In a shared DRAM system, requests from one thread can not only delay requests from other threads by causing bank conflicts, bus conflicts, or row-buffer conflicts, but can also destroy the DRAM-bank-level parallelism of other threads. Requests whose latencies would otherwise have been overlapped can effectively become serialized. As a result, both fairness and system throughput may degrade, and some threads can starve for long periods of time.
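The loss of bank-level parallelism can be illustrated with a toy model. The sketch below is not taken from any particular memory controller. It assumes two banks that operate in parallel, a fixed service latency per request, and a thread that stalls until its last outstanding request completes. The function name `thread_finish_times` and the latency value are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

LATENCY = 100  # assumed fixed service time per request, in arbitrary cycles


def thread_finish_times(bank_queues):
    """Return, per thread, the time at which its last request completes.

    bank_queues maps a bank id to the ordered list of thread ids whose
    requests that bank services one after another. Banks work in parallel.
    """
    finish = defaultdict(int)
    for order in bank_queues.values():
        for position, thread in enumerate(order, start=1):
            finish[thread] = max(finish[thread], position * LATENCY)
    return dict(finish)


# Threads A and B each have one request to bank 0 and one to bank 1.
interleaved = {0: ["A", "B"], 1: ["B", "A"]}  # each thread waits on a second slot
grouped = {0: ["A", "B"], 1: ["A", "B"]}      # A's requests overlap across banks

print(thread_finish_times(interleaved))  # {'A': 200, 'B': 200}
print(thread_finish_times(grouped))      # {'A': 100, 'B': 200}
```

In the interleaved order, each thread's two requests are effectively serialized and both threads stall for two service slots. When thread A's requests are serviced in the same slot at both banks, A stalls for only one slot while B is no worse off.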
One approach to providing fair and high-performance memory scheduling is a scheduling algorithm called parallelism-aware batch scheduling (PAR-BS), as described in Onur Mutlu and Thomas Moscibroda, “Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems,” ISCA, pp. 63-74, 2008 (2008 International Symposium on Computer Architecture), all of which is incorporated by reference herein, except where inconsistent with the present application. The PAR-BS design is based on two ideas: (1) request batching; and (2) parallelism-aware memory scheduling. First, PAR-BS processes DRAM requests in batches to provide fairness and to avoid starvation of requests. Second, to optimize system throughput, PAR-BS employs a parallelism-aware DRAM scheduling policy that aims to process the requests of a thread in parallel in the DRAM banks, thereby reducing the memory-related stall time experienced by the thread. PAR-BS incorporates support for system-level thread priorities and can provide different service levels, including purely opportunistic service, to threads with different priorities.
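The two ideas can be sketched in simplified form as follows. This is a minimal illustration under stated assumptions, not the PAR-BS implementation: the per-thread, per-bank marking limit `MARKING_CAP`, the thread-ranking rule (lightest per-bank load first, then lightest total load), and all class and function names are assumptions made for the example. Row-buffer state and thread priorities are omitted for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass

MARKING_CAP = 2  # assumed limit on marked requests per thread per bank


@dataclass
class Request:
    thread: int
    bank: int
    arrival: int
    marked: bool = False


def form_batch(requests, cap=MARKING_CAP):
    """Mark the oldest `cap` requests of each thread at each bank."""
    counts = defaultdict(int)
    for req in sorted(requests, key=lambda r: r.arrival):
        if counts[(req.thread, req.bank)] < cap:
            req.marked = True
            counts[(req.thread, req.bank)] += 1


def rank_threads(requests):
    """Order threads so that those with the lightest marked load come first.

    Assumed rule: compare the largest per-bank count of marked requests,
    then the total count of marked requests.
    """
    per_bank = defaultdict(lambda: defaultdict(int))
    for req in requests:
        if req.marked:
            per_bank[req.thread][req.bank] += 1

    def load(thread):
        banks = per_bank[thread]
        return (max(banks.values()), sum(banks.values()))

    return sorted(per_bank, key=load)


def next_request(bank, requests, ranking):
    """Pick the next request for `bank`: marked first, then rank, then age."""
    candidates = [r for r in requests if r.bank == bank]
    if not candidates:
        return None
    position = {t: i for i, t in enumerate(ranking)}
    return min(
        candidates,
        key=lambda r: (not r.marked, position.get(r.thread, len(ranking)), r.arrival),
    )


# Usage: thread 0 has three requests to bank 0; thread 1 has one per bank.
queue = [
    Request(thread=0, bank=0, arrival=0),
    Request(thread=0, bank=0, arrival=1),
    Request(thread=0, bank=0, arrival=2),
    Request(thread=1, bank=0, arrival=3),
    Request(thread=1, bank=1, arrival=4),
]
form_batch(queue)
ranking = rank_threads(queue)
print(ranking)                                  # [1, 0]
print(next_request(0, queue, ranking).arrival)  # 3
```

Because every bank consults the same thread ranking, the requests of the highest-ranked thread tend to be serviced at the same time in different banks, which is the parallelism-aware part of the policy. The marking step bounds how long any unmarked request can be deferred, which is the batching part.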