The shift to multiple-core or many-core (multi-core) processors has made multithread applications prevalent on both client and server platforms. The high thread-level parallelism (TLP) of such applications takes advantage of the hardware parallelism supported by multi-core processors (e.g., chip multi-processor (CMP) systems). Software and hardware proposals exist to expedite execution of multithread applications, such as coordinated thread scheduling, in which an operating system schedules the threads of an application to execute together. However, none of these proposals addresses performance problems in memory scheduling (e.g., by a memory controller) for multithread applications, and such problems can cause significant performance degradation.
Some mechanisms, such as a first-ready first-come-first-serve (FRFCFS) mechanism and a parallelism-aware batch scheduling (PAR-BS) mechanism, attempt to improve memory controller performance. For example, the FRFCFS mechanism improves memory controller performance by prioritizing memory requests that access an already open row buffer of a memory bank, and otherwise serving older memory requests first. The PAR-BS mechanism groups memory requests into batches, services the memory requests on a batch-by-batch basis, and improves memory controller performance by applying shortest-job-first (SJF) scheduling within each batch. However, both mechanisms optimize memory scheduling only for single-thread applications, and fail to optimize memory scheduling for multithread applications.
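By way of illustration only, the two policies described above may be sketched in Python as follows. This is a simplified model under stated assumptions, not the implementation of either mechanism: the `Request` fields, the function names, the per-thread, per-bank batching cap `cap`, and the SJF ranking key (lightest maximum per-bank load, then lightest total load) are all hypothetical choices made for the example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Request:
    # Hypothetical request record; field names are assumptions for this sketch.
    arrival: int  # arrival time (smaller means older)
    thread: int   # id of the issuing thread
    bank: int     # target memory bank
    row: int      # target row within the bank


def frfcfs_pick(queue: List[Request], open_rows: Dict[int, int]) -> Optional[Request]:
    """FRFCFS-style choice: row-buffer hits first, then the oldest request."""
    if not queue:
        return None
    # The first key element is False (sorts first) when the request hits the open row.
    return min(queue, key=lambda r: (open_rows.get(r.bank) != r.row, r.arrival))


def form_batch(queue: List[Request], cap: int) -> List[Request]:
    """Batch the oldest requests, at most `cap` per (thread, bank) pair (assumed rule)."""
    taken: Counter = Counter()
    batch = []
    for r in sorted(queue, key=lambda r: r.arrival):
        if taken[(r.thread, r.bank)] < cap:
            taken[(r.thread, r.bank)] += 1
            batch.append(r)
    return batch


def sjf_thread_order(batch: List[Request]) -> List[int]:
    """Rank threads shortest job first within one batch (assumed ranking key)."""
    per_bank = Counter((r.thread, r.bank) for r in batch)
    total = Counter(r.thread for r in batch)
    max_load: Dict[int, int] = {}
    for (thread, _bank), n in per_bank.items():
        max_load[thread] = max(max_load.get(thread, 0), n)
    # Lightest maximum per-bank load first, then lightest total load, then thread id.
    return sorted(total, key=lambda t: (max_load[t], total[t], t))
```

Note that both policies treat each thread as an independent job: neither function has any notion that several threads may belong to the same multithread application, which is the limitation identified above.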