1. Technical Field
A “request scheduler” provides techniques for reducing delay in servicing requests from executing threads in a computer system with shared memory, and in particular, various techniques for dynamically batching and scheduling concurrent requests in parallel to reduce overall request pendency in a multi-threaded environment, thereby improving overall memory performance.
2. Related Art
Many conventional general-purpose computers include one or more microprocessors, with each microprocessor containing one or more processing cores. In such systems, each core may also be capable of executing multiple threads. Typically, in addition to any L1 or L2 processor caches, such general-purpose computers include one or more banks of shared memory, such as DRAM or SRAM. A memory controller typically provides access to this shared system-level memory by scheduling the thread requests issued by one or more processing cores in response to instructions from applications or from the operating system. Unfortunately, concurrent requests to the system level memory from one or more processor cores and/or from one or more simultaneous or parallel threads often conflict and interfere with each other. Such conflicts tend to degrade overall system performance.
In general, system level memory such as DRAM is organized into multiple banks so that memory requests to different banks can be serviced in parallel. Each DRAM bank has a two-dimensional structure consisting of multiple rows and columns. Consecutive addresses in memory are typically located in consecutive columns of the same row. Each memory bank generally has a single row-buffer, and data can only be read from that buffer. The row-buffer contains at most a single row at any given time. Therefore, because of the row-buffer, an access to one or more specific memory addresses in response to a thread request generally falls into one of three categories: 1) “Row hit” requests, where the current request is to the row that is already in the row-buffer; 2) “Row conflict” requests, where the current request is to a row different from the one currently in the row-buffer; and 3) “Row closed” requests, where, for any of a number of reasons, no row of memory is currently stored in the row-buffer.
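The three categories above can be illustrated with a minimal sketch that classifies a request against the state of a single bank's row-buffer. It assumes, purely for illustration, that the row-buffer state is modeled as the index of the open row, or None when no row is open; the names RowAccess and classify_access are hypothetical and not part of any particular memory controller.

```python
from enum import Enum
from typing import Optional


class RowAccess(Enum):
    """The three row-buffer outcomes for a request to one bank."""
    HIT = "row hit"
    CONFLICT = "row conflict"
    CLOSED = "row closed"


def classify_access(open_row: Optional[int], requested_row: int) -> RowAccess:
    """Classify a request given the bank's current row-buffer contents.

    Hypothetical model: open_row is None when no row is held in the row-buffer.
    """
    if open_row is None:
        return RowAccess.CLOSED
    if open_row == requested_row:
        return RowAccess.HIT
    return RowAccess.CONFLICT


# Example: a bank whose row-buffer currently holds row 7.
print(classify_access(7, 7).value)     # row hit
print(classify_access(7, 3).value)     # row conflict
print(classify_access(None, 3).value)  # row closed
```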
Conventional memory controllers (either integrated into a processor or implemented as a separate attached component) generally include memory access schedulers designed to maximize the bandwidth obtained from the system level memory in order to improve overall system performance. For example, a simple memory access scheduler may serve memory requests according to a “First-Come-First-Serve” (FCFS) policy. However, as is well known to those skilled in the art, a pure FCFS-based memory access scheduler can be very inefficient, since it ignores the current contents of the row-buffer and therefore typically incurs a large number of row conflicts when accessing the system level memory.
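The inefficiency of pure FCFS can be seen in a small single-bank simulation. This is a sketch under simplifying assumptions: requests are represented only by the row they target, and the bank is assumed to keep the last accessed row open (an open-page policy). The function simulate_fcfs is hypothetical. When two threads stream through different rows and their requests arrive interleaved, serving them strictly in arrival order turns almost every access into a row conflict.

```python
from typing import List, Optional, Tuple


def simulate_fcfs(rows: List[int], open_row: Optional[int] = None) -> Tuple[int, int, int]:
    """Serve requests to one bank strictly in arrival order.

    Returns (hits, conflicts, closed) counts. After each access the requested
    row is left open in the row-buffer (open-page policy, assumed here).
    """
    hits = conflicts = closed = 0
    for row in rows:
        if open_row is None:
            closed += 1
        elif row == open_row:
            hits += 1
        else:
            conflicts += 1
        open_row = row
    return hits, conflicts, closed


# Two threads streaming through rows 1 and 2, with arrivals interleaved.
arrivals = [1, 2, 1, 2, 1, 2]
print(simulate_fcfs(arrivals))          # (0, 5, 1)
# The same requests grouped by row, as a row-aware scheduler might order them.
print(simulate_fcfs(sorted(arrivals)))  # (4, 1, 1)
```

The second call only shows how much reordering can help on this toy input; it is not a model of any specific scheduler.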
Instead, many conventional memory access schedulers employ a “First-Ready First-Come-First-Serve” (FR-FCFS) algorithm to schedule thread requests for access to particular system memory addresses. FR-FCFS-based memory access schedulers generally prioritize the thread requests to a particular memory bank by first giving higher priority to requests that can be serviced faster (i.e., requests for a location in the row that is already open in the row-buffer, also referred to as a “row-hit-first” rule). In other words, higher priority is assigned to requests that would result in a row hit over ones that would cause a row conflict. Then, among requests that remain equal under the row-hit-first rule, typical request schedulers give higher priority to the requests that arrived earliest at that memory bank (i.e., an “oldest-within-bank-first” rule).
In other words, conventional FR-FCFS algorithms typically attempt to maximize system level memory bandwidth by first scheduling, within a particular memory bank, the memory access requests that cause row hits, regardless of when those requests arrived. Hence, streaming memory access patterns are given the highest priority by the memory controller and are served first. Among the remaining requests to the same memory bank, the oldest requests are given the next highest priority and are served in the order received. Therefore, the oldest row-hit memory request has the highest priority, whereas the youngest row-conflict memory request has the lowest priority.
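The two FR-FCFS rules amount to sorting a bank's pending requests by the key (is not a row hit, arrival time). The sketch below assumes a fixed set of pending requests with no new arrivals while draining, and an open-page policy that leaves each served row open; the Request type and the functions fr_fcfs_pick and fr_fcfs_order are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Request:
    thread_id: int
    row: int
    arrival: int  # smaller value = arrived earlier


def fr_fcfs_pick(pending: List[Request], open_row: Optional[int]) -> Request:
    """Pick the next request for one bank under FR-FCFS.

    Rule 1 (row-hit-first): requests to the open row sort first (False < True).
    Rule 2 (oldest-within-bank-first): ties are broken by arrival order.
    """
    return min(pending, key=lambda r: (r.row != open_row, r.arrival))


def fr_fcfs_order(pending: List[Request], open_row: Optional[int] = None) -> List[Request]:
    """Drain a fixed set of pending requests, updating the open row after each."""
    queue = list(pending)
    served = []
    while queue:
        nxt = fr_fcfs_pick(queue, open_row)
        queue.remove(nxt)
        served.append(nxt)
        open_row = nxt.row  # open-page policy assumed
    return served


reqs = [
    Request(thread_id=0, row=5, arrival=0),
    Request(thread_id=1, row=9, arrival=1),
    Request(thread_id=0, row=5, arrival=2),
    Request(thread_id=1, row=9, arrival=3),
]
# Row 9 is open, so thread 1's row hits are served before thread 0's older request.
print([(r.thread_id, r.arrival) for r in fr_fcfs_order(reqs, open_row=9)])
# [(1, 1), (1, 3), (0, 0), (0, 2)]
```

In this example the youngest row-hit request (arrival 3) is served ahead of the oldest row-conflict request (arrival 0), which is the behavior described above.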
As the number of cores in computer processors increases, and as operating systems and applications make greater use of multi-threading and hyper-threading, the number of concurrent requests to system level memory banks will increase. Consequently, the limited ability of conventional memory controllers to efficiently schedule thread requests for system level memory in such environments can create bottlenecks in overall system performance due to interference between thread requests.
As is known to those skilled in the art, interference of threads/applications in a shared memory system of a general purpose computer can result in a number of serious problems. For example, if scheduling and resource allocation policies result in inter-thread interference in the shared memory controller, such interference can cause loss of control by the operating system scheduler or the hypervisor (i.e., a “virtual machine” monitor) over the system's performance and fairness properties. Another potential problem is that such interference can cause significant inefficiency and loss of control in data centers due to unpredictable and uncontrollable memory system performance. Yet another potential problem is that such interference can cause degraded system performance and significant user-productivity loss. In addition, such interference can cause unpredictable application program performance, which renders performance analysis, optimization, and isolation extremely difficult.