For many decades, performance capabilities of electronic processing devices have increased due to various hardware enhancements in such devices (e.g., increased clock frequencies and reduction in calculation time, more efficient processor management architecture). Typically, such increases have been related to improved single-thread performance (e.g., sequential processing). In recent years, however, the complexity of such processors as well as limits on power consumption and heat generation have made further enhancement of single-thread performance increasingly difficult. As a result, processor manufacturers have begun to integrate multi-thread processing (e.g., multiple processors on a chip) to increase system performance in a power-efficient manner.
Current high performance general purpose computers can have at least two processors on a single chip. Further, industry trends suggest that integrating even more cores on a single chip will occur in the near future. As a result, processors with many cores on a single chip are likely to be commonplace.
As the capacity for parallel processing increases, computer memory (e.g., random access memory (RAM), dynamic RAM (DRAM), etc.) can potentially become an efficiency bottleneck. For instance, RAM is typically a shared resource that handles all memory requests by processor threads. As parallel processing increases, the number of concurrent memory requests served by the RAM can substantially increase as well.
In modern computing architectures, a RAM controller is a mediator between processors and RAM modules (and data stored therein). The RAM controller satisfies the processors' memory requests while obeying timing and resource constraints of RAM banks, chips, and address/data buses. To do so, the controller translates processor requests into RAM commands. Two basic structures are involved within the RAM controller. First, a memory request buffer receives and stores memory requests generated by one or more processors or processing threads. Once stored in the buffer, a request awaits scheduling to an appropriate RAM chip, where data is extracted to serve the memory request. In addition, the memory request buffer maintains a state associated with each memory request. The state can include characteristics such as memory address, type, request identifier, age of the request, RAM bank readiness, completion status, and so on.
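The per-request state described above can be pictured as a small record held in a bounded buffer. The following Python sketch is illustrative only: the class names (MemoryRequest, MemoryRequestBuffer), the field names, and the capacity parameter are assumptions made for this example and are not taken from any particular controller design.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class RequestType(Enum):
    READ = "read"
    WRITE = "write"


@dataclass
class MemoryRequest:
    """State kept for one buffered request (hypothetical field names)."""
    request_id: int
    address: int
    req_type: RequestType
    arrival_cycle: int          # used to derive the age of the request
    bank_ready: bool = False    # whether the target RAM bank can accept a command
    completed: bool = False     # completion status

    def age(self, current_cycle: int) -> int:
        """Number of cycles the request has spent in the buffer."""
        return current_cycle - self.arrival_cycle


class MemoryRequestBuffer:
    """Fixed-capacity store for requests awaiting scheduling."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: List[MemoryRequest] = []

    def is_full(self) -> bool:
        return len(self.entries) >= self.capacity

    def admit(self, request: MemoryRequest) -> bool:
        """Store the request if a slot is free; return False otherwise."""
        if self.is_full():
            return False
        self.entries.append(request)
        return True

    def retire(self, request_id: int) -> None:
        """Free the slot of a request that has been serviced."""
        self.entries = [r for r in self.entries if r.request_id != request_id]
```

In this sketch a rejected admit call stands in for the situation where a processor must wait for a free buffer slot.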
Second, a RAM controller generally has a RAM access scheduler. The purpose of such a scheduler is to select, among all requests currently in the memory request buffer, the request that is sent to the RAM chip next. More precisely, the RAM access scheduler decides which RAM command to issue in every RAM clock cycle. It consists of logic that keeps track of RAM state (e.g., data stored in buffers, RAM bus, etc.) and timing constraints of the RAM. The scheduler takes as input the state of the memory requests in the request buffer along with the state of the RAM, and decides which RAM command should be issued based on the implemented scheduling and access prioritization policies (e.g., where such scheduling and policies typically try to optimize memory bandwidth and latency).
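The selection step described above, which takes request state and RAM state as input and yields the next request to service, might be sketched as follows. The text does not name a specific policy; the one used here (prefer requests whose bank is ready and whose row is already open, then the oldest request) is an assumption chosen for illustration, and Request, BankState, and select_next are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Request:
    request_id: int
    bank: int
    row: int
    arrival_cycle: int


@dataclass
class BankState:
    open_row: Optional[int] = None   # row currently held open in the bank, if any
    busy_until: int = 0              # first cycle at which the bank accepts a command


def select_next(requests: List[Request],
                banks: Dict[int, BankState],
                cycle: int) -> Optional[Request]:
    """Pick one request to service this cycle, or None if none is ready.

    Assumed policy: among requests whose bank is ready, prefer those that
    target the bank's open row; break ties by oldest arrival.
    """
    ready = [r for r in requests if banks[r.bank].busy_until <= cycle]
    if not ready:
        return None
    # False sorts before True, so open-row requests come first.
    return min(ready,
               key=lambda r: (banks[r.bank].open_row != r.row, r.arrival_cycle))
```

A real scheduler would also translate the chosen request into individual RAM commands and honor further timing constraints; this sketch covers only the prioritization decision.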
In order to maintain efficient data bandwidth to and from RAM, complex RAM request scheduling (as opposed to simple or primitive request scheduling) is typically employed. A complex RAM request scheduling algorithm operates on a memory request buffer and employs a sophisticated hardware algorithm to select requests for service. Selection is typically based on maximization of RAM bandwidth or minimization of RAM latency. In contrast, a simple or primitive scheduling algorithm does not try to maximize RAM bandwidth or minimize RAM latency. Use of sophisticated hardware algorithms does have a cost, however. For instance, implementation difficulty and power consumption for complex schedulers can be proportional to the size of the memory request buffer. If a scheduler attempts to accommodate larger and larger numbers of concurrent incoming requests (e.g., as a result of incorporating a large memory request buffer), scheduling complexity, and therefore hardware implementation complexity, power consumption, and logic delay of the scheduler, can increase linearly, or even super-linearly, with the increased number of requests. Therefore, it can be very difficult and costly, in terms of design complexity, design time, and power consumption, to increase the size of the memory request buffer while using complex scheduling algorithms.
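The contrast between simple and complex scheduling can be made concrete with a toy model of a single RAM bank. The model assumes, as is usual for DRAM but not stated in the text, that an access to the currently open row is cheaper than one that requires opening a different row. All names below (count_row_activations, simple_order, row_hit_first_order) are hypothetical, and the row-hit-first policy is only one example of a bandwidth-oriented algorithm.

```python
from typing import List, Optional, Tuple

Request = Tuple[int, int]  # (arrival_order, row) for a single bank


def count_row_activations(order: List[Request]) -> int:
    """Count how many times a new row must be opened for this service order."""
    activations = 0
    open_row: Optional[int] = None
    for _, row in order:
        if row != open_row:
            activations += 1
            open_row = row
    return activations


def simple_order(buffer: List[Request]) -> List[Request]:
    """Simple policy: service strictly in arrival order."""
    return sorted(buffer)


def row_hit_first_order(buffer: List[Request]) -> List[Request]:
    """More complex policy: at each step, scan the whole buffer for a request
    to the open row, falling back to the oldest request if there is none."""
    pending = sorted(buffer)
    open_row: Optional[int] = None
    order: List[Request] = []
    while pending:
        hits = [r for r in pending if r[1] == open_row]
        chosen = hits[0] if hits else pending[0]
        pending.remove(chosen)
        open_row = chosen[1]
        order.append(chosen)
    return order


buffer = [(0, 5), (1, 9), (2, 5), (3, 9)]
print(count_row_activations(simple_order(buffer)))         # 4
print(count_row_activations(row_hit_first_order(buffer)))  # 2
```

The row-hit-first policy opens fewer rows, but it does so by examining every buffered request at each step. That per-step scan is the part whose cost grows with the size of the request buffer.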
As parallel processing (e.g., number of processing cores on a chip), and hence multiple threads sharing RAM resources, becomes more prevalent, the size of the memory request buffer should increase so that system performance can scale to meet parallel processing demands (e.g., to reduce a likelihood that the memory request buffer becomes a performance bottleneck). In addition, to maintain high RAM bandwidth and minimize RAM latency, complex and sophisticated RAM scheduling algorithms optimized for such purposes should be retained. Unfortunately, utilizing a complex scheduling algorithm in conjunction with a large request buffer can substantially increase implementation, design, test, verification, and/or validation complexity as well as power consumption for a RAM controller. As a result, overall system scalability for parallel processing architectures can be significantly hindered.
The eventual result of increased parallel processing, with current RAM limitations, is stalled processing. The more memory requests multi-core processors issue, the faster RAM request buffers fill up. When such buffers are full, new memory requests cannot be admitted, and hence no thread is able to issue any new memory requests. A processing thread that is unable to fulfill a memory request in cache, for instance, and therefore must generate a memory request, will be stalled until a free slot in the memory request buffer becomes available. Overall system performance will be substantially reduced in such circumstances. Consequently, new RAM interface mechanisms that employ both large memory request buffers and complex scheduling algorithms will likely be required in order to facilitate efficient system performance given the foreseeable growth in parallel processing.
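The stalling behavior described above can be illustrated with a minimal simulation of a bounded request buffer. The workload model is an assumption made for this sketch and does not come from the text: every thread tries to issue one request per cycle, and the RAM retires one buffered request every service_interval cycles. The function name simulate_stalls and its parameters are hypothetical.

```python
from collections import deque


def simulate_stalls(num_threads: int, buffer_capacity: int,
                    service_interval: int, cycles: int) -> int:
    """Return the total number of thread-cycles spent stalled.

    Assumptions (not from the text): every thread tries to issue one
    request per cycle, and the RAM retires one buffered request every
    `service_interval` cycles.
    """
    buffer = deque()
    stalled = 0
    for cycle in range(cycles):
        # RAM side: retire the oldest request at a fixed rate.
        if buffer and cycle % service_interval == 0:
            buffer.popleft()
        # Processor side: each thread attempts to enqueue a request.
        for thread in range(num_threads):
            if len(buffer) < buffer_capacity:
                buffer.append((cycle, thread))
            else:
                stalled += 1   # buffer full: this thread waits this cycle
    return stalled


print(simulate_stalls(num_threads=2, buffer_capacity=4, service_interval=1, cycles=10))
print(simulate_stalls(num_threads=4, buffer_capacity=4, service_interval=1, cycles=10))
print(simulate_stalls(num_threads=4, buffer_capacity=16, service_interval=1, cycles=10))
```

Under these assumptions, adding threads at a fixed buffer size increases the stall count, and enlarging the buffer postpones the point at which stalls begin. When demand permanently exceeds the service rate, a larger buffer delays stalls rather than eliminating them.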