In most computer and data processing systems, the main active memory, which is typically random access memory (RAM), is a dynamic random access memory (DRAM). The structure of a DRAM is generally composed of a number of memory cells organized into banks. Each bank corresponds to an array of the memory cells with each cell being respectively associated with a digit of data (e.g., a bit) at a memory address. In particular, memory addresses within a bank are each designated by a row address and a column address, wherein each row address addresses a memory page. Each page of memory, therefore, contains several memory locations corresponding to the different column designations within the page.
When performing a series of access requests, a page request may occur to a bank currently having another page open, which is commonly referred to as a “page conflict,” whereupon the previously opened page must first be closed (e.g., pre-charged). After closing the previous page, the requested page may then be opened (e.g., activated) and then the read or write operation to the requested page may be performed. A “page miss” occurs if the currently requested page is found in a bank which has no page open, thus requiring an activation procedure to be performed. A “page hit” is said to occur when a current memory access request is for a page which is already open from a previous memory access request.
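By way of illustration only, the three outcomes described above may be sketched as a small classifier. The names `Outcome`, `classify`, and `open_rows` are hypothetical and do not correspond to any particular memory controller; the sketch assumes that each bank holds at most one open page and that a page is identified by its row address.

```python
from enum import Enum


class Outcome(Enum):
    HIT = "page hit"            # requested page is already open in the bank
    MISS = "page miss"          # bank has no page open; activation is needed
    CONFLICT = "page conflict"  # another page is open; close it, then activate


def classify(open_rows, bank, row):
    """Classify one access and record the page left open in the bank.

    open_rows maps a bank number to the row (page) currently open in it;
    a bank that is absent from the mapping has no page open.
    """
    current = open_rows.get(bank)
    if current == row:
        return Outcome.HIT
    open_rows[bank] = row  # the requested page is open after the access
    if current is None:
        return Outcome.MISS
    return Outcome.CONFLICT


# Hypothetical request stream of (bank, row) pairs.
open_rows = {}
requests = [(0, 5), (0, 5), (0, 7), (1, 7)]
outcomes = [classify(open_rows, bank, row) for bank, row in requests]
for request, outcome in zip(requests, outcomes):
    print(request, outcome.value)
```

In this assumed stream, the first access to each bank is a page miss, the repeated access to row 5 of bank 0 is a page hit, and the subsequent access to row 7 of bank 0 is a page conflict.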
Due to the extra processing which must be performed for page conflict and page miss memory accesses relative to page hit requests, the time needed to perform the former two processes is significantly greater than for the latter. In early stages of microprocessor technology development, requests to access a DRAM memory page, for both read and write operations, were received and fulfilled on a first in, first out basis. Such processing may be inefficient, resulting in a large number of page misses and conflicts, and thus requiring an extensive dedication of processor and/or memory controller resources to pre-charging and activating memory pages.
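The cost of first in, first out servicing may be illustrated with a rough sketch. The cycle counts in `COST` are assumed values chosen only to reflect the ordering described above (page conflict slower than page miss, page miss slower than page hit); actual figures depend on the particular DRAM device. The function `total_cost` and the sample request streams are likewise hypothetical.

```python
# Assumed relative costs in arbitrary cycles, not taken from any datasheet.
COST = {"hit": 1, "miss": 3, "conflict": 5}


def total_cost(requests):
    """Serve (bank, row) requests in the given order and sum the assumed costs."""
    open_rows = {}  # bank -> row currently open in that bank
    cycles = 0
    for bank, row in requests:
        current = open_rows.get(bank)
        if current == row:
            cycles += COST["hit"]
        elif current is None:
            cycles += COST["miss"]
        else:
            cycles += COST["conflict"]
        open_rows[bank] = row
    return cycles


# The same four requests, in arrival order and grouped by page.
fifo_order = [(0, 1), (0, 2), (0, 1), (0, 2)]
grouped_order = sorted(fifo_order)

print(total_cost(fifo_order))     # 18 under the assumed costs
print(total_cost(grouped_order))  # 10 under the assumed costs
```

Under these assumptions, the arrival order alternates between two pages of one bank and incurs a page conflict on every access after the first, whereas servicing the same requests grouped by page turns half of the accesses into page hits.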
More recently, processing methods have been developed in which memory access is granted based on priority. The priority of an access request may be based on various factors such as the type of device sending the request, the type of access requested, the memory address desired to be accessed by the request, etc. The problem with providing memory access strictly on priority, however, is that low priority requests may be denied access for unacceptably long periods of time.
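The starvation problem may be illustrated with a minimal sketch of a strict-priority queue. The function `low_priority_wait` and its load model are hypothetical: one request is served per tick, a single low priority request is pending from tick 0, and one high priority request arrives on every tick for the first `high_ticks` ticks.

```python
import heapq


def low_priority_wait(high_ticks, max_ticks):
    """Return the ticks a low priority request waits, or None if never served.

    Queue entries are (priority, arrival_tick); a lower number is a higher
    priority, so heapq always pops the highest priority request first.
    """
    queue = [(1, 0)]  # the single low priority request, pending from tick 0
    for tick in range(max_ticks):
        if tick < high_ticks:
            heapq.heappush(queue, (0, tick))  # a new high priority request
        priority, arrival = heapq.heappop(queue)  # serve one request per tick
        if priority == 1:
            return tick - arrival
    return None  # still waiting when the simulation ends


print(low_priority_wait(high_ticks=10, max_ticks=100))   # 10
print(low_priority_wait(high_ticks=100, max_ticks=100))  # None
```

In this model the low priority request is served only once the stream of high priority requests stops; if that stream never stops, the request is never served.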
Furthermore, the number of microprocessors in a system, the number of cores in a microprocessor, and the number of process threads per core are increasing greatly and are expected to continue to increase over the next few years. Systems with hundreds to thousands of execution threads may be envisioned. These systems are often designed such that multiple processor chips access a common memory. These multiple sources requesting access to a common memory place additional pressure on the memory.
One of the effects of the increase of the number of cores and threads will be requirements for greatly increased memory bandwidth, with a major side-effect that the address request stream seen by the memory system will be more random because of the increased number of actually independent or seemingly independent program execution sequences. Increases in the size of level 1 and level 2 caches, which is how total memory bandwidth and latency issues have been addressed in the past by most system implementations, may be less effective and have less opportunity for growth because of the increased number of cores and limits on reasonable die size. Furthermore, an increase in the number of threads being executed in each core will likely reduce average cache hit rates, again resulting in increasing memory traffic.
In current DRAM technology, the time to cycle a memory bank (activate the bank, read or write the requested data, and precharge the bank) is much longer than the data movement time. This long cycle time means that if two requests are close in time but are for the same memory bank, the memory input/output (I/O) pins become idle for a period of time waiting for the first bank cycle to complete so that the second bank cycle can be started. As DRAMs generally have multiple banks that can be cycled independently, this bank timing conflict wastes available memory bandwidth.
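The bandwidth lost to a bank timing conflict may be estimated with a simplified timing model. The constants `BANK_CYCLE` and `DATA_TIME` are assumed values in arbitrary time units, and the function `completion_time` is hypothetical. The model assumes that a request may start only when its bank has finished its previous cycle and the I/O pins have finished the previous data transfer.

```python
BANK_CYCLE = 10  # assumed time for one complete bank cycle
DATA_TIME = 2    # assumed time the I/O pins are busy per request


def completion_time(banks):
    """Return the time at which the last request in `banks` completes.

    `banks` lists the bank addressed by each request, in service order.
    """
    bank_free = {}  # bank -> time at which its current cycle ends
    pins_free = 0   # time at which the I/O pins are next available
    finish = 0
    for bank in banks:
        start = max(pins_free, bank_free.get(bank, 0))
        bank_free[bank] = start + BANK_CYCLE
        pins_free = start + DATA_TIME
        finish = max(finish, bank_free[bank])
    return finish


same_bank = [0, 0, 0, 0]
spread = [0, 1, 2, 3]

print(completion_time(same_bank))  # 40: every request waits for a full bank cycle
print(completion_time(spread))     # 16: bank cycles overlap across banks
```

In both cases the pins carry data for only 4 × 2 = 8 time units. Under these assumptions they therefore sit idle for 32 units when all four requests address one bank, against 8 units when the requests are spread over four banks.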
With such disparate memory request sources, there is a need for apparatuses and methods to generate improved memory performance in a system environment of multiple threads and multiple processors.