Some complex electronic systems, for example multi-media appliances such as cell phones and tablets, or various multi-processor systems, may have multiple master devices (masters) sharing a single large memory. Masters are any circuits capable of initiating a memory access request to the shared memory. For example, masters may be processors (for example, general purpose microprocessors, graphics processors, microcontrollers, etc.), or subsystems or blocks of logic that need to write to memory after receiving data (for example, a camera sensor), or subsystems or blocks of logic that need to read from memory (for example, a display controller).
Two important memory access measures that affect performance are latency and bandwidth. Overall latency is the time from when a master requests memory access until the time that the first portion of the requested data is received by the master. This document focuses primarily on the time from when a master requests memory access until the time that the request is received by the shared memory. Bandwidth refers to the amount of data that passes through one point per unit of time. This document focuses on bandwidth required for transferring large blocks of data to and from memory.
Various types of masters have different latency and bandwidth requirements. High performance processors require the lowest possible latency to provide maximum performance. Caches are used to reduce the number of external memory accesses, which reduces bandwidth requirements and reduces average latency. However, in the case of a cache miss (with some exceptions such as pre-fetching a line or speculatively reading a line), a processor may be stalled, and in the case of a stall, access to external memory must be served with the lowest possible latency to ensure maximum performance. Some other masters require high bandwidth with a guarantee on average throughput, and therefore a guaranteed average latency. For example, graphics processors must read or write large uninterrupted bursts of data, but they have large local memory buffers that store data that are prefetched in advance, so their latency requirements are relatively relaxed. That is, average throughput is important but latency for a single request can be very high. Other masters, for example a real-time controller streaming data from a camera sensor, need some guaranteed maximum latency. Typically, they have known, regular, and predictable bandwidth requirements and they use local buffers that are optimized to reduce overall system cost. However, buffers optimized for cost cannot by themselves guarantee latency. If those buffers overflow or underflow, the system might fail to perform a requested operation, and even temporary performance degradation may not be acceptable. Therefore, there is a need for a separate hardware mechanism that can guarantee some maximum latency. Finally, for some masters, timely execution and low latency are not required. For example, data transfer to mass storage or data transfer across a serial link interface is usually not time critical.
Currently, the most commonly used large shared memory is dynamic memory. Dynamic memory typically stores information as a charge or lack of charge in capacitors. However, charged capacitors gradually leak charge. Therefore, dynamic memory must be periodically refreshed (read and rewritten to guarantee that data are not lost). Dynamic Random Access Memory (DRAM) is typically organized into banks, and banks are organized into rows, and refresh is typically performed on an entire row. A common industry standard specifies that each row must be refreshed every 64 milliseconds or less. There may be many thousands of rows. Refresh may be executed one row at a time at uniformly spaced time intervals during the 64 millisecond interval. Alternatively, multiple rows (or even all the rows) may be refreshed in a burst of activity.
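As an illustration of the uniformly spaced case, the spacing between single-row refreshes follows directly from the refresh period and the number of rows. The following is a minimal sketch only; the function name and the row count of 8192 are assumptions chosen for the example and are not taken from any particular memory device.

```python
def refresh_interval_us(refresh_period_ms: float, rows: int) -> float:
    """Average spacing, in microseconds, between single-row refreshes
    when every row is refreshed once per refresh period."""
    if rows <= 0:
        raise ValueError("rows must be positive")
    return refresh_period_ms * 1000.0 / rows


# 64 ms comes from the text above; 8192 rows is an assumed, illustrative count.
print(refresh_interval_us(64.0, 8192))  # 7.8125
```

Under these assumed numbers, one row would need to be refreshed roughly every 7.8 microseconds, which suggests how frequently refresh can compete with the masters for access to the memory.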
For dynamic memory, refresh schedulers in memory controllers must periodically issue memory refresh commands, which may be disruptive for masters that need to access the memory. Dynamic memories allow some delay of memory refresh commands, which may then be issued in bursts, but there is a specified limit on how much delay can be accumulated, and at some point recovery is needed to execute all required refresh commands in a given period.
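The delay-and-recover behavior described above can be sketched as a counter of owed refresh commands. This is a simplified model and not the behavior of any specific memory controller; the class name, the method name, and the default limit of 8 owed commands are all assumptions made for illustration.

```python
class RefreshScheduler:
    """Hypothetical model: counts refresh commands that are due but not yet issued."""

    def __init__(self, max_owed: int = 8):
        self.max_owed = max_owed  # assumed limit on accumulated delay
        self.owed = 0             # refresh commands currently owed

    def tick(self, memory_busy: bool) -> int:
        """Call once per refresh interval; returns how many refreshes are issued now."""
        self.owed += 1  # one more refresh has become due
        if memory_busy and self.owed < self.max_owed:
            return 0    # postpone so that masters can keep using the memory
        issued = self.owed  # recovery: issue everything owed as one burst
        self.owed = 0
        return issued


sched = RefreshScheduler(max_owed=4)
print([sched.tick(memory_busy=True) for _ in range(4)])  # [0, 0, 0, 4]
```

In this sketch, refresh yields to the masters while the memory is busy, and once the owed count reaches the limit the accumulated commands are issued together regardless of other traffic.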
Each master, including the refresh scheduler, issues requests for memory access, and those requests must be prioritized. In addition, the relative priorities of the requests may need to change dynamically. For example, if a video processor is being used, the video should not pause while waiting for more video data. Therefore, if a data buffer for a video processor becomes nearly empty, the priority of pending memory access requests from the video processor needs to be increased to prevent buffer underflow. Likewise, if memory refresh commands have been delayed for a long time, then the priority of memory refresh needs to be increased. There is an ongoing need to improve priority management of memory access.
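The kind of dynamic priority change described above can be sketched as follows. This is one possible illustration under stated assumptions, not a description of any particular arbiter: the function names, the urgency scale from 0.0 to 1.0, the threshold of 0.75, and the boost value are all hypothetical. Urgency might represent, for example, how empty a video buffer is, or what fraction of the allowed refresh delay has been used.

```python
def escalate(base_priority: int, urgency: float,
             threshold: float = 0.75, boost: int = 10) -> int:
    """Return base_priority, raised by boost once urgency reaches threshold."""
    return base_priority + boost if urgency >= threshold else base_priority


def select(requests):
    """requests: list of (name, base_priority, urgency) tuples.
    Returns the name of the request with the highest effective priority."""
    return max(requests, key=lambda r: escalate(r[1], r[2]))[0]


# Video buffer nearly empty (urgency 0.9): the video request wins over the processor.
print(select([("cpu", 5, 0.0), ("video", 2, 0.9), ("refresh", 1, 0.2)]))  # video
# Video buffer comfortably full (urgency 0.1): the processor request wins.
print(select([("cpu", 5, 0.0), ("video", 2, 0.1), ("refresh", 1, 0.2)]))  # cpu
```

A single threshold with a fixed boost is the simplest form of escalation; a graded scheme with several thresholds would follow the same pattern.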