The present embodiments relate to environments implementing memory control and direct memory access ("DMA"), and are more particularly directed to circuits, systems, and methods in these environments for reducing access latency.
In the computing art, memory control is typically accomplished by a mechanism referred to as a memory controller, or as a DRAM controller when, as is common, the controlled memory is dynamic random access memory ("DRAM"). A DRAM controller may be a separate circuit or a module included within a larger circuit, and typically receives requests for accessing one or more memory locations in the corresponding memory. To respond to each request, the memory controller implements sufficient circuitry (e.g., address decoders and logic decoders) to provide the appropriate control signals to the memory so that its storage circuits are properly enabled and disabled.
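As a rough illustration of the decoding step, the following Python sketch splits a word address into column, bank, and row fields and lists a simplified ACTIVATE/READ/PRECHARGE command sequence for a single read. The field widths, the field order, and the names `decode_address` and `read_commands` are hypothetical assumptions chosen for illustration, not the address layout of any particular DRAM or of the present embodiments.

```python
from dataclasses import dataclass

# Hypothetical geometry, assumed for illustration only.
COL_BITS = 9    # 512 columns per row
BANK_BITS = 2   # 4 banks
ROW_BITS = 13   # 8192 rows per bank


@dataclass(frozen=True)
class DramLocation:
    row: int
    bank: int
    column: int


def decode_address(addr: int) -> DramLocation:
    """Split a word address into row, bank, and column (column in the low bits)."""
    if not 0 <= addr < (1 << (ROW_BITS + BANK_BITS + COL_BITS)):
        raise ValueError(f"address {addr:#x} out of range")
    column = addr & ((1 << COL_BITS) - 1)
    bank = (addr >> COL_BITS) & ((1 << BANK_BITS) - 1)
    row = addr >> (COL_BITS + BANK_BITS)
    return DramLocation(row, bank, column)


def read_commands(addr: int) -> list[tuple]:
    """Simplified single-read sequence: open the row, read the column, close the row."""
    loc = decode_address(addr)
    return [
        ("ACTIVATE", loc.bank, loc.row),
        ("READ", loc.bank, loc.column),
        ("PRECHARGE", loc.bank),
    ]


print(decode_address(0x1234))  # DramLocation(row=2, bank=1, column=52)
print(read_commands(0x1234))
```

Real controllers add timing constraints between these commands; those delays are part of the overhead cycles discussed below.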
While some DRAM controllers are directed to certain efficiencies of memory access, it has been observed in connection with the present inventive embodiments that current technology has some limitations. Some of these limitations arise because DRAM controllers incur a large number of overhead cycles, that is, cycles during which the DRAM is busy but is not receiving or transmitting data. One common approach to reducing the overall overhead penalty is the use of burst operations. Bursts reduce overall overhead because typically only a single address is required, along with a burst size, after which successive data units (i.e., the burst) may be read or written without additional per-unit overhead. Even with burst technology, however, it remains important to examine the number of overhead cycles required for a given burst size. Under current technology, the ratio of burst length to total access length provides one measure of efficiency. By that measure, efficiency can be improved by increasing the burst length, that is, by providing long uninterrupted burst accesses, because the same number of overhead cycles is then amortized over more data access cycles.
However, the present inventors have observed that this approach also has drawbacks. As one drawback, a longer burst prevents a different requesting circuit from accessing the memory for the duration of the burst. Alternatively, if the different requesting circuit is permitted to interrupt the burst, this is typically achieved by an interrupt, which adds overhead cycles to stop the current burst and further overhead to restart the burst once the access for the different requesting circuit is complete.
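The tradeoff described above can be made concrete with a small Python model. It assumes, hypothetically, a fixed number of overhead cycles per access and ignores all other DRAM timing detail; the function names and the values of `OVERHEAD`, `stop`, and `restart` are illustrative only. Longer bursts raise the efficiency ratio but also lengthen the time another requester may be blocked.

```python
def burst_efficiency(burst_len: int, overhead: int) -> float:
    """Data cycles divided by total access cycles: burst / (burst + overhead)."""
    return burst_len / (burst_len + overhead)


def worst_case_wait(burst_len: int, overhead: int) -> int:
    """Cycles another requester waits if it arrives just as an uninterruptible access starts."""
    return overhead + burst_len


def interrupted_cycles(burst_len: int, overhead: int, stop: int, restart: int) -> int:
    """Cycles the original access consumes if it is stopped once and later restarted."""
    return overhead + burst_len + stop + restart


OVERHEAD = 8  # assumed fixed overhead cycles per access (hypothetical)
for burst in (1, 8, 32):
    print(f"burst={burst:2d}  efficiency={burst_efficiency(burst, OVERHEAD):.2f}  "
          f"worst-case wait={worst_case_wait(burst, OVERHEAD)}")

# Interrupting a 32-unit burst with assumed stop=2 and restart=6 overhead cycles
# raises its cost from 40 to 48 cycles, so efficiency falls from 0.80 to about 0.67.
print(interrupted_cycles(32, OVERHEAD, stop=2, restart=6))
```

Under these assumptions, going from a burst of 8 to a burst of 32 raises efficiency from 0.50 to 0.80, while the worst-case wait for a competing requester grows from 16 to 40 cycles.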
These drawbacks are particularly pronounced in a system that includes more than one processor (e.g., a general-purpose processor, a special-purpose processor, an MPU, an SCP, a video controller, or the like) having access to the same DRAM.
To further illustrate the above limitations, and thus by way of additional introduction, FIG. 1 illustrates a timing diagram of four accesses to a main memory via a DRAM controller, with those accesses labeled generally A1 through A4. For the sake of this example, assume that accesses A1 and A3 are by a first resource R1 (e.g., a CPU), while accesses A2 and A4 are by a second resource R2 (e.g., an external peripheral). Accesses A1 through A4 are examined in further detail below; at this point, note that FIG. 1 shows, for each access, an example of the typical number of clock cycles it expends. These numbers, as well as the timing of the accesses, are later used to illustrate various benefits of the present inventive embodiments.
Access A1 represents a read burst access to the main memory, where the burst is eight words of data. The first portion of access A1 is a period of overhead, which in the example of FIG. 1 spans six cycles. This overhead is referred to in this document as leading overhead and, as known in the art, includes operations such as presenting control signals, including the address to be read, to the main memory and awaiting the response of the main memory to those signals. The second portion of access A1 is the presentation of the burst of data from the main memory. In the current example, it is assumed that the burst size is eight and that each burst quantity (e.g., 16 bits) occupies a single cycle. Thus, the burst of eight 16-bit quantities spans a total of eight cycles. Access A1 therefore spans a total of 14 cycles: six cycles of leading overhead plus eight data cycles.
Accesses A2, A3, and A4 represent a single data read, a write burst, and a single data write, respectively. Like access A1, each of accesses A2, A3, and A4 commences with some number of leading overhead cycles. Specifically, the read operation of access A2 uses six cycles of leading overhead, while each of the write operations of accesses A3 and A4 uses three cycles of leading overhead. Additionally, each of accesses A2, A3, and A4 is shown to expend a single cycle per data quantity. Thus, the single data operations of accesses A2 and A4 each consume a single cycle, while the burst operation of access A3 consumes eight cycles, each corresponding to one of the eight quantities of write data. Lastly, each of accesses A2, A3, and A4 also includes overhead after the data access, referred to in this document as ending overhead. Such overhead may arise from various control operations, such as precharging memory rows and/or banks, as well as receipt of a signal indicating the end of an access. In the example of FIG. 1, the read operation of access A2 uses two cycles of ending overhead, the write operation of access A3 uses four cycles of ending overhead, and the write operation of access A4 uses five cycles of ending overhead.
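The per-access breakdown of FIG. 1 can be tabulated with a short Python sketch. The cycle counts come from the example above; the `Access` type and its field names are hypothetical, and A1 is given zero ending overhead because none is described for it and its stated total is 14 cycles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    name: str
    kind: str
    leading: int  # leading overhead cycles (address/control setup)
    data: int     # one cycle per data quantity
    ending: int   # ending overhead cycles (e.g., precharge)

    @property
    def total(self) -> int:
        return self.leading + self.data + self.ending


# Cycle counts from the FIG. 1 example (A1 ending overhead assumed to be zero).
FIG1_ACCESSES = [
    Access("A1", "read burst", 6, 8, 0),
    Access("A2", "single read", 6, 1, 2),
    Access("A3", "write burst", 3, 8, 4),
    Access("A4", "single write", 3, 1, 5),
]

for a in FIG1_ACCESSES:
    print(f"{a.name} ({a.kind}): {a.leading} + {a.data} + {a.ending} = {a.total} cycles")
```

This prints totals of 14, 9, 15, and 9 cycles for A1 through A4, respectively.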
Concluding with some observations regarding FIG. 1, it is now instructive to examine its drawbacks. As a first drawback, a total of 47 cycles are expended for accessing only 18 data quantities. Therefore, 29 cycles arise from overhead operations; thus, 62 percent of the cycles (i.e., 29/47 ≈ 0.62) relate to overhead, leaving only 38 percent of the cycles (i.e., 18/47 ≈ 0.38) for actual data access. As another consideration regarding the FIG. 1 approach, a gap occurs between accesses A3 and A4, which may arise, for example, when the requests giving rise to accesses A3 and A4 are sufficiently separated in time. When such a gap arises, additional latency clock cycles are expended as mere wait time, shown by way of example as eight cycles in FIG. 1. During that time, none of the bandwidth is used for data access. In addition, after the wait time, there is further latency at the beginning of access A4 when the DRAM controller once again incurs the leading overhead for access A4. Given the above, one skilled in the art will appreciate that these and other factors increase the average time for accessing data (i.e., latency) and degrade overall system performance.
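The totals and the effect of the gap can be checked with the following Python sketch, which replays the four accesses of FIG. 1 on a single cycle counter. The data layout and the helper `summarize` are hypothetical; the cycle counts and the eight-cycle gap are those of the example, with A1 again assumed to have no ending overhead.

```python
# (leading, data, ending) cycles per access, from the FIG. 1 example.
ACCESSES = {
    "A1": (6, 8, 0),  # ending overhead assumed zero (none described)
    "A2": (6, 1, 2),
    "A3": (3, 8, 4),
    "A4": (3, 1, 5),
}
GAPS = {"A4": 8}  # idle wait cycles inserted before A4


def summarize(accesses, gaps):
    """Return busy cycles, data cycles, elapsed cycles, and each access's (start, end)."""
    clock = busy = data = 0
    timeline = {}
    for name, (lead, d, end) in accesses.items():
        clock += gaps.get(name, 0)  # idle time: no bandwidth used for data
        start = clock
        clock += lead + d + end
        timeline[name] = (start, clock)
        busy += lead + d + end
        data += d
    return busy, data, clock, timeline


busy, data, elapsed, timeline = summarize(ACCESSES, GAPS)
overhead = busy - data
print(f"busy={busy} data={data} overhead={overhead} ({overhead / busy:.0%} of busy cycles)")
print(f"elapsed={elapsed} including the gap; data share={data / elapsed:.0%}")
print(timeline)
```

Including the eight wait cycles, the four accesses finish at cycle 55, so only about a third of the elapsed cycles (18/55) carry data.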
By way of further background, some system latency has been addressed in the art by using DMA. DMA enables peripherals or coprocessors to access memory without heavily using processor resources to perform the data transfer. In such systems, a traffic controller groups and sequences DMA accesses as well as direct processor accesses. More particularly, peripherals may submit requests for access to the traffic controller and, if the controller grants a request, are given access to the main memory via a DMA channel. The CPU may also access the main memory via a channel that is provided by the traffic controller and is separate from DMA. In any case, the DMA approach typically provides an access channel so that multiple devices may access the memory.
While DMA has therefore provided improved performance in various contexts, the present inventors have also recognized that it does not address the drawbacks of the memory controller described in connection with FIG. 1. In addition, the present inventive scope includes considerations of priority that may be used in connection with DMA and traffic control, and that improve system performance both alone and in combination with an improved memory controller.
In view of the above, there arises a need to address the drawbacks of the prior art and to provide improved memory control and access traffic control that reduce memory access latency.