The present invention relates to computer systems and, more particularly, to memory controllers for computer systems. A major objective of the invention is to enhance overall performance of multi-master computer systems by avoiding some latencies incurred due to differences in optimal burst lengths among masters.
Much of modern progress is associated with advances in computer technology that have provided increasing speed and functionality. These advances have occurred both on the level of individual integrated circuits and on the systems integration level. Integrated circuits have become faster and have accommodated more functions per circuit. Systems have provided for increasing parallelism in the utilization of integrated circuits, as well as more efficient communication among integrated circuits.
A basic computer system includes a data processor for manipulating data in accordance with program instructions; both the data and the instructions can be stored in a memory system. There can be several levels of memory. Main memory is typically some form of random access memory (RAM) residing on a different integrated circuit than the processor. Typically, a computer has one or more bulk storage memories, usually disk-based serial access memories such as floppy disks, hard disks, CD-ROMs, etc. The capacity of the bulk storage devices typically exceeds that of main memory, but the access times are much longer. Thus, when a program is to be executed, the required instructions and the required data are loaded from the bulk storage into main memory for faster execution.
While main memory is much faster than bulk memory, accessing main memory tends to be a bottleneck from the perspective of the processor. A typical read cycle, for example, involves the processor asserting an address, selecting memory or other device associated with that address, reception and decoding of the address by the memory, and, finally, access and transmission of the contents at the addressed location to the processor. Such a read operation can consume several processing cycles.
Write operations, in which the processor writes data to memory, can be faster since the processor can transmit the data at the same time the address is transmitted. Thus, while both read and write operations between a processor and main memory can limit processor throughput, the emphasis herein is on the relatively more time-consuming read operations.
Caches reduce the delays involved in main memory accesses by storing data and instructions likely to be requested by a processor in a relatively small and fast memory. There can be multiple levels of cache, e.g., a smaller, faster, level-one (L1) cache and a larger, slower, level-two (L2) cache. A typical read operation involves transmitting a read request to the L1 cache, the L2 cache, and main memory concurrently. If the requested data is found in the L1 cache, the processor's request is satisfied from the L1 cache and the accesses of the L2 cache and main memory are aborted. If the data is not found in the L1 cache, but is found in the L2 cache, the data is provided by the L2 cache and the access of main memory is aborted. If the requested data is not in either cache, the request is fulfilled by main memory.
An L2 cache typically controls requests by a processor targeted for main memory. The L2 cache typically converts a request for data at a single address location to a request for data at a series of, e.g., four, address locations. The cache stores the requested data along with neighboring data on the assumption that the processor is relatively likely to recall previously requested data or to request data stored near previously requested data.
While the presence of a cache improves the availability of data to the processor, the longer access times associated with the fetching of lines including uncached data limit performance. If a cache controller has to send multiple, e.g., four, addresses, for each line to be cached, the four associated access cycles can be a burden to performance. In particular, there can be an access latency associated with each main memory access so that each line access would involve multiples of such latencies.
Modern "synchronous dynamic random-access memories" (SDRAMs) typically employ two features designed to minimize the compounding of access latencies. The first feature is pipelined processing in which a read request can be received while a previous read request is being processed. With pipelining there is typically a latency of two or more system-bus cycles associated with the first access, but subsequent sequential accesses do not add to that latency beyond a typical baseline of one system-bus cycle per address.
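The benefit of pipelining can be illustrated with simple cycle arithmetic. The following is a minimal sketch under an assumed cost model that is not taken from the source: an unpipelined access is assumed to pay the full first-access latency plus one transfer cycle for every address, while a pipelined sequence pays the latency once and then one bus cycle per address. The function names and the example latency of two cycles are illustrative only.

```python
def unpipelined_cycles(n_addresses: int, first_access_latency: int) -> int:
    """Assumed model: every access pays the latency plus one transfer cycle."""
    return n_addresses * (first_access_latency + 1)


def pipelined_cycles(n_addresses: int, first_access_latency: int) -> int:
    """Assumed model: latency is paid once, then one bus cycle per address."""
    return first_access_latency + n_addresses


# Four sequential addresses with a two-cycle first-access latency.
print(unpipelined_cycles(4, 2))  # 12
print(pipelined_cycles(4, 2))    # 6
```

Under these assumptions, a four-address line fetch takes half as many bus cycles when the accesses are pipelined.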
If the system bus is also pipelined, the master (e.g., the processor/cache system) can send four addresses in quick succession and receive the requested data without an inter-request delay. However, many system buses and many processors are not designed to take full advantage of memory pipelining. When the bus is not pipelined and often even when it is, a master must wait until one request is fulfilled before issuing the next request.
To take advantage of a pipelined memory despite limitations in the system bus or processor, SDRAMs can provide for multi-address burst modes. In such a mode, an SDRAM provides the contents not only of the requested address but also of succeeding addresses. For example, in a burst-4 mode, an SDRAM provides the data at the requested address and the data at the next three consecutive addresses.
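As a rough illustration of the burst behavior just described, the sketch below lists the addresses whose contents a memory would return for one request. It follows the simple "requested address plus consecutive successors" description in the text; the function name is hypothetical, and any address wrapping or alignment performed by an actual SDRAM is deliberately ignored.

```python
from typing import List


def burst_addresses(requested: int, burst_length: int) -> List[int]:
    """Addresses served by one burst: the requested address and its successors."""
    return [requested + i for i in range(burst_length)]


print(burst_addresses(0x100, 4))  # [256, 257, 258, 259]
print(burst_addresses(0x100, 1))  # [256]
```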
In principle, by setting the burst length equal to the cache line length, a cache could receive a complete line in response to a single address request. However, many systems provide for exceptional circumstances (e.g., a "non-cacheable read" instruction) in which only one address is to be read. If the system cannot tolerate unrequested data on the system bus, then burst-4 mode is problematic. The burst-1 mode avoids this problem, but introduces multi-cycle latencies in single-cache-line fetches.
U.S. Pat. No. 5,802,597 to Nelsen, "Nelsen" herein, discloses a system that provides for single address accesses while a memory is in burst-4 mode. The memory controller forwards the first address to the memory, which then begins the burst. When the data from the first address is received by the master (the processor/cache combination), the second address can be asserted. If the second address is asserted (confirming the corresponding address as generated in the burst), the burst is allowed to continue. If the second address is not asserted (disconfirming the second address as generated in the burst), the burst is aborted.
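The confirm-or-abort behavior described above can be modeled in a few lines. This is only a behavioral sketch of the description given here, not the implementation disclosed in Nelsen; the function name, the list-based representation of the addresses asserted by the master, and the generalization of the confirmation check to every address after the first are assumptions made for illustration.

```python
from typing import List, Tuple


def speculative_burst(first_address: int,
                      master_addresses: List[int],
                      burst_length: int = 4) -> Tuple[List[int], bool]:
    """Return (addresses delivered to the master, whether the burst was aborted).

    master_addresses holds the addresses the master asserts after the first
    one, in order. A missing or non-matching address disconfirms the burst.
    """
    delivered = [first_address]
    for offset in range(1, burst_length):
        expected = first_address + offset  # address generated by the burst
        index = offset - 1
        asserted = master_addresses[index] if index < len(master_addresses) else None
        if asserted != expected:
            return delivered, True  # disconfirmed: abort the burst
        delivered.append(expected)
    return delivered, False


print(speculative_burst(8, [9, 10, 11]))  # ([8, 9, 10, 11], False)
print(speculative_burst(8, []))           # ([8], True)
```

The second call corresponds to a single-address read: the master never asserts a second address, so only the first datum is delivered and the remainder of the burst is aborted.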
To effect such an abort, the connection between the memory and the system bus can be broken and the system bus tri-stated. The memory pipeline can be cleared and the memory outputs can be cleared. This abort procedure can consume a cycle or two. Depending on the situation, this abort delay might or might not affect performance. In the worst case, if a write operation is asserted right after the read operation, the write operation could suffer a latency corresponding to that imposed by the abort. However, this cost can be more than offset where the single-address accesses are infrequent relative to the four-address accesses.
The optimal burst length depends on the master. For example, the optimal burst length can be four for a master with a four-word-wide cache, while the optimal burst length can be eight for a master with an eight-word-wide cache. Systems with multiple masters having different optimal burst lengths can provide for changing the burst mode to match the current master.
Typically, changing the burst mode involves executing a write instruction, e.g., part of a driver program or subroutine, to write a burst value in a burst-mode register of the SDRAM memory. Thus, changing the burst mode can involve calling a subroutine as well as executing the included burst-value write instruction. A burst mode switch can consume several bus cycles. If masters are changed infrequently, the associated latency can be negligible when averaged over time. However, in modern systems in which multiple masters are rapidly time-multiplexed to simulate concurrency, the latencies involved in changing burst modes can be significant.
If the burst mode is not changed when a different master is selected, then memory accesses can be non-optimal for at least one of the masters. If the burst length is too short, multiple bursts are required and inter-burst latencies are incurred. If the burst length is too long, abort latencies are incurred on a regular basis. Thus, depending on the implementation, a multi-master system with a non-pipelined bus incurs penalties due to 1) burst mode changes, 2) inter-burst latencies, and/or 3) abort latencies. What is needed is a system in which such latencies are further reduced.
The present invention provides for automated burst-mode changes to minimize latencies due to changes in burst mode. Information regarding the preferred burst mode for each system bus master can be stored, e.g., in a look-up table. The master-grant signal used to select the current master can be used to select the burst mode preferred for the selected master.
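A look-up table of this kind might be sketched as follows. The master identifiers, the table contents, and the fallback burst length of one are hypothetical values chosen for illustration; the source states only that a preferred burst mode is stored per master and selected by the master-grant signal.

```python
# Hypothetical table: master-grant identifier -> preferred burst length.
PREFERRED_BURST = {
    0: 4,  # e.g., a master with a four-word-wide cache
    1: 8,  # e.g., a master with an eight-word-wide cache
}


def burst_length_for(grant_id: int, table=PREFERRED_BURST, default: int = 1) -> int:
    """Select the burst length for the master currently granted the bus.

    The default of 1 for unlisted masters is an assumption of this sketch.
    """
    return table.get(grant_id, default)


print(burst_length_for(0))  # 4
print(burst_length_for(1))  # 8
```

Because the selection is a table read driven by the grant signal, no driver call or mode-register write is modeled when bus ownership changes.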
Since standard memory modules are not provided with programmable burst-mode tables or with means for detecting master-grant signals, the present invention provides for these capabilities to be built into a memory controller. The controller assumes the burst-mode function for the memory system by generating the required addresses at a rate designed to keep the memory pipeline full. In this case, the burst mode of the memory system changes, even though the burst mode of the memory itself does not.
For example, the memory controller can respond to a master with a four-word-wide cache by generating and transmitting to memory three successor addresses after the requested address is forwarded to memory in burst-1 mode. When bus control is switched to a master with an eight-word-wide cache, the memory controller follows the requested address with seven successor addresses generated by the controller.
If the memory is in burst-1 mode, an address is sent from the controller to the memory every bus cycle. To reduce power consumption, the memory can be put in a multiple-address burst mode. For example, if the memory is in burst-4 mode, the memory controller can generate and transmit every fourth address every fourth cycle. In the case of the master with a four-word-wide cache, only the requested address is forwarded; the remaining three addresses are automatically generated by the memory. In the case of the master with an eight-word-wide cache, one successor address (equal to the original request plus four) is generated and transmitted four cycles after the requested address is forwarded to the memory.
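The address schedules in the preceding examples can be reproduced with a short sketch. The function name and the (cycle, address) representation are assumptions; cycle 0 is taken to be the cycle in which the requested address is forwarded to memory.

```python
from typing import List, Tuple


def controller_addresses(requested: int,
                         system_burst: int,
                         memory_burst: int) -> List[Tuple[int, int]]:
    """List (cycle, address) pairs the controller sends to memory.

    system_burst is the burst length preferred by the current master;
    memory_burst is the fixed burst length programmed into the memory.
    """
    if system_burst % memory_burst != 0:
        raise ValueError("system burst must be a multiple of the memory burst")
    return [(cycle, requested + cycle)
            for cycle in range(0, system_burst, memory_burst)]


# Memory in burst-1 mode, four-word master: one address every cycle.
print(controller_addresses(100, 4, 1))  # [(0, 100), (1, 101), (2, 102), (3, 103)]
# Memory in burst-4 mode, four-word master: only the requested address.
print(controller_addresses(100, 4, 4))  # [(0, 100)]
# Memory in burst-4 mode, eight-word master: one successor four cycles later.
print(controller_addresses(100, 8, 4))  # [(0, 100), (4, 104)]
```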
In general, the preferred burst mode would correspond to the largest common factor of the cache widths of the available masters. Special cases involve streaming masters, masters with variable preferred burst lengths, and infrequently used masters preferring short burst lengths. In each of these cases, the memory burst length selection involves tradeoffs; however, the tradeoffs are generally less costly than operation without the invention would be.
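For the general case, the largest common factor can be computed directly. The sketch below assumes that each master's cache width is known as an integer number of words; the function name is hypothetical, and the special cases mentioned above are not handled.

```python
from functools import reduce
from math import gcd
from typing import Iterable


def memory_burst_length(cache_widths: Iterable[int]) -> int:
    """Largest common factor of the masters' cache widths, in words."""
    return reduce(gcd, cache_widths)


print(memory_burst_length([4, 8]))  # 4
print(memory_burst_length([4, 6]))  # 2
```

With masters of widths four and eight, the memory would be left in burst-4 mode, matching the example given earlier.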
The invention thus provides that the memory-system burst length can be any integer multiple of the fixed memory burst length. In one realization, the memory controller is programmed with the memory burst length. Each master is then assigned a multiple of that burst length. When a newly selected master issues a read that is to be satisfied from main memory, the multiplier is the number of consecutive memory bursts constituting the memory-system burst.
To this end, the controller can include two counters. One counter is a memory-burst counter and mirrors the burst activity in the memory. The second counter is the multiplier counter, which is clocked by the burst counter. The multiplier counter counts the number of bursts as they are generated. When the number of memory bursts equals the multiplier number, the controller halts the bursting. If at any time during the series of bursts, an address speculatively generated is not confirmed, the bursting is aborted.
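The interaction of the two counters can be sketched as a small state machine. The class and attribute names are hypothetical, and stepping once per data transfer is an assumption of this sketch; it models only the counting and abort logic described above, not the bus signaling.

```python
class BurstSequencer:
    """Behavioral model of the memory-burst counter and multiplier counter."""

    def __init__(self, memory_burst: int, multiplier: int):
        self.memory_burst = memory_burst  # fixed burst length of the memory
        self.multiplier = multiplier      # memory bursts per memory-system burst
        self.burst_count = 0              # position within the current memory burst
        self.bursts_done = 0              # multiplier counter
        self.active = True
        self.aborted = False

    def tick(self, confirmed: bool = True) -> bool:
        """Advance one transfer; return True while bursting should continue."""
        if not self.active:
            return False
        if not confirmed:
            # A speculatively generated address was not confirmed: abort.
            self.active = False
            self.aborted = True
            return False
        self.burst_count += 1
        if self.burst_count == self.memory_burst:
            # The burst counter wraps and clocks the multiplier counter.
            self.burst_count = 0
            self.bursts_done += 1
            if self.bursts_done == self.multiplier:
                self.active = False  # requested number of bursts reached: halt
        return self.active


seq = BurstSequencer(memory_burst=4, multiplier=2)
transfers = 0
while seq.active:
    seq.tick()
    transfers += 1
print(transfers)  # 8
```

With a memory burst length of four and a multiplier of two, the sequencer halts after eight transfers, which corresponds to the eight-word master in the earlier example.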
The present invention permits memory-system burst mode changes with minimal or negligible latency penalties. Even though the burst mode of the memory itself is not changed, inter-burst latencies are avoided for masters preferring bursts longer than the set memory burst length. Thus, performance in multi-master systems using a non-pipelined bus can be enhanced. By optimally selecting the burst mode, power requirements can be minimized.
The inventive approach is preferable to utilizing long bursts that must be aborted frequently. Typically, when a write operation follows a burst, the invention allows the burst to be completed before termination. Thus avoided are latencies involved in clearing an aborted burst before a write operation can be executed. These and other features and advantages of the invention are apparent from the description below with reference to the following drawings.