1. Technical Field
The present invention relates generally to computer devices and in particular to memory subsystems. Still more particularly, the present invention relates to a method and system for efficiently completing write operations within memory subsystems.
2. Description of Related Art
Improvements in computer memory subsystems continue to be one of the major developments that enable more efficient and faster computer systems. A historical perspective of the evolution of computer memory subsystems is provided in commonly assigned and co-pending patent application, Ser. No. 10/903,178, and its related applications, relevant content of which is incorporated herein by reference.
As recited in that application, computer memory subsystems have evolved from the point-to-point bus topology of the early 1980's (e.g., U.S. Pat. No. 4,475,194) to more recent computer memory subsystems, which include up to four registered dual inline memory modules (DIMMs) on a traditional multi-drop stub bus channel (e.g., U.S. Pat. No. 6,510,100). This latter structure has inherent limits on the number of modules that may be attached to the stub bus due to the increasing data rate of the information transferred over the bus. FIGS. 2A and 2B illustrate prior art memory subsystems configured with multiple DIMMs 206 connected to a memory controller 201 via a stub bus topology. As shown, all memory modules (DIMMs) 206 connect directly to a single system address/command bus and a single system bi-directional data bus.
Further development led to the introduction of the daisy chain topology (U.S. Pat. No. 4,723,120), which provides point-to-point interfaces to separate devices. FIG. 1 illustrates a prior art daisy-chained memory channel, implemented consistent with the teachings in U.S. Pat. No. 4,723,120. According to the configuration, memory controller 101 is connected to a memory channel 115, which further connects to a series of memory modules 106a-n. Each module 106a-n includes a DRAM 111a-n and a buffer 120a-n. The information on memory channel 115 is re-driven by the buffer 120a on module 106a to the next module 106b, which further re-drives the channel 115 to module positions denoted as 106n. Within conventional systems, each memory module is a dual inline memory module (DIMM).
Read Operations
One drawback to the use of a daisy chain bus topology is increased latency associated with the return of read data via the series of daisy chained memory modules. Because each module in the channel has a different number of intervening stages to return data to the memory controller, each module has different latency for returning data to the memory controller. The variations in latencies among memory modules present a management problem for the memory controller, particularly since collisions on the memory channel have to be avoided.
One solution presented for handling these varying latencies associated with the memory modules involves leveling the read data latency of all the modules by setting the latency of modules closer to the memory controller (i.e., those with shorter latencies) equal to the latency of the module that is furthest away from the memory controller in the chain. Leveling the data return latency in this manner can be achieved by adding a fixed amount of delay to the return of read data based on the data's location in the channel. In this way, the memory controller will receive all read data with the same latency following the issuance of the read request/command, regardless of the location of the target memory module within the chain.
Additional cycles of delay were thus added to each of the closer memory modules and these delays were coded into the buffer logic of the memory module. The buffer logic is then used to delay the placement of the requested data on the memory channel for the preset number of cycles to allow for equal return data latencies.
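The leveling scheme described above can be illustrated with a minimal sketch. The function name `leveling_delays` and the per-module latency values are hypothetical and chosen only for illustration; the sketch assumes each module's native read latency is known in cycles and that the buffer logic pads every module up to the latency of the slowest one.

```python
def leveling_delays(native_latencies):
    """Return the extra delay (in cycles) each module's buffer would add
    so that every module returns read data at the worst-case latency.

    native_latencies: hypothetical per-module read latencies in cycles,
    ordered from the module nearest the memory controller to the farthest.
    """
    worst_case = max(native_latencies)
    return [worst_case - latency for latency in native_latencies]


# Hypothetical four-module chain; farther modules have longer latencies.
native = [4, 6, 8, 10]
delays = leveling_delays(native)

# After padding, every module presents the same latency to the controller.
leveled = [latency + delay for latency, delay in zip(native, delays)]
```

Under these assumed values, the module nearest the controller receives the largest added delay and the farthest module receives none, which is the source of the inefficiency discussed next.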
Forcing all read operations to complete at the worst-case latency severely limits the efficiency of the memory subsystem and adds unnecessary delays in the data retrieval process. Further, with the prior art implementation, read requests must be issued at fixed times to line up with openings in the returning data stream. This requirement can result in unused data cycles on the read data channel when there is a conflict between two commands that need to be issued on the address bus. The combination of these two requirements limits the efficiency of the memory channel by adding unnecessary latency and idle cycles on the memory data channel.
One advantage of the daisy chained implementation is that each memory module installed on the data channel has an independent data bus to its DRAMs. Although all the memory modules share the same data channel back to the memory controller, they individually have a separate data bus that is isolated from the memory channel by the buffer chip. Data bandwidth in a DRAM memory system is affected by a number of DRAM architecture requirements, and the data bus bandwidth generally falls well short of the maximum available bandwidth of the common data bus, in this case the daisy chained memory channel. Therefore having multiple independent data buses driving a single memory channel may significantly increase the data bandwidth available in the system.
In the prior art implementations of the daisy chained memory system, the requirement that all memory modules return data with the latency of the last memory module in the chain effectively results in a configuration where all the memory module data buses run as if they were one bus. This reduces the available bandwidth in the system back to that provided by the traditional multi-drop stub bus configurations and results in inefficient usage of the data bandwidth on the daisy chained memory channel.
Write Operations
Handling write operations also presents a management issue for the memory controller. Similar to reads, write operations are received in time order at the memory controller and are often forwarded by the memory controller to the target memory module at the first free/available cycle on the shared read/write address bus and data bus. In conventional memory subsystems, read operations are given higher priority by the memory controller since the read data is needed for current processing, while the write data is merely being archived following processing of the data. Thus, issuing a write operation utilizes bus bandwidth that could otherwise be allocated to a read that is waiting to be issued.
Write operations and read operations share a common address/control bus on the memory channel between the memory controller and memory modules, and on the memory module they share a common address and data bus (i.e., between the memory module's control logic and memory devices, e.g., DRAMs). For the memory controller to issue a write to a memory module, two factors have to be accounted for. The first factor is the availability of the system's address/control bus to the memory module. The second factor is the availability of the memory module's data and address buses.
Under both factors, a memory module that is processing reads is considered busy and cannot be written to. Each individual write operation is thus held (prevented from executing) until read operations are no longer busying the memory module. If a new read is sent out to a particular memory module as the previous read completes, the individual write operation is made to wait, potentially indefinitely, until the memory module is not the target of a next read operation.
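The holding behavior can be pictured with a small sketch. The function `first_idle_cycle` and the per-cycle busy flags are hypothetical, introduced only to illustrate the idea; the sketch assumes the controller can tell, cycle by cycle, whether the target module is occupied by a read.

```python
def first_idle_cycle(read_busy):
    """Return the first cycle index at which a held write could issue,
    or None if the module is busy with reads for the whole window.

    read_busy: hypothetical per-cycle flags, True when the target
    memory module is occupied by a read operation in that cycle.
    """
    for cycle, busy in enumerate(read_busy):
        if not busy:
            return cycle
    return None


# A gap in the read stream lets the held write issue at cycle 2.
with_gap = first_idle_cycle([True, True, False, True])

# Back-to-back reads leave no opening, so the write stays held.
no_gap = first_idle_cycle([True] * 5)
```

In the second case the function returns `None`, mirroring a write that keeps waiting because each completing read is immediately followed by another read to the same module.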
With the second factor above, since both write and read operations are completed via a single bi-directional data bus within the memory module, switching from a read operation to a write operation, and vice-versa, requires a reconfiguration of the bi-directional memory bus to allow the data to be transmitted towards the DRAM (for writes) and from the DRAM (for reads). The reconfiguration process takes several clock cycles to complete and injects a large performance penalty associated with the latency of completing read operations that follow a write operation to the same memory module. Additionally, it is common for multiple writes to be received at the memory controller, each targeting the same memory module. In conventional systems, each write is processed as an individual write operation, leading to a huge performance penalty for the single memory module targeted by multiple writes interjected between the read operations at that memory module.
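The cost of repeatedly reversing the bi-directional data bus can be estimated with a rough sketch. The function `turnaround_cycles`, the operation sequences, and the three-cycle penalty are all hypothetical values chosen for illustration; the grouped sequence is shown only as a point of comparison and is not a description of any particular controller's scheduling.

```python
def turnaround_cycles(ops, penalty):
    """Count the cycles spent reconfiguring the data bus direction.

    ops: sequence of "R" (read) and "W" (write) operations issued to
         one memory module, in issue order.
    penalty: assumed number of cycles per direction change.
    """
    switches = sum(1 for prev, cur in zip(ops, ops[1:]) if prev != cur)
    return switches * penalty


PENALTY = 3  # assumed reconfiguration cost in cycles

# Three writes each interjected individually between reads.
interleaved = ["R", "W", "R", "W", "R", "W", "R"]

# The same seven operations with the writes issued back to back.
grouped = ["R", "R", "R", "R", "W", "W", "W"]

interleaved_cost = turnaround_cycles(interleaved, PENALTY)
grouped_cost = turnaround_cycles(grouped, PENALTY)
```

With these assumed numbers, the interleaved sequence incurs six direction changes while the grouped sequence incurs one, which illustrates why individually processed writes between reads are costly for the targeted module.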
Since completing write operations is of lower priority than completing reads, individual completion of multiple write operations to the same memory module negatively affects the overall efficiency of the memory module in providing read data. The present invention thus recognizes that it would be desirable to enable completion of writes in the background when their target memory modules are idle. The invention further recognizes that it would be desirable to hide the inefficiencies of the DRAM architecture due to the busy time incurred after each operation. Finally, the invention recognizes the desirability of reducing the performance penalty associated with individually completing each of multiple write operations interjected between reads targeting the same memory module.