Data merging is a process by which two subsets of a multiple-bit data word or data line are combined to form a new version of the data word or data line that incorporates both subsets. If elements of a data line are separately modified by two different agents, either or both of which hold a cached version of the data line, then the changes made by both agents can be reconciled by merging the data. For example, in a multiprocessor computer system wherein each processor has a dedicated cache memory and one processor is modifying a cache line previously modified by another processor, cache coherency can be obtained by properly merging the cache line data. Data merging can be performed in the main system memory or, in a computer system that buffers data being read from and written to memory, in the data buffers.
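The merging of two subsets of a data line described above can be illustrated with a minimal sketch. The function name merge_line, the 8-byte line length, and the use of per-byte enable flags are assumptions made for illustration only and are not taken from any particular system.

```python
def merge_line(base: bytes, update: bytes, byte_enables: list) -> bytes:
    """Return a copy of `base` in which every enabled byte is taken from `update`."""
    if not (len(base) == len(update) == len(byte_enables)):
        raise ValueError("line, update, and enables must have the same length")
    return bytes(u if enabled else b
                 for b, u, enabled in zip(base, update, byte_enables))


# Hypothetical 8-byte data line, initially all zero.
original = bytes(8)

# Agent A modified bytes 0-1; agent B modified bytes 6-7 (assumed scenario).
agent_a = bytes([0xAA, 0xAA, 0, 0, 0, 0, 0, 0])
agent_b = bytes([0, 0, 0, 0, 0, 0, 0xBB, 0xBB])
a_enables = [True, True] + [False] * 6
b_enables = [False] * 6 + [True, True]

# Merging both subsets yields one line that incorporates both sets of changes.
merged = merge_line(merge_line(original, agent_a, a_enables), agent_b, b_enables)
```

In this sketch the two agents touch disjoint bytes, so the order of the two merges does not affect the result.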
Cache memory systems are used to compensate for the relatively slow access times of dynamic random access memory ("DRAM") that is used for main system memory in many computer systems. Cache memory is typically constructed of high speed static random access memory ("SRAM") which provides random access capabilities at a much higher speed than DRAM. SRAM is typically not used for main system memory because it is considered prohibitively expensive and hence impracticable for use in the quantities required for primary system memory, typically several megabytes in contemporary computer systems. A cache memory system speeds memory access by maintaining copies of frequently accessed data read from the main system memory in a cache. When a processor or other agent requests data stored in the system memory, the cache memory is first checked to determine if the data is stored in the cache. If the requested data is stored in the cache, then the data is read from the high speed cache memory without incurring the delays inherent in accessing main system memory. If the requested data is not in cache memory, then the data is read from the main system memory and copied into the cache as well. A cache controller then updates a directory which is part of the cache memory system. This directory cross references the cache memory with the address of the corresponding data in the main system memory.
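The lookup sequence described above (check the cache, read main memory on a miss, copy the data into the cache, update the directory) can be sketched as follows. The class SimpleCache, its attribute names, and the absence of any eviction policy are illustrative assumptions, not a description of a real cache controller.

```python
class SimpleCache:
    """Toy cache with a directory mapping memory addresses to cache slots."""

    def __init__(self, main_memory: dict):
        self.main_memory = main_memory  # stands in for slow DRAM
        self.store = []                 # stands in for fast SRAM data storage
        self.directory = {}             # memory address -> index into self.store
        self.hits = 0
        self.misses = 0

    def read(self, address: int) -> int:
        slot = self.directory.get(address)
        if slot is not None:
            # Hit: serve the data from the cache without touching main memory.
            self.hits += 1
            return self.store[slot]
        # Miss: read main memory, copy the data into the cache,
        # and record the new entry in the directory.
        self.misses += 1
        value = self.main_memory[address]
        self.store.append(value)
        self.directory[address] = len(self.store) - 1
        return value


cache = SimpleCache({0x10: 7, 0x14: 9})
first = cache.read(0x10)   # miss, filled from main memory
second = cache.read(0x10)  # hit, served from the cache
```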
In a computer system where the memory may be modified by several agents independently, such as a multiprocessor system or a system with independent input/output and processor buses, cache coherency is a significant problem. Cache coherency refers to the problem of maintaining the same information in both the cache memory and main memory. For example, cache coherency problems frequently arise in multiprocessor systems where each processor has a dedicated cache. When two processor caches have the same data cached (i.e., both processors requested data from the same address in memory), one or both processors may independently alter the cached data. When that occurs, cache coherency is lost and must be restored. Cache coherency is restored by rewriting the cached data to main memory, including all of the modifications made to the data in the cache memory.
Cache coherency problems also arise in computer systems where main memory can be modified by multiple independent subsystems such as input/output subsystems and/or graphics subsystems without any direct processor interaction. In order to ensure cache coherency, a mechanism is required by which processors and other agents on a processor or host bus are notified of memory modifications made by the independent subsystems so the system can respond to a loss of cache coherency.
In a computer system with a memory controller including data buffering, cache coherency can be restored by merging data from two sources in or into the memory controller data buffers. The data buffer memory in the memory controller then contains the updated data, including any changes made by either or both sources, and when the buffered data is subsequently rewritten to the main memory, the data in main memory is the same as the corresponding cached data without requiring two separate writes to main memory.
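A minimal sketch of this buffer-based approach is given below, assuming a single 16-byte line buffer and a toy main memory that counts line writes. The class names MergeBuffer and MainMemory, the offsets, and the payload values are hypothetical and serve only to show that changes from two sources can be combined in the buffer and then written back once.

```python
class MainMemory:
    """Toy main memory that counts how many line writes it receives."""

    def __init__(self, size: int):
        self.cells = bytearray(size)
        self.write_count = 0

    def write_line(self, base: int, line: bytes) -> None:
        self.cells[base:base + len(line)] = line
        self.write_count += 1


class MergeBuffer:
    """Hypothetical one-line data buffer inside a memory controller."""

    def __init__(self, line: bytes):
        self.data = bytearray(line)

    def merge(self, offset: int, payload: bytes) -> None:
        # Overwrite only the bytes supplied by this source.
        if offset < 0 or offset + len(payload) > len(self.data):
            raise ValueError("payload does not fit inside the buffered line")
        self.data[offset:offset + len(payload)] = payload


memory = MainMemory(32)
buffer = MergeBuffer(memory.cells[0:16])  # buffer a copy of one 16-byte line

buffer.merge(0, b"\x11\x22")    # change from the first source (assumed)
buffer.merge(12, b"\x33\x44")   # change from the second source (assumed)

# One write carries both sets of changes back to main memory.
memory.write_line(0, bytes(buffer.data))
```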
In a cached memory system that includes processors and other agents with caches, memory transactions are typically performed in units of cache lines. Although the main system memory is typically organized in discrete bytes or words, cache memory is organized in cache lines, primarily to accelerate the process of determining whether requested data is in cache memory. Each cache line typically includes several bytes or data words, and a single cache line corresponds to several sequential locations in the main system memory. A cache line is the smallest unit of information that can be mapped by a cache, and whenever a cache miss occurs (i.e., the requested data is not in the cache), an entire cache line of data must be read from the main system memory. Thus, even though agent transactions and/or input/output transactions may involve only a single byte or data word, an entire cache line is transferred to and from memory when data is modified, and it is within a cache line that data merging occurs.
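The relationship between a byte address and the cache line that contains it can be sketched with simple address arithmetic. The 32-byte line size below is an assumed value chosen for illustration; the text above does not specify one.

```python
LINE_SIZE = 32  # bytes per cache line; assumed value, must be a power of two


def line_base(address: int) -> int:
    """Address of the first byte of the cache line containing `address`."""
    return address & ~(LINE_SIZE - 1)


def line_offset(address: int) -> int:
    """Position of `address` within its cache line."""
    return address & (LINE_SIZE - 1)


# A single-byte access at 0x1234 still involves the whole line
# starting at line_base(0x1234).
base = line_base(0x1234)
offset = line_offset(0x1234)
```

Every address with the same line base falls in the same cache line, which is why a modification to a single byte is merged within that line.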
Prior art methods of data merging in a memory controller data buffer have required complex circuitry and undesirably lengthy operations. For example, in one prior art system, the cache line data was first extracted from the cache memory and placed in a plurality of holding registers, the number of holding registers corresponding to the number of distinct words or minimum addressable units in the cache line. Thus, for example, a cache line of one quad word required four separate holding registers. In addition, a fifth holding register was required for the word being merged with the cache line.
The holding register for the word being merged into the cache line was loaded through a 3:1 multiplexer circuit with inputs from the input/output subsystem, the memory subsystem, and the processor bus. All of the holding registers were then coupled to a 5:1 multiplexer. The four multiplexer inputs whose holding registers contained coherent data (including the holding register containing the word being merged) were serially selected and coupled through the multiplexer to a data buffer constructed as a first in first out ("FIFO") queue, while the data in the register corresponding to the modified word was ignored. Cache coherency was then obtained by transferring the buffered data stored in the FIFO queue to the main system memory, which was no longer stale once the buffered data had been rewritten.
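The serial selection performed by the 5:1 multiplexer can be modeled behaviorally as shown below. This is a software sketch under stated assumptions, not a circuit description: the function name prior_art_merge, the register numbering (registers 0 to 3 for the line words, register 4 for the word being merged), and the step counter are all hypothetical.

```python
from collections import deque


def prior_art_merge(line_words: list, merge_word: int, merge_index: int):
    """Model the holding-register and 5:1 multiplexer merge, one word per step."""
    if len(line_words) != 4:
        raise ValueError("this sketch assumes a four-word cache line")
    if not 0 <= merge_index < 4:
        raise ValueError("merge_index must select one of the four line words")

    # Registers 0-3 hold the original line words; register 4 holds the new word.
    holding = list(line_words) + [merge_word]

    fifo = deque()
    steps = 0
    for position in range(4):
        # Multiplexer select: use register 4 in place of the modified word,
        # so the register at merge_index is never selected.
        select = 4 if position == merge_index else position
        fifo.append(holding[select])
        steps += 1  # words enter the FIFO queue one at a time
    return fifo, steps


fifo, steps = prior_art_merge([0x01, 0x02, 0x03, 0x04], 0x99, merge_index=2)
```

The model makes the serial nature of the scheme visible: a four-word line always takes four selection steps, regardless of how many words actually changed.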
Although this prior art system provided a mechanism for data merging, its multiplexers and FIFO queues occupied a significant amount of space. FIFO queues are typically implemented with flip flops, and multiple queues were required: one for memory read operations and one for memory write operations for each input/output port of the memory controller. Thus, for example, a computer system with independent processor, graphics and input/output buses would require a minimum of six FIFO buffer queues, each of which would require its own set of dedicated holding registers and multiplexer switching circuits. Moreover, the prior art system took an undesirably long time to accomplish data merging, since the data had to be moved into holding registers and then sequentially loaded into the FIFO queue, a process which could only occur one word at a time. Because this structure does not scale, the prior art data buffering limits the number of input/output ports that can be effectively coupled to a memory controller.
Accordingly, there is a need for an efficient mechanism for data merging in a memory controller that does not require extensive multiplexer circuitry, a multiplicity of holding registers, or flip-flop based FIFO queues, and that can merge data quickly, efficiently, and scalably.