The present invention relates to memory controllers in computer systems. More specifically, the present invention relates to a system memory controller having a plurality of memory controller agents, wherein each memory controller agent contains a coherency controller and a memory controller capable of processing memory transactions for a single memory line.
In early computer systems, memory controllers were relatively simple. Typically, a single processor of the computer system would issue a read or write transaction to a memory controller, and the memory controller would complete the transaction to main memory by performing the specified read or write operation. However, as the art of computer design has progressed, memory controllers have become significantly more complex. Processors typically include multiple levels of cache memories, with each cache memory storing a subset of the contents of main memory. Furthermore, many modern computer systems have multiple processors and I/O units, with each processor and I/O unit having one or more cache memories and requiring access to main memory. A modern memory controller must be able to efficiently handle memory transactions from each processor and I/O unit, while keeping all cache memories coherent and arbitrating between separate memory transactions to the same memory line.
To better understand the challenges facing designers of modern memory controllers, first consider a cache memory. A cache memory is a small, high-speed buffer memory which is used to hold temporarily those portions of the contents of main memory which it is believed will be used in the near future by a processor or I/O unit. The main purpose of a cache memory is to shorten the time necessary to perform memory accesses, either for data or instruction fetches from memory or writes to memory. The information located in a cache memory may be accessed in much less time than information located in main memory. Thus, a processor or I/O unit with a cache memory needs to spend far less time waiting for instructions and operands to be fetched or stored.
A cache memory is made up of many cache lines of one or more words of data. Each cache line has associated with it an address tag that uniquely identifies the memory line of main memory of which the cache line is a copy. Each time the processor or I/O unit makes a memory reference, an address tag comparison is made to see if a copy of the requested data resides in the cache memory. If the desired memory line is not in the cache memory, the memory line is retrieved from main memory, stored in the cache memory as a cache line, and supplied to the processor or I/O unit.
In addition to using a cache memory to retrieve data from main memory, the processor or I/O unit may also write data into the cache memory, thereby delaying (or, in the case of successive writes to the cache memory, even possibly eliminating) the need to write the data to main memory. When the processor or I/O unit desires to write data to memory, the cache memory makes an address tag comparison to see if the memory line into which data is to be written resides in the cache memory. If the memory line exists in the cache memory and is being held as "exclusive" or "private", the data is written into the cache line in the cache memory that is holding the memory line. In many systems a data "dirty bit" for the cache line is then set. The dirty bit indicates that data in the cache line is dirty (i.e., has been modified), and thus before the memory line is deleted from the cache memory the modified data must be written back to main memory. If the memory line into which data is to be written does not exist in the cache memory or is held as "shared", the memory line must be fetched as "private" or "exclusive" into the cache memory, or the data must be written directly into the main memory.
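The tag comparison, line fill, and dirty-bit write-back described above can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: the class `ToyCache`, the direct-mapped organization, the constants `LINE_SIZE` and `NUM_SETS`, and the in-place upgrade from "shared" to "exclusive" are all hypothetical simplifications and are not taken from the description.

```python
from dataclasses import dataclass

LINE_SIZE = 64  # bytes per memory line (assumed value)
NUM_SETS = 4    # cache lines in this toy direct-mapped cache (assumed value)


@dataclass
class CacheLine:
    tag: int                 # here simply the full line address
    data: bytearray
    exclusive: bool = False  # held "private"/"exclusive" rather than "shared"
    dirty: bool = False      # set once the line has been modified


class ToyCache:
    def __init__(self, main_memory):
        self.mem = main_memory           # dict: line address -> bytes
        self.lines = [None] * NUM_SETS

    def _locate(self, addr):
        line_addr = addr // LINE_SIZE
        return line_addr, line_addr % NUM_SETS

    def _fill(self, line_addr, index, exclusive):
        old = self.lines[index]
        if old is not None and old.dirty:
            # Modified data must be written back before the line is dropped.
            self.mem[old.tag] = bytes(old.data)
        data = bytearray(self.mem.get(line_addr, bytes(LINE_SIZE)))
        self.lines[index] = CacheLine(line_addr, data, exclusive)
        return self.lines[index]

    def read(self, addr):
        line_addr, index = self._locate(addr)
        line = self.lines[index]
        if line is None or line.tag != line_addr:  # address tag comparison
            line = self._fill(line_addr, index, exclusive=False)
        return line.data[addr % LINE_SIZE]

    def write(self, addr, value):
        line_addr, index = self._locate(addr)
        line = self.lines[index]
        if line is None or line.tag != line_addr:
            line = self._fill(line_addr, index, exclusive=True)
        elif not line.exclusive:
            # Simplification: a real system would re-fetch the line as private.
            line.exclusive = True
        line.data[addr % LINE_SIZE] = value
        line.dirty = True
```

In this sketch a write only reaches main memory when a later access evicts the dirty line, which is the delayed write-back behavior the paragraph describes.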
A shared-memory multi-processor (MP) system has a potentially large number of processors and I/O units, with each processor and I/O unit having one or more cache memories. For simplicity, any processor, I/O unit, or other subsystem having one or more cache memories will be referred to herein as a cacheable entity.
When an access to memory is made in such an MP system, it is necessary to take steps to ensure the integrity of data accessed. For example, when a cacheable entity reads data from memory, it is important to determine whether an updated version of the data resides in the cache of another cacheable entity. If an updated version of the data exists, something must be done to ensure that the entity accesses the updated version of the data, and not the stale version currently stored in main memory. A mechanism that ensures that the updated version of the data is utilized in a memory reference is referred to herein as a cache coherency mechanism.
The most common cache coherency mechanism is typically referred to as a snoop mechanism. A snoop mechanism usually requires the cacheable entities to share a bus such that each cacheable entity can "snoop" the memory transactions of the other cacheable entities. However, due to electrical reasons and bandwidth concerns, only a limited number of cacheable entities can share a bus in a manner that allows transactions to be snooped. Therefore, when the number of cacheable entities in an MP system is large, snooping can no longer be effectively used for cache coherency.
The most common cache coherency mechanism for systems with a large number of cacheable entities is a directory-based cache coherency mechanism. A directory-based cache coherency mechanism typically includes a directory structure in main memory. Within the directory structure, line state information exists for each memory line within the main memory. The line state information consists of a number of bits associated with each memory line. The bits for each memory line indicate, for that memory line, the state of the memory line, such as "private" or "shared", the cacheable entities, if any, that are currently holding copies of the memory line, and any other information relevant to that memory line.
When the memory line is held as "private" in a cache memory of a first cacheable entity, the memory line is not available for use by other cacheable entities until released by the first cacheable entity, and the first cacheable entity is allowed to modify the contents of that memory line. When the memory line is held as "shared" in the cache memories of one or more cacheable entities, the memory line is available for use by other cacheable entities as long as the other entities do not want to hold the memory line as "private". While the line is held "shared", the contents of the line are not allowed to be modified.
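One way to picture the line state information is as a small bit field per memory line. The following Python sketch packs a state code and a bit vector of holders into a single integer. The encoding is purely an assumption made for illustration: the constants `NUM_ENTITIES`, `STATE_IDLE`, `STATE_SHARED`, and `STATE_PRIVATE`, and the helper names, do not come from the description, and an actual directory may lay out its bits differently.

```python
NUM_ENTITIES = 8  # number of cacheable entities tracked per line (assumed)

# Hypothetical state encodings, stored above the holder bit vector.
STATE_IDLE, STATE_SHARED, STATE_PRIVATE = 0, 1, 2


def pack_line_state(state, holders):
    """Pack a state code and the holding entities into one integer."""
    mask = 0
    for entity in holders:
        mask |= 1 << entity          # one bit per cacheable entity
    return (state << NUM_ENTITIES) | mask


def unpack_line_state(bits):
    """Recover the state code and the list of holding entities."""
    state = bits >> NUM_ENTITIES
    mask = bits & ((1 << NUM_ENTITIES) - 1)
    holders = [e for e in range(NUM_ENTITIES) if mask & (1 << e)]
    return state, holders
```

For example, `pack_line_state(STATE_SHARED, [0, 3])` records that entities 0 and 3 both hold a shared copy of the line.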
When a cacheable entity desires to access a memory line, a request is sent to the memory controller. The memory controller reads the line state information for the memory line to determine the current state of the requested memory line. If the line state information bits for the requested memory line indicate that the memory line is held as private in a cache of another cacheable entity, the memory line is recalled to the memory controller. Note that if the memory line is "dirty", the modified contents of the memory line must also be recalled and then provided to the requesting cacheable entity. When the memory line comes back to the memory controller, the memory controller supplies the memory line to the requester, updates the memory line's line state information, and updates the data for the memory line in main memory if the memory line was dirty.
If the memory line is requested as private and the memory controller reads the line state information and finds the memory line is shared, the memory controller invalidates copies of the memory line in the cache memories of other cacheable entities (as indicated by the line state information) and then supplies the memory line to the requesting cacheable entity. The memory controller also tags the line state information of the memory line as private and updates the line state information to identify the cacheable entity that now owns the memory line as private.
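The recall and invalidate handling described in the two preceding paragraphs can be sketched as follows. This is a minimal model under stated assumptions: the class `ToyDirectoryController`, its dictionaries, and the string state names are hypothetical, every recall writes the owner's copy back to main memory (the sketch does not track a dirty bit), and recalls complete instantly rather than taking time.

```python
class ToyDirectoryController:
    def __init__(self, memory, caches):
        self.memory = memory   # line address -> data in main memory
        self.caches = caches   # entity id -> {line address: cached data}
        self.directory = {}    # line address -> (state, set of holders)

    def _recall(self, line, owner):
        # Pull the line back from its private owner and update main memory.
        self.memory[line] = self.caches[owner].pop(line)

    def request(self, requester, line, private=False):
        state, holders = self.directory.get(line, ("idle", set()))
        if state == "private":
            (owner,) = holders
            self._recall(line, owner)
            holders = set()
        elif state == "shared" and private:
            # Invalidate every other shared copy before granting ownership.
            for other in holders - {requester}:
                self.caches[other].pop(line, None)
            holders = set()
        if private:
            self.directory[line] = ("private", {requester})
        else:
            self.directory[line] = ("shared", holders | {requester})
        data = self.memory[line]
        self.caches[requester][line] = data
        return data
```

A requester that reads a line held privately elsewhere therefore receives the owner's modified contents, not the stale copy in main memory.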
The memory line recall/invalidate operation can take a significant amount of time. Meanwhile, new requests for the same memory line can be received by the memory controller. Retrying these new requests is complicated in large MP systems because of the need to provide fairness and prevent starvation.
One possible mechanism for providing fairness and preventing starvation is to queue new requests for a particular memory line in the form of a linked list. Once the recalled data or the invalidate acknowledgment is received, the memory controller services the requests for that memory line in the linked list in the order the requests were received. Multiple linked lists for currently active memory lines can exist simultaneously in the memory controller. Such a mechanism was described by Sorin Iacobovici et al. in U.S. Pat. No. 5,995,967, which is entitled "Forming Linked Lists Using Content Addressable Memory", is assigned to the same assignee as the present application, and is hereby incorporated by reference as if completely set forth herein.
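The general idea of queuing requests per memory line as linked lists can be illustrated with the Python sketch below. It is not the content addressable memory scheme of the referenced patent; it simply uses dictionaries keyed by line address, and the names `PendingRequest`, `LineQueues`, `enqueue`, and `drain` are hypothetical.

```python
class PendingRequest:
    def __init__(self, requester):
        self.requester = requester
        self.next = None             # link to the next request for the same line


class LineQueues:
    """One linked list of pending requests per active memory line."""

    def __init__(self):
        self.head = {}  # line address -> first pending request
        self.tail = {}  # line address -> last pending request

    def enqueue(self, line, requester):
        node = PendingRequest(requester)
        if line in self.tail:
            self.tail[line].next = node   # append to the existing list
        else:
            self.head[line] = node        # start a new list for this line
        self.tail[line] = node

    def drain(self, line):
        """Called once the recall or invalidate completes; returns requesters in arrival order."""
        order = []
        node = self.head.pop(line, None)
        self.tail.pop(line, None)
        while node is not None:
            order.append(node.requester)
            node = node.next
        return order
```

Because each memory line has its own list, requests to unrelated lines do not delay one another, and requests to the same line are serviced in the order received.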
Large MP computer systems often use a relatively loose ordering model when processing read and write transactions to the same memory line. Operations that require a strict ordering model, such as semaphore operations, are generally performed by obtaining private ownership of a memory line and not releasing ownership of the memory line until the operations have been performed upon the memory line contents in the desired order. Another approach is to export an instruction used to access a semaphore, such as a fetch and add instruction, to be executed at a central location, such as a memory controller.
Because the ordering of reads and writes at the memory controller is relatively loose, read and write transactions may be processed in any order. As discussed above, requests to gain access to a memory line may be processed in a "first-in first-out" order to provide fairness and to prevent starvation, though this is not required. Furthermore, write operations should be processed before read operations to ensure that the read operations receive the most up-to-date data. For example, if a processor is continuously polling a memory location to see if a flag is set, and a write operation setting the flag arrives after a read operation reading the flag, it is desirable to provide the results of the write operation to the read operation. Doing so will eliminate the need to issue another read operation to poll the flag.
Similarly, the most recent write operation received for a particular memory line should invalidate any previously received write operations because the most recent write operation presumably has the currently valid copy of the contents of the memory line. Accordingly, read and write operations are preferably processed with the following ordering semantics: read operations for a particular memory line are queued up for processing in the order received; any write operation to a memory line is processed before all read operations from the memory line; and the last write operation to a memory line invalidates any previously received write operations to the memory line.
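The three ordering rules just stated can be modeled for a single memory line with a short Python sketch. The class `LineOrdering` and its method names are hypothetical, and the model ignores timing and coherency state; it only shows the order in which pending operations take effect.

```python
class LineOrdering:
    """Pending operations for one memory line, under the ordering rules above."""

    def __init__(self):
        self.has_write = False
        self.pending_write = None
        self.pending_reads = []

    def submit_write(self, data):
        # The last write invalidates any previously received pending write.
        self.has_write = True
        self.pending_write = data

    def submit_read(self, requester):
        # Reads are queued in the order received.
        self.pending_reads.append(requester)

    def process(self, memory_value):
        """Apply the pending write first, then answer all reads in order."""
        if self.has_write:
            memory_value = self.pending_write
            self.has_write = False
            self.pending_write = None
        results = [(requester, memory_value) for requester in self.pending_reads]
        self.pending_reads = []
        return memory_value, results
```

In the flag-polling example, a read that arrived before the write still observes the written value, so the polling processor does not need to issue another read.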
While the above ordering semantics may be stated quite simply, they are, in fact, relatively difficult to implement. Consider that a modern memory controller can process transactions for many memory lines simultaneously, and these transactions can all be in various states of completion. One prior art method of providing the above ordering semantics is to compare each incoming read transaction to all pending write transactions. If a read transaction attempts to access the same memory line as a pending write transaction, the read transaction is stalled until the write transaction is complete. While this method provides proper ordering, it is somewhat inefficient because read operations that could be completed in theory are stalled.
Another prior art method also compares each incoming read transaction to all pending write transactions. However, if a read transaction attempts to access the same memory line as a pending write transaction, the read transaction is completed out-of-order by using the memory line contents provided in the write transactions.
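The two prior art approaches differ only in what happens when an incoming read matches a pending write. A minimal sketch of that comparison follows; the function `handle_read`, its `forward` flag, and the returned labels are hypothetical names chosen for illustration.

```python
def handle_read(line, pending_writes, memory, forward=True):
    """Compare an incoming read against all pending writes.

    pending_writes maps line address -> data of a write not yet completed.
    With forward=False the read is stalled; with forward=True it is
    completed out of order using the data carried by the pending write.
    """
    if line in pending_writes:
        if forward:
            return ("forwarded", pending_writes[line])
        return ("stalled", None)
    return ("memory", memory[line])
```

Either way, every incoming read must be compared against every pending write, which is part of what makes these centralized designs costly.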
Note that prior art approaches tend to view the control of coherency and the scheduling of memory transactions as a centralized problem. As MP systems continue to increase in complexity, memory controllers have tended to become unduly complex, thereby lengthening the time and expense required to design, verify, and debug a particular controller design, and thereby lengthening the time-to-market.
The present invention is a memory controller having separate memory controller agents that process memory transactions in parallel. A memory controller in accordance with the present invention includes a plurality of memory controller agents, which are coupled to each other via a series of busses, an incoming memory transaction dispatch unit, and an outgoing memory transaction completion unit.
Memory transactions are received from cacheable entities of a computer system at the incoming memory transaction dispatch unit via an interconnection fabric. The incoming transactions are then presented to the plurality of memory controller agents. For each incoming transaction, one of the agents will accept the transaction. Each agent is responsible for ensuring coherency and fulfilling memory transactions for a single memory line, thereby simplifying the design of the agents. If multiple memory read transactions are received for a single memory line, the memory controller agents will configure themselves into a linked list to queue up the requests.
One of the advantages provided by the present invention is that the coherency information and memory line data associated with each memory line may be cached by each agent, thereby allowing repeated requests to the same memory line to be serviced more quickly. When two or more agents are queued up to fulfill multiple memory read transactions to the same memory line, the agents cooperate by transferring the coherency information and memory line data associated with each memory line from agent to agent, thereby minimizing the need to access main memory.
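The way agents accept transactions and link together for a contended memory line can be pictured with the toy model below. It is a sketch under heavy assumptions and not the design of the invention: the classes `Agent` and `ToyAgentController`, the `dispatch` and `complete` methods, and the `memory_reads` counter are hypothetical, coherency state is omitted, and the whole list is drained in one call rather than advancing agent by agent over busses.

```python
class Agent:
    def __init__(self, ident):
        self.ident = ident
        self.line = None       # the single memory line this agent is handling
        self.requester = None
        self.next = None       # next agent queued for the same memory line

    @property
    def free(self):
        return self.line is None


class ToyAgentController:
    def __init__(self, num_agents, memory):
        self.agents = [Agent(i) for i in range(num_agents)]
        self.memory = memory       # line address -> data
        self.memory_reads = 0      # counts accesses to main memory

    def dispatch(self, requester, line):
        """Present a read transaction to the agents; one free agent accepts it."""
        free = next((a for a in self.agents if a.free), None)
        if free is None:
            return None            # no agent available in this sketch
        # Find the current tail of the list for this line, if one exists.
        tail = next((a for a in self.agents
                     if a.line == line and a.next is None), None)
        free.line, free.requester = line, requester
        if tail is not None:
            tail.next = free       # link behind the agent already queued
        return free.ident

    def complete(self, line):
        """Read main memory once, then hand the data from agent to agent."""
        on_line = [a for a in self.agents if a.line == line]
        pointed_to = {a.next for a in on_line if a.next is not None}
        agent = next((a for a in on_line if a not in pointed_to), None)
        if agent is None:
            return []
        data = self.memory[line]
        self.memory_reads += 1
        replies = []
        while agent is not None:
            replies.append((agent.requester, data))
            following = agent.next
            agent.line = agent.requester = agent.next = None  # agent is free again
            agent = following
        return replies
```

In this model, two reads to the same line occupy two linked agents but cost only one main memory access, while a read to a different line proceeds on its own agent.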
Memory transactions are completed by the outgoing memory transaction completion unit, which receives the outgoing transactions from the agents and relays the transactions back to the cacheable entities via the interconnection fabric.
The present invention provides many advantages over prior art memory controllers. Because each agent caches a memory line, the present invention correctly, transparently, and efficiently handles read-after-write conflicts to the same memory line. In many prior art memory controllers, if a read transaction attempts to access the same memory line as a pending write transaction, the read transaction is stalled until the write transaction has been completed, which is inefficient. Alternatively, other prior art memory controllers maintain special write queue registers and attempt to service the read operation out-of-order, which adds significant complexity to the design of the memory controller.
The present invention also handles multiple read memory transactions to the same memory line in a fair and deterministic order. Several linked lists may be created and advanced simultaneously. By creating linked lists, the agents allow unrelated memory traffic to proceed using free agents while read-after-read conflicts to the same memory line are queued up by linking other agents together.
The memory controller agents of the present invention adapt dynamically in response to ever-changing memory traffic patterns. If memory transactions are repeatedly made to the same memory lines, the agents group together to form linked lists to service these transactions, and cooperate by exchanging cached data to minimize the need to access main memory. This is especially useful if several cacheable entities repeatedly contend for the same memory line, as is common in semaphore operations. On the other hand, if memory transactions are made to many individual memory lines, the agents operate independently from each other and service the transactions in parallel.
Compared to prior art memory controllers capable of handling comparable volumes of memory traffic, the memory controller of the present invention is significantly easier to design and verify, thereby minimizing development costs and minimizing time to market.