1. Field of the Invention
This invention relates generally to a system for managing processor requests made to a shared main memory system that utilizes a directory-based cache coherency scheme; and, more specifically, to a system that utilizes information associated with previously deferred memory requests to determine that certain subsequently-received memory requests should also be temporarily deferred such that redundant memory coherency actions are prevented from being unnecessarily initiated, and so that memory operation is optimized.
2. Description of the Prior Art
Data processing systems are becoming increasingly complex. Some systems, such as Symmetric Multi-Processor (SMP) computer systems, couple two or more Instruction Processors (IPs) and multiple Input/Output (I/O) Modules to shared memory systems. This allows the multiple IPs to operate simultaneously on the same task, and also allows multiple tasks to be performed at the same time to increase system throughput.
As the number of units coupled to a shared memory increases, more demands are placed on the memory and memory latency increases. To address this problem, high-speed cache memory systems are often coupled to one or more of the IPs for storing data signals that are copied from main memory. These cache memories are generally capable of processing requests faster than the main memory while also serving to reduce the number of requests that the main memory must handle. This increases system throughput.
While the use of cache memories increases system throughput, it causes other design challenges. When multiple cache memories are coupled to a single main memory for the purpose of temporarily storing data signals, some system must be utilized to ensure that all IPs are working from the same (most recent) copy of the data. For example, if a copy of a data item is stored, and subsequently modified, in a cache memory, another IP requesting access to the same data item must be prevented from using the older copy of the data item stored either in main memory or the requesting IP's cache. This is referred to as maintaining cache coherency. Maintaining cache coherency becomes more difficult as more caches are added to the system and more copies of a single data item must be managed.
Many methods exist to maintain cache coherency. Some earlier systems achieve coherency by implementing memory locks. That is, if an updated copy of data exists within a local cache, other processors are prohibited from obtaining a copy of the data from main memory until the updated copy is returned to main memory, thereby releasing the lock. For complex systems, the additional hardware and/or operating time required for setting and releasing the locks within main memory cannot be justified. Furthermore, reliance on such locks directly prohibits certain types of applications such as parallel processing.
Another method of maintaining cache coherency is shown in U.S. Pat. No. 4,843,542 issued to Dashiell et al., and in U.S. Pat. No. 4,755,930 issued to Wilson, Jr. et al. These patents each discuss a system wherein a processor having a local cache is coupled to a shared memory through a common memory bus. Each processor is responsible for monitoring, or "snooping", the common bus to maintain coherency of its own cache data. These snooping protocols increase processor overhead, and are unworkable in hierarchical memory configurations that do not have a common bus structure.
A similar snooping protocol is shown in U.S. Pat. No. 5,025,365 to Mathur et al., which teaches local caches that monitor a system bus for the occurrence of memory accesses which would invalidate a local copy of data. The Mathur snooping protocol removes some of the overhead associated with snooping by invalidating data within the local caches at times when data accesses are not occurring. However, the Mathur system is still unworkable in memory systems without a common bus structure.
Another method of maintaining cache coherency is shown in U.S. Pat. No. 5,423,016 to Tsuchiya. The method described in this patent involves providing a memory structure called a "duplicate tag" with each cache memory. The duplicate tags record which data items are stored within the associated cache. When a data item is modified by a processor, an invalidation request is routed to all of the other duplicate tags in the system. The duplicate tags are searched for the address of the referenced data item. If found, the data item is marked as invalid in the other caches. Such an approach is impractical for distributed systems having many caches interconnected in a hierarchical fashion because the time required to route the invalidation requests poses an undue overhead.
For distributed systems having hierarchical memory structures, a directory-based coherency system becomes more practical. Directory-based coherency systems utilize a centralized directory to record the location and the status of data as it exists throughout the system. For example, the directory records which caches have a copy of the data, and further records if any of the caches are allowed to have an updated copy of the data. When a cache makes a request to main memory for a data item, the central directory is consulted to determine where the most recent copy of that data item resides. Based on this information, the most recent copy of the data is retrieved so it may be provided to the requesting cache. The central directory is then updated to reflect the new status for that unit of memory. A novel directory-based cache coherency system for use with multiple Instruction Processors coupled to a hierarchical cache structure is described in the co-pending application entitled "Directory-Based Cache Coherency System Supporting Multiple Instruction Processor and Input/Output Caches", Ser. No. 09/001,598 filed Dec. 31, 1997, which is incorporated herein by reference in its entirety.
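The directory lookup described above can be illustrated with a minimal Python sketch. It is not taken from the referenced application: the names `DirectoryEntry`, `Directory`, `fetch`, and the `exclusive` flag are hypothetical, and the sketch assumes a previous owner keeps a shared copy after its data is retrieved.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class DirectoryEntry:
    """Hypothetical directory state for one unit of memory."""
    sharers: Set[str] = field(default_factory=set)  # caches holding a copy
    owner: Optional[str] = None  # cache allowed to hold an updated copy


class Directory:
    """Illustrative central directory; not the patented design."""

    def __init__(self) -> None:
        self.entries: Dict[int, DirectoryEntry] = {}

    def fetch(self, address: int, requester: str, exclusive: bool = False) -> str:
        """Return where the most recent copy resides, then update the entry."""
        entry = self.entries.setdefault(address, DirectoryEntry())
        other_owner = entry.owner is not None and entry.owner != requester
        # The latest copy is in another cache only if that cache owns the data.
        source = entry.owner if other_owner else "main_memory"
        if exclusive:
            # The requester becomes the only holder and the new owner.
            entry.sharers = {requester}
            entry.owner = requester
        else:
            if other_owner:
                # Assumption: the old owner is downgraded to a sharer.
                entry.sharers.add(entry.owner)
                entry.owner = None
            entry.sharers.add(requester)
        return source
```

For example, after `cache_a` fetches an address with `exclusive=True`, a later shared fetch by `cache_b` reports `cache_a` as the source, which is the case that forces main memory to start a retrieval.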
As stated above, a main memory employing a directory-based coherency system is a practical way to maintain coherency within a hierarchical memory that includes multiple levels of cache. Moreover, this type of coherency system may be readily expanded to maintain coherency among a large number of cache memories. One problem with this type of coherency scheme, however, is that as the number of cache memories within the system increases, a larger percentage of the main memory bandwidth is consumed in the handling and management of various memory coherency actions. For example, a first processor may have the latest cached copy of a data item requested by a second processor. The main memory must initiate an operation to retrieve the data copy from the first processor before the request may be processed. In the meantime, a third processor may request the same data item from main memory, causing the main memory to again initiate an operation to attempt to retrieve the most recent data copy.
The initiation of coherency actions not only consumes memory cycles, but also requires the use of other system resources. The scheduling of requests for causing a processor to return data to the main memory requires the use of various queue structures within the memory control system. These requests must be processed by the memory controllers, and ultimately transferred across memory bus resources to the various cache memories. The cache memories process the requests and schedule the return of requested data to memory. This return operation again requires the use of memory bus resources.
As can be readily appreciated by the foregoing discussion, in a hierarchical memory employing a directory-based cache coherency structure, the occurrence of coherency operations decreases the rate at which the memory can process requests. The problem increases when multiple processors are grouped together to work on a single task that requires the sharing of data. If multiple processors are each requesting the use of the same data item within a short period of time, coherency actions are initiated that may significantly impact memory throughput.
The problem associated with maintaining cache coherency can be further complicated in systems that allow I/O units to overwrite main memory segments at the same time valid copies of the data are stored in local caches. Although the use of these types of I/O overwrite operations provides a mechanism for efficiently updating main memory data, it further increases the number of coherency actions that must be performed within the system. This is because coherency operations must be initiated to flush all outdated cached data copies from the caches.
What is needed is a memory that incorporates the advantages of a directory-based coherency system, but which minimizes the number of coherency actions that must be initiated when multiple processors are requesting access to the same memory data simultaneously.
The primary object of the invention is to provide an improved system for managing requests made to a shared main memory;
Another object of the invention is to provide a system for minimizing the number of coherency actions that are initiated by a shared main memory that utilizes a directory-based cache coherency scheme;
A still further object of the invention is to minimize the number of redundant memory coherency actions that must be unnecessarily processed by the caches residing within a shared main memory system employing a directory-based cache coherency scheme;
A yet further object of the invention is to minimize the number of memory requests that are deferred after being presented to a shared main memory;
A further object of the invention is to provide a system for determining when a request to a shared main memory is to be deferred without providing the request to memory;
A yet further object of the invention is to provide a request storage system for maintaining memory coherency through the use of linked lists of deferred memory requests;
A still further object of the invention is to provide a system for optimizing the performance of a shared main memory by filtering requests that are provided to the memory using information associated with other deferred memory requests; and
Another object of the invention is to provide a system for handling deferred memory requests received from both Instruction Processors and Input/Output Processors in a manner that maintains memory coherency.
The objectives of the present invention are achieved in a memory request management system for use with a memory system that employs a directory-based cache coherency scheme. The current memory system includes a main memory coupled to multiple cache memories. The main memory receives requests from each of the multiple cache memories to write data to, and fetch data from, addressable memory locations. In some cases, it is determined after a memory fetch request is presented to memory that the request cannot be processed immediately because the most recent copy of the requested data is stored in another cache memory. The memory must therefore initiate retrieval of that most recent data copy before the request may be completed. During this data retrieval, the associated fetch request is stored in a temporary storage structure and identified as "deferred".
Sometimes, additional read requests are received for the same data item as was previously requested by one or more deferred requests. According to the current invention, the subsequently-received read requests are also stored in the temporary storage structure and marked as deferred without being presented to memory. In this manner, fetch requests may be deferred without having to present those requests to the main memory. This provides many advantages. Memory cycles are not wasted determining that a request cannot be immediately processed. Additionally, overhead associated with initiating a redundant and unnecessary data retrieval operation is not imposed on the memory control logic. Unnecessary data retrieval requests initiated by main memory to ones of the caches are eliminated, thus conserving cycles on the memory-to-processor buses. Processing overhead is eliminated within the memory cache controllers, and unnecessary cache response cycles are eliminated on the processor-to-memory buses.
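The filtering behavior described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the disclosed hardware: the class `DeferringFrontEnd`, the `owned_elsewhere` set (a stand-in for the directory lookup), and the two counters are all hypothetical names introduced for the example.

```python
from collections import deque
from typing import Deque, Dict, Iterable


class DeferringFrontEnd:
    """Illustrative request filter placed in front of main memory."""

    def __init__(self, owned_elsewhere: Iterable[int]) -> None:
        # Assumption: addresses whose most recent copy is in another cache.
        self.owned_elsewhere = set(owned_elsewhere)
        # Temporary storage structure: address -> deferred requesters.
        self.deferred: Dict[int, Deque[str]] = {}
        self.memory_presentations = 0  # requests actually presented to memory
        self.retrievals_started = 0    # data retrieval operations initiated

    def fetch(self, address: int, requester: str) -> str:
        queue = self.deferred.get(address)
        if queue:
            # An older request to this address is already deferred, so this
            # one is deferred too, without being presented to memory.
            queue.append(requester)
            return "deferred"
        self.memory_presentations += 1
        if address in self.owned_elsewhere:
            # Memory finds the latest copy elsewhere and starts one retrieval.
            self.retrievals_started += 1
            self.deferred[address] = deque([requester])
            return "deferred"
        return "completed"
```

With two fetches to the same remotely owned address, only the first is presented to memory and only one retrieval is started; the second is deferred by the filter alone.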
According to one aspect of the invention, when a data retrieval operation is completed, an associated request is designated as undeferred. In some instances, the returned data may be immediately provided to the requester, and the undeferred request is therefore considered complete and may be removed from the temporary storage structure. In other cases, the undeferred request is presented to the main memory for completion and then removed from the temporary storage structure. After removal of a request, any other deferred request that was requesting the same data item as the newly-completed request becomes eligible for processing.
According to another aspect of the invention, requests stored within the temporary storage structure as deferred requests and which are associated with the same requested data item are stored as a linked list of requests. The oldest request is at the front of the linked list, with subsequently-received requests being chained to the linked list in the order the requests are received. The requests are processed by main memory in a first-in, first-out manner such that the oldest requests are completed before more recently-received requests.
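One possible software analogue of such a linked list is sketched below in Python. The names `DeferredRequest`, `DeferredStore`, `defer`, `complete_oldest`, and `chain` are hypothetical, and keeping separate head and tail references per address is an assumption made for the example, not a detail taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class DeferredRequest:
    """One deferred request; `next` links to the next-newer request."""
    requester: str
    address: int
    next: Optional["DeferredRequest"] = None


class DeferredStore:
    """Illustrative temporary storage with one linked list per address."""

    def __init__(self) -> None:
        self.head: Dict[int, DeferredRequest] = {}  # oldest request
        self.tail: Dict[int, DeferredRequest] = {}  # newest request

    def defer(self, requester: str, address: int) -> None:
        """Chain a new request to the end of the list for its address."""
        node = DeferredRequest(requester, address)
        if address in self.tail:
            self.tail[address].next = node
        else:
            self.head[address] = node
        self.tail[address] = node

    def complete_oldest(self, address: int) -> Optional[str]:
        """Remove the oldest request; the next one becomes eligible."""
        node = self.head.get(address)
        if node is None:
            return None
        if node.next is None:
            del self.head[address]
            del self.tail[address]
        else:
            self.head[address] = node.next
        return node.requester

    def chain(self, address: int) -> List[str]:
        """List requesters for an address from oldest to newest."""
        out: List[str] = []
        node = self.head.get(address)
        while node is not None:
            out.append(node.requester)
            node = node.next
        return out
```

Calling `complete_oldest` repeatedly for one address yields the requesters in arrival order, which matches the first-in, first-out processing described above.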
The memory system of the preferred embodiment further supports I/O overwrite operations wherein a peripheral device is allowed to overwrite data stored at requested addressable locations within the main memory even when some of the most recent data items associated with the overwritten memory addresses reside within ones of the cache memories. To handle the I/O overwrite operations in a manner that preserves data coherency, the I/O overwrite requests are deferred in a manner that is similar to fetch requests. Specifically, I/O overwrite requests made to an address associated with a previously deferred request are stored in the temporary storage structure and designated as deferred. In the preferred embodiment, a deferred I/O overwrite request is not processed until all older deferred requests to the same memory address have been completed.
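The ordering rule for deferred I/O overwrite requests can be illustrated with the following Python sketch. The class `AddressQueue`, the request kinds `"fetch"` and `"io_overwrite"`, and the `must_defer` flag (a stand-in for the memory finding that the latest copy resides in another cache) are hypothetical. The sketch also assumes that an overwrite to an address with no deferred requests completes at once.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple


class AddressQueue:
    """Illustrative per-address queues of deferred requests, oldest first."""

    def __init__(self) -> None:
        self.pending: Dict[int, Deque[Tuple[str, str]]] = {}
        self.log: List[Tuple[str, str, int]] = []  # completion order

    def submit(self, kind: str, requester: str, address: int,
               must_defer: bool = False) -> str:
        """Defer behind older deferred requests, else complete immediately."""
        queue = self.pending.get(address)
        if queue or must_defer:
            self.pending.setdefault(address, deque()).append((kind, requester))
            return "deferred"
        self.log.append((kind, requester, address))
        return "completed"

    def complete_next(self, address: int) -> Optional[Tuple[str, str]]:
        """Complete the oldest deferred request for an address, if any."""
        queue = self.pending.get(address)
        if not queue:
            return None
        kind, requester = queue.popleft()
        if not queue:
            del self.pending[address]
        self.log.append((kind, requester, address))
        return kind, requester
```

In this sketch an I/O overwrite submitted after a deferred fetch to the same address waits in the queue and is completed only after that older fetch, while an overwrite to an unrelated address is unaffected.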
Still other objects and advantages of the present invention will become readily apparent to those skilled in the art from the following detailed description of the preferred embodiment and the drawings, wherein only the preferred embodiment of the invention is shown, simply by way of illustration of the best mode contemplated for carrying out the invention. As will be realized, the invention is capable of other and different embodiments, and its several details are capable of modifications in various respects, all without departing from the invention. Accordingly, the drawings and description are to be regarded to the extent of applicable law as illustrative in nature and not as restrictive.