1. Field of the Invention
The invention relates to an improved system and method for performing address conflict detection in a multi-processor system in which multiple processors make memory requests to a single shared cache memory; and, more particularly, to a system and method for performing address conflict detection and resolution on the memory request address signals using address indirection.
2. Description of the Prior Art
Many data processing systems have multiple processors coupled to the same shared memory. Such systems must employ coherency schemes, as are well known in the art. Coherency schemes ensure that updates made to memory because of the activities of one processor are made available to the other coupled processors so that all processors in the system are working from the same (latest) copy of the signals stored in memory.
Special memory coherency considerations must be made in systems in which multiple processors are coupled to a shared cache memory, which is, in turn, coupled to another memory at a next lower level in the memory hierarchy. This "lower-level" memory could be another level of cache or a main memory, but for convenience it will be referred to as a "main memory". In such systems, a processor generally makes memory requests to the shared cache memory first. If a cache hit results, the requesting processor is provided access to the cache. If a cache miss results, that request is queued along with other requests that resulted in cache misses. The queued request is then presented to the main memory according to a predetermined priority scheme.
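The request flow described above can be sketched as follows. This is only an illustrative model: the class name SharedCache, its methods, and the use of first-in, first-out order as the "predetermined priority scheme" are assumptions made for the example and are not taken from any particular system.

```python
from collections import deque


class SharedCache:
    """Toy shared cache: hits are served at once, misses are queued."""

    def __init__(self, main_memory):
        self.main_memory = main_memory  # dict: address -> data signals
        self.lines = {}                 # dict: address -> cached data signals
        self.miss_queue = deque()       # pending (requester, address) pairs

    def request(self, requester, address):
        """Return the cached data on a hit; queue the request on a miss."""
        if address in self.lines:
            return self.lines[address]
        self.miss_queue.append((requester, address))
        return None

    def service_next_miss(self):
        """Present the oldest queued miss to main memory (FIFO is assumed)."""
        requester, address = self.miss_queue.popleft()
        self.lines[address] = self.main_memory[address]
        return requester, self.lines[address]


cache = SharedCache({0x100: "A"})
first = cache.request("P0", 0x100)    # miss: request is queued
filled = cache.service_next_miss()    # main memory supplies the data
second = cache.request("P1", 0x100)   # hit: served from the cache
```

In this model a second processor that requests the same address after the fill completes is served directly from the cache, which is the ordinary, conflict-free case.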
Sometimes the above-described procedure cannot be followed because an "address conflict" occurs. An address conflict occurs when two requests are made to the same address in close proximity in time. The first request is handled according to the general procedure, but the "effects" of this first request on memory are not completed by the time the second request is selected for processing. The second request must be delayed to allow the first request to complete. Otherwise, the memory may return an old copy of the data signals, or may return data signals which are partly old and partly new; that is, the data signals are inconsistent. Both of these cases cause improper system operation.
This concept can best be explained by example. Assume a first processor makes a write request to an address in a shared cache memory at approximately the same time a second processor makes a read request to the same address in the cache memory. Based on a predetermined priority scheme, the first processor's request is handled first and results in a cache miss. The write request address that resulted in a cache miss is therefore entered in a memory queue so that the request can be presented to main memory and the requested data signals may be returned to the cache. While the first write request is still queued to main memory, the second processor's read request is presented to the cache. Again, the address results in a cache miss, and the second read request address is added to the queue to wait for main memory access. When the first write request is presented to main memory and the requested data signals are returned from main memory and written to the cache, a replacement operation occurs. The write operation is then completed when updates are written to the replaced data signals in cache.
When the second read request address is retrieved from the queue for processing, it would generally be presented to main memory. However, assuming the cache in this case is not a store-through cache, and updates are therefore not copied back to main memory until a flush operation occurs, presenting the read request address to main memory would result in retrieval of an older copy of the data which will overwrite the newer copy in cache. Therefore, the read request address should be presented to the cache memory instead of to the main memory. Detecting and properly handling this type of situation is called conflict detection and resolution.
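The example above can be traced with a very small model. The function name second_read, the address 0x40, and the "old"/"new" values are hypothetical and chosen only to make the ordering visible; the cache is assumed not to be store-through, as in the text.

```python
def second_read(redirect_to_cache):
    """Return what the second processor's read sees for address 0x40."""
    main_memory = {0x40: "old"}
    cache = {}

    # First processor's write miss: a replacement fills the cache line,
    # then the update is written to the cache only (no store-through).
    cache[0x40] = main_memory[0x40]
    cache[0x40] = "new"

    # Second processor's queued read is now removed from the queue.
    if not redirect_to_cache:
        # Presented to main memory: the stale copy overwrites the update.
        cache[0x40] = main_memory[0x40]
    # Otherwise the read is re-presented to the cache and simply hits.
    return cache[0x40]


stale = second_read(redirect_to_cache=False)   # "old"
fresh = second_read(redirect_to_cache=True)    # "new"
```

The only difference between the two calls is where the second request is sent once it leaves the queue, which is exactly the decision that conflict resolution must make.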
The above example illustrates perhaps the simplest type of address conflict. Cache replacement and cache flush operations can also give rise to address conflict situations. A cache replacement is the operation that occurs when data signals from the main memory are copied back to the cache after a cache miss results. A cache flush is the operation that occurs prior to the replacement operation to make room for the data signals which will be written to cache during the replacement operation. During a cache flush operation, any updated data signals which will be overwritten during the replacement operation must be copied back to the main memory so the main memory has the latest copy of these data signals. The address in main memory from which the data signals are retrieved prior to a replacement operation is called the replacement address, and the address to which the data signals will be copied in main memory during a flush operation is called the flush address.
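The relationship between the flush address and the replacement address can be illustrated with a short sketch. A direct-mapped cache with four entries is assumed purely for simplicity; the text does not specify a cache organization, and the names NUM_SETS and handle_miss are hypothetical.

```python
NUM_SETS = 4  # hypothetical, tiny direct-mapped cache


def handle_miss(cache, main_memory, replacement_address):
    """Fill replacement_address into the cache; return the flush address or None.

    cache maps a set index to a tuple (address, data, updated_flag).
    """
    index = replacement_address % NUM_SETS
    flush_address = None
    victim = cache.get(index)
    if victim is not None:
        victim_address, victim_data, updated = victim
        if updated:
            # Flush: copy the updated data back so main memory is current.
            main_memory[victim_address] = victim_data
            flush_address = victim_address
    # Replacement: copy the requested data from main memory into the cache.
    cache[index] = (replacement_address, main_memory[replacement_address], False)
    return flush_address


main_memory = {1: "a", 5: "b"}
cache = {1: (1, "a-updated", True)}         # address 1 was modified in cache
flushed = handle_miss(cache, main_memory, 5)  # addresses 1 and 5 share a set
```

Here address 5 is the replacement address and address 1 is the flush address; a later request to either one arriving before both operations finish is the situation that gives rise to a conflict.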
Both replacement and flush operations can be delayed in a manner to be discussed further below. As a result, a cache flush or replacement operation associated with a previous request may not have completed when a subsequent unrelated request to either the replacement or flush address is presented to cache. To maintain cache coherency in this situation, the new request must be marked as a conflict regardless of whether a cache hit or miss occurs.
Finally, conflict situations can arise in high-speed cache systems which allow requests to be interleaved in a manner to be described further below. In such systems, a first cache write request interleaved with a second cache read request to the same address can result in incoherent results. If the first cache write operation is not allowed to complete, the interleaved cache read request will receive "old" data signals. To prevent this situation, the subsequent cache read request must be marked as a conflict so that it is not presented to cache immediately.
In all of the above situations, cache coherency is maintained through the use of a conflict detection and resolution system. Prior art conflict detection and resolution schemes manipulate queued addresses to determine the presence of a conflict. When a request to cache results in a cache miss, the full address associated with the request is added to a memory queue to await presentation to the main memory. After the address is added to the queue, the address is compared to the addresses of all other requests that are also present in the queue. If a favorable compare exists, the newly queued address is tagged as a conflict. When finally removed from the queue for processing, addresses that are tagged as conflicts are re-directed back to the cache instead of being presented to main memory.
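The prior art tagging scheme just described can be sketched as follows. The names QueueEntry, enqueue_miss, and dispatch are hypothetical, and first-in, first-out removal from the queue is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class QueueEntry:
    requester: str
    address: int            # the full request address is stored in the queue
    conflict: bool = False


def enqueue_miss(queue, requester, address):
    """Queue a missed request, tagging it if its address is already queued."""
    entry = QueueEntry(requester, address)
    entry.conflict = any(other.address == address for other in queue)
    queue.append(entry)
    return entry


def dispatch(queue):
    """Remove the oldest entry and report where it should be presented."""
    entry = queue.pop(0)
    target = "cache" if entry.conflict else "main_memory"
    return entry, target


queue = []
enqueue_miss(queue, "P0", 0x40)     # first request: no conflict
enqueue_miss(queue, "P1", 0x40)     # same address: tagged as a conflict
targets = [dispatch(queue)[1], dispatch(queue)[1]]
# targets == ["main_memory", "cache"]
```

Note that every entry in this model carries a full address, which is the storage cost discussed next.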
Prior art conflict detection systems have several disadvantages. First, they are relatively intensive in the use of circuitry (logic) because full request addresses are entered in the memory queue to await presentation to main memory. In large data processing systems, memory addresses may include thirty or more bits. Therefore, the large number of logic circuits needed to implement the memory queue consumes a large area of silicon on a custom or semi-custom integrated logic device.
Another disadvantage associated with prior art conflict detection systems involves the use of partial-address compares where the conflict detection is not performed on all bits of the address. When a new request address is added to the memory queue, only a predetermined subset of address bits from the request address is compared with a similar subset of the address bits associated with every other entry in the queue. If a favorable compare results, the newly-entered request address is marked as a conflict. The use of partial-address compares allows the conflict detection logic to be implemented using fewer logic circuits and results in use of less silicon area. However, it can result in two different request addresses being detected as "conflicts" when, in fact, only the predetermined subset of address bits actually compare. As a result, when the "conflicting" request address is removed from the queue for processing, it will be erroneously presented again to the cache where it will again result in a cache miss. The request address must be re-entered into the memory queue to await presentation to the memory interface. This detection of "false", or "noise", conflicts decreases system throughput.
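A "noise" conflict of this kind is easy to reproduce. In the sketch below, the 32-bit address width, the choice of the low 16 bits as the compared subset, and the two sample addresses are all assumptions made for illustration.

```python
FULL_MASK = 0xFFFFFFFF      # assumed 32-bit addresses: compare every bit
PARTIAL_MASK = 0x0000FFFF   # hypothetical subset: compare the low 16 bits only


def addresses_match(a, b, mask):
    """Compare two addresses using only the bits selected by mask."""
    return (a & mask) == (b & mask)


queued_address = 0x00012340
new_address = 0x00022340    # differs only in bits outside the subset

false_conflict = addresses_match(queued_address, new_address, PARTIAL_MASK)  # True
real_conflict = addresses_match(queued_address, new_address, FULL_MASK)      # False
```

The partial compare reports a conflict that the full compare does not, so the new request would be sent back to the cache, miss again, and be queued a second time.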
Finally, in prior art large-scale data processing systems, queue size limits system throughput. As discussed above, the memory queues are "silicon intensive" because each queue entry must store an entire memory address. In the interest of limiting the silicon area occupied by the memory queues, the number of entries within the queue is limited to a number which is considerably less than that required to store addresses for all of the memory requests that could potentially be pending simultaneously. When the queue is full, no more requests can be presented to cache because, if a cache miss occurs, the queue cannot store the additional request. Thus, even though some requesters do not have requests pending to main memory, and these requesters are making requests that would result in cache hits, these requests cannot be processed. When the memory queue is full, all memory access to the cache will cease. This significantly diminishes system throughput.
The conflict detection and resolution system of the present invention solves the above-identified problems. The system takes advantage of the fact that each request address must be stored somewhere before it can be presented to cache. Assuming a requesting processor and the cache are implemented in separate integrated circuit chips, the request addresses are generally stored in input buffer circuits. Therefore, a copy of a request address need not be stored in the memory queue following a cache miss. Instead, a pointer to the associated input buffer circuit can be stored in the memory queue. This dramatically reduces the amount of silicon area consumed by the queue logic. Because the pointers are so much smaller than each of the memory addresses, the number of queue entries can be increased so that one queue entry is provided for every request that could potentially be pending to main memory simultaneously. This allows the cache to continue processing requests as long as there are some requesters which do not yet have requests queued and waiting for main memory access. Finally, the current invention provides full address compare capability so that "noise" conflicts are eliminated. The improved conflict detection and resolution system provides conflict detection for each of the conflict situations described above while improving overall system throughput.
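The address indirection idea can be sketched in a few lines. This is a rough model under stated assumptions and not the circuit of the invention: the class PointerQueue, the helper pointer_bits, and in particular the way the full compare is carried out here (by reading the addresses back out of the input buffers) are hypothetical choices made for the example.

```python
def pointer_bits(num_buffers):
    """Bits needed for a pointer that can select any one input buffer."""
    return max(1, (num_buffers - 1).bit_length())


class PointerQueue:
    """Memory queue that stores input-buffer pointers instead of addresses."""

    def __init__(self, num_buffers):
        self.input_buffers = [None] * num_buffers  # full request addresses
        self.queue = []                            # (buffer_index, conflict)

    def enqueue_miss(self, buffer_index):
        """Queue a pointer; tag it if any queued pointer selects the same address."""
        address = self.input_buffers[buffer_index]
        conflict = any(
            self.input_buffers[index] == address for index, _ in self.queue
        )
        self.queue.append((buffer_index, conflict))
        return conflict


q = PointerQueue(num_buffers=16)
q.input_buffers[0] = 0x00012340
q.input_buffers[1] = 0x00022340
q.input_buffers[2] = 0x00012340
tags = [q.enqueue_miss(0), q.enqueue_miss(1), q.enqueue_miss(2)]
# tags == [False, False, True]: only the true duplicate is tagged

# Queue storage for 16 entries, assuming 32-bit addresses:
address_queue_bits = 16 * 32                # 512 bits of stored addresses
pointer_queue_bits = 16 * pointer_bits(16)  # 64 bits of stored pointers
```

Because the compare in this sketch uses every bit of the buffered addresses, the pair of addresses that share only their low-order bits is not tagged, while the smaller entries make it affordable to provide one queue entry per input buffer.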