The invention is generally related to cache coherence in a shared memory architecture, and in particular to response collection in a snoopy cache coherence implementation.
Computer technology continues to advance at a remarkable pace, with numerous improvements being made to the performance of both microprocessors (the "brains" of a computer) and the memory that stores the information processed by a computer.
In general, a microprocessor operates by executing a sequence of instructions that form a computer program. The instructions are typically stored in a memory system having a plurality of storage locations identified by unique memory addresses. The memory addresses collectively define a "memory address space," representing the addressable range of memory addresses that can be accessed by a microprocessor.
Both the instructions forming a computer program and the data operated upon by those instructions are often stored in a memory system and retrieved as necessary by the microprocessor when executing the computer program. The speed of microprocessors, however, has increased relative to that of memory devices to the extent that retrieving instructions and data from a memory can often become a significant bottleneck on performance. To decrease this bottleneck, it is desirable to use the fastest available memory devices, e.g., static random access memory (SRAM) devices or the like. However, both memory speed and memory capacity are typically directly related to cost, and as a result, many computer designs must balance memory speed and capacity with cost.
A predominant manner of obtaining such a balance is to use multiple "levels" of memories in a memory system to attempt to decrease costs with minimal impact on system performance. Often, a computer relies on a relatively large, slow and inexpensive mass storage system such as a hard disk drive or other external storage device, an intermediate main memory that uses dynamic random access memory devices (DRAMs) or other volatile memory storage devices, and one or more high speed, limited capacity cache memories, or caches, implemented with SRAMs or the like. One or more memory controllers are then used to swap the information from segments of memory addresses, often known as "cache lines", between the various memory levels to attempt to maximize the frequency that requested memory addresses are stored in the fastest cache memory accessible by the microprocessor. Whenever a memory access request attempts to access a memory address that is not cached in a cache memory, a "cache miss" occurs. As a result of a cache miss, the cache line for a memory address typically must be retrieved from a relatively slow, lower level memory, often with a significant performance hit.
Another manner of increasing computer performance is to use multiple microprocessors operating in parallel with one another to perform different tasks at the same time. Often, the multiple microprocessors share at least a portion of the same memory system to permit the microprocessors to work together to perform more complex tasks. The multiple microprocessors are typically coupled to one another and to the shared memory by a system bus or other like interconnection network. By sharing the same memory system, however, a concern arises as to maintaining "coherence" between the various memory levels in the shared memory system.
For example, in a given multi-processor environment, each microprocessor may have one or more dedicated cache memories that are accessible only by that microprocessor, e.g., level one (L1) data and/or instruction cache, a level two (L2) cache, and/or one or more buffers such as a line fill buffer and/or a transition buffer. Moreover, more than one microprocessor may share certain caches as well. As a result, any given memory address may be stored from time to time in any number of places in the shared memory system.
A number of different mechanisms exist for maintaining coherence within a shared memory system, including among others a directory-based coherence mechanism and a snoopy coherence mechanism. The directory-based coherence mechanism maintains a shared directory of the location of different memory addresses in the shared memory system. However, this mechanism may induce bottlenecks given that most if not all memory access requests need to access the same directory to determine the location of a given memory address.
The snoopy coherence mechanism, on the other hand, in effect distributes the determination of where a given memory address resides among multiple possible memories to snoop logic distributed among and associated with the memories themselves. As such, at least the mechanism that maintains the state information for a memory, e.g., a directory, and the associated snoop logic that updates the state information in response to a memory access request and/or returns a response to the request, is also collectively referred to in this context as a "snooper" device. Whenever a memory access request is issued by a given device on a bus, dedicated logic in each of those snooper devices "snoops" the request and determines whether the cache line for a memory address specified by the request is stored in that device. Typically, if a snooper device has a valid copy of the cache line, that device outputs the cache line to the system bus for access by the requesting device. In some embodiments, "intervention" is also supported, where a snooper device is able to output a cache line directly to a requesting device, e.g., by passing data through the system bus, or even bypassing the system bus altogether.
Another important aspect of the snoopy coherence mechanism, however, is that all possible sourcing devices need to know which device will be handling a memory access request, to prevent more than one device from attempting to handle the request. Yet another important aspect is that all of the snooper devices must update their status information regarding the cache line in response to fulfillment of the request. Therefore, in response to a request, each of the snooper devices must update its status information and output a response indicating the status of the cache line in the device. The responses are then collected and a single response is returned to the requesting device to inform the requesting device of the status of the information being requested.
One conventional snoopy coherence mechanism uses a MESI coherence protocol that tags information stored in a snooper device as one of four states: Modified, Exclusive, Shared, or Invalid. The Modified state indicates that the requested cache line is stored in the snooper device, and that the device has the most recent copy thereof, i.e., all other copies, if any, are no longer valid. The Exclusive state indicates that the requested cache line is stored only in the snooper device, but has not been modified relative to the copy in the shared memory. The Shared state indicates that the requested cache line is stored in the snooper device, but that other valid copies of the cache line also exist in other devices. The Invalid state indicates that the cache line is not stored in the snooper device.
If, in response to receipt of a request, a snooper device is capable of determining the state of a cache line, the state is returned with the appropriate response. However, if for some reason the snooper device is unable to determine the state of the cache line, the snooper device typically returns a "Retry" response instead, indicating the failure to process the request. Reasons for returning a Retry response may include, for example, no snoop buffer being available in the snooper device, the snooper device being busy with another operation, or colliding bus transactions, among others.
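The local response behavior described above can be illustrated with a minimal software sketch. It is a simplified model rather than actual snoop logic: the `Snooper` class, its `lines`, `busy` and `free_snoop_buffers` fields, and the `snoop` method are all hypothetical names introduced for illustration, and colliding bus transactions are not modeled.

```python
from dataclasses import dataclass, field
from enum import Enum


class Response(Enum):
    MODIFIED = "Modified"
    EXCLUSIVE = "Exclusive"
    SHARED = "Shared"
    INVALID = "Invalid"
    RETRY = "Retry"


@dataclass
class Snooper:
    # Hypothetical model: maps a cache-line address to the state held here.
    lines: dict = field(default_factory=dict)
    busy: bool = False           # device is occupied with another operation
    free_snoop_buffers: int = 1  # snoop buffers currently available

    def snoop(self, address: int) -> Response:
        # The state cannot be determined, so a Retry response is returned.
        if self.busy or self.free_snoop_buffers == 0:
            return Response.RETRY
        # A line that is not held by this device is reported as Invalid.
        return self.lines.get(address, Response.INVALID)
```

In this sketch, a snooper holding a line in the Modified state reports Modified when idle, but reports Retry for the same address while `busy` is set, regardless of what it actually holds.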
The various responses from the snooper devices in a shared memory system are typically collected by snoop response collection logic to generate a prioritized snoop response signal that is returned to the requesting device. In a conventional MESI protocol snoopy coherence mechanism, Retry responses are granted the highest priority, followed in order by Modified, Shared, Exclusive and Invalid responses. Conventional collection logic waits until each snooper device returns a response, and then returns the highest priority response among all of the returned responses.
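As a rough illustration of the conventional collection just described, the following sketch encodes the stated priority order as integer ranks and returns the highest ranked response. The names `Response` and `collect_conventional` are assumptions made for this example; real collection logic is implemented in hardware.

```python
from enum import IntEnum


class Response(IntEnum):
    # Larger value means higher priority, following the conventional order
    # given in the text: Retry, Modified, Shared, Exclusive, Invalid.
    INVALID = 0
    EXCLUSIVE = 1
    SHARED = 2
    MODIFIED = 3
    RETRY = 4


def collect_conventional(responses):
    """Return the highest priority response once every snooper has answered."""
    responses = list(responses)
    if not responses:
        raise ValueError("at least one snooper response is required")
    return max(responses)
```

Under this ordering, a single Retry response from any snooper device outranks every other response, including a Modified response from the device that actually holds the cache line.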
If a Retry response is returned to a requesting device, the device is required to reissue the request to attempt to obtain a non-Retry response. By prioritizing Retry responses relative to other responses, the greatest degree of correctness is ensured for the system, since a non-Retry response will not be returned until all snooper devices have had the opportunity to complete processing of a request.
Reissued requests often take longer to complete, and thus decrease system performance. Moreover, reissued requests typically introduce more traffic on the system bus, thereby further decreasing performance. Therefore, it is often desirable to minimize the number of reissued requests whenever possible to minimize the adverse impact associated with such requests.
Conventional snoopy coherence mechanisms, however, prioritize all Retry responses generated by snooper devices, irrespective of whether the requested cache line could even be stored in the snooper device causing the Retry response. In particular, it has been found that a significant percentage of Retry responses issued by snooper devices have no bearing on the ultimate handling of a request by the other snooper devices. For example, a request may be issued for a particular cache line that is stored solely in a given snooper device. Even though that snooper device may be able to immediately process the request and return a non-Retry response, the fact that another snooper device is busy and unable to process the request will result in a Retry response being returned to the requesting device, and thus reissuance of the request. The fact that it is known that the busy snooper device could not have a copy of the cache line at that time is immaterial to a conventional snoopy coherence mechanism.
Such "extraneous" retry responses therefore needlessly slow response time and increase system bus traffic, and thus decrease overall system performance. A significant need therefore exists for a manner of decreasing or eliminating the occurrence of extraneous retry responses in a snoopy coherence mechanism.
The invention addresses these and other problems associated with the prior art by providing a data processing system, circuit arrangement, integrated circuit device, program product, and method that disregard extraneous retry signals during the generation of a prioritized response signal from the response signals output from various snooper devices coupled to one another over a shared memory interface. In particular, it has been determined that a subset of retry signals issued by various snooper devices that snoop memory access requests do not have any bearing upon the ultimate determination of whether or not a particular memory address, or cache line therefor, is stored in any of the snooper devices. As a result, by disregarding these extraneous retry signals, such access requests may proceed without having to be reissued, thereby minimizing the time required to process such requests, and eliminating the extraneous traffic that would otherwise be present on the interface. System performance is consequently enhanced in such instances.
In certain embodiments of the invention, for example, extraneous retry signals are disregarded by prioritizing to a higher relative priority any response signal that indicates that information from a memory address, or cache line therefor, is stored in a single snooper device. In particular, when it is known that a valid copy of the information from a particular memory address can be found in a specific snooper device, the fact that another snooper device is unable to determine its state is irrelevant, since it is already known by virtue of the state of that specific snooper device that the other snooper device cannot have a valid copy of the information from the memory address. In other embodiments, response signals that indicate that the information from a requested memory address is stored in multiple snooper devices may also be prioritized relative to a retry signal if the request will not modify the information in the memory address, as the fact that another snooper device is unable to determine its state is also irrelevant in this circumstance. Other manners of disregarding extraneous retry signals will hereinafter become apparent to one of ordinary skill in the art.
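The two prioritization rules described above can be sketched in software as follows. This is a hedged illustration under stated assumptions, not the claimed circuit: it assumes MESI-style responses, treats Modified and Exclusive as the states indicating storage in a single snooper device, and uses a hypothetical `request_modifies` flag to distinguish requests that will modify the information from those that will not. The names `Response` and `collect` are likewise introduced only for this example.

```python
from enum import Enum


class Response(Enum):
    MODIFIED = "Modified"
    EXCLUSIVE = "Exclusive"
    SHARED = "Shared"
    INVALID = "Invalid"
    RETRY = "Retry"


def collect(responses, request_modifies):
    """Return a prioritized response, disregarding extraneous Retry responses."""
    seen = set(responses)
    if not seen:
        raise ValueError("at least one snooper response is required")
    # A single-holder state proves that no other device has a valid copy,
    # so any Retry from another device is extraneous (assumed check order).
    for state in (Response.MODIFIED, Response.EXCLUSIVE):
        if state in seen:
            return state
    # For a request that will not modify the line, a Shared copy is enough.
    if Response.SHARED in seen and not request_modifies:
        return Response.SHARED
    # Otherwise a Retry still matters and keeps its conventional precedence.
    if Response.RETRY in seen:
        return Response.RETRY
    if Response.SHARED in seen:
        return Response.SHARED
    return Response.INVALID
```

For example, the responses Retry, Modified and Invalid yield Modified rather than Retry, so the request need not be reissued, whereas Retry and Shared on a modifying request still yield Retry.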
Therefore, consistent with one aspect of the invention, coherence is maintained between a plurality of snooper devices coupled to one another over a shared memory interface. In response to receipt of an access request for a selected memory address, a plurality of local response signals are generated, with each local response signal associated with the state of the selected memory address in one of the snooper devices. A retry signal is generated as the local response signal for a selected snooper device when the state of the selected memory address in the selected snooper device cannot be determined. Furthermore, a prioritized response signal is generated representative of a highest priority local response signal among the plurality of local response signals, with any extraneous retry signal disregarded when the prioritized response signal is generated.
Consistent with another aspect of the invention, coherence is maintained between a plurality of snooper devices coupled to one another over a shared memory interface. In response to receipt of an access request for a selected memory address, a plurality of local response signals are generated, with each local response signal indicating one of a plurality of states for the selected memory address in an associated snooper device among the plurality of snooper devices. The plurality of states includes a non-shared state that indicates that the selected memory address is cached solely in the associated snooper device, a retry state that indicates that the snoop logic is unable to determine the state of the selected memory address in the associated snooper device, and a non-cached state that indicates that the selected memory address is not cached in the associated snooper device. Furthermore, a prioritized response signal is generated indicating a highest priority state among the states of the local response signals generated in response to the access request, with the non-shared state having a higher priority than the retry state, and the retry state having a higher priority than the non-cached state.
These and other advantages and features, which characterize the invention, are set forth in the claims annexed hereto and forming a further part hereof. However, for a better understanding of the invention, and of the advantages and objectives attained through its use, reference should be made to the Drawings, and to the accompanying descriptive matter, in which there are described exemplary embodiments of the invention.