The present invention relates to the field of cache snooping in a multiprocessor environment, and more particularly to reducing the latency of a snoop tenure.
A multiprocessor system may comprise multiple processors coupled to a common shared system memory. Each processor may comprise one or more levels of cache memory. The multiprocessor system may further comprise a system bus coupling the processing elements to each other and to the system memory. A cache memory may refer to a relatively small, high-speed memory that contains a copy of information from one or more portions of the system memory. Frequently, the cache memory is physically distinct from the system memory. Such a cache memory may be integral with a processor in the system, commonly referred to as a Level 1 (L1), or primary, cache, or may be non-integral with a processor in the system, commonly referred to as a Level 2 (L2), or secondary, cache.
When a processor generates a read request and the requested data resides in its cache memory, e.g., L1 cache, then a cache read hit takes place. The processor may then obtain the data from the cache memory without having to access the system memory. If the data is not in the cache memory, then a cache read miss occurs. The memory request may be forwarded to the system memory and the data may subsequently be retrieved from the system memory as would normally be done if the cache did not exist. On a cache miss, the data that is retrieved from the system memory may be provided to the processor and may also be written into the cache memory due to the statistical likelihood that this data will be requested again by that processor. Likewise, if a processor generates a write request, the write data may be written to the cache memory without having to access the system memory over the system bus.
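By way of illustration only, the read hit, read miss, and write behavior described above may be sketched as follows. The class name `SimpleCache`, its counters, and the dictionary standing in for system memory are hypothetical and are not part of any particular system; the sketch also assumes that a write updates only the cache.

```python
class SimpleCache:
    """Toy cache in front of a dict-backed system memory (illustrative only)."""

    def __init__(self, memory):
        self.memory = memory   # hypothetical stand-in for system memory
        self.lines = {}        # address -> cached copy of the data
        self.hits = 0
        self.misses = 0

    def read(self, address):
        if address in self.lines:
            # Cache read hit: the data is served without a memory access.
            self.hits += 1
            return self.lines[address]
        # Cache read miss: fetch from system memory and keep a copy,
        # since the same data is likely to be requested again.
        self.misses += 1
        data = self.memory[address]
        self.lines[address] = data
        return data

    def write(self, address, data):
        # Assumption: the write updates only the cache, so the copy in
        # system memory is left unchanged (and may become stale).
        self.lines[address] = data
```

In this sketch, a first read of an address counts as a miss and a second read of the same address counts as a hit; after a write, the cached copy and the copy in system memory may differ.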
Hence, data may be stored in multiple locations, e.g., L1 cache of a particular processor and system memory. If a processor altered the contents of a system memory location that is duplicated in its cache memory, the cache memory may be said to hold “modified” data. The system memory may be said to hold “stale” or invalid data. Problems may result if another processor or bus agent, e.g., Direct Memory Access (DMA) controller, inadvertently obtained this “stale” or invalid data from system memory. Consequently, processors and other bus agents must be provided with the most recent copy of the data, whether that copy resides in the system memory or in a cache memory. This may commonly be referred to as “maintaining cache coherency.” In order to maintain cache coherency, therefore, it may be necessary to monitor the system bus when the processor or other bus agent does not control the bus to see if another processor or bus agent accesses cacheable system memory. This method of monitoring the system bus is referred to in the art as “snooping.”
Each cache may be associated with logic circuitry commonly referred to as a “snoop controller” configured to monitor the system bus for the snoopable addresses requested by a processor or other bus agent. Snoopable addresses may refer to the addresses requested by the processor or bus agent that are to be snooped by snoop controllers on the system bus. Snoop controllers may snoop these snoopable addresses to determine whether copies of the data at those addresses are within their associated cache memories using a protocol commonly referred to as Modified, Exclusive, Shared and Invalid (MESI). In the MESI protocol, an indication of a coherency state is stored in association with each unit of storage in the cache memory. This unit of storage may commonly be referred to as a “coherency granule.” A “cache line” may be the size of a coherency granule. In the MESI protocol, the indication of the coherency state for each coherency granule in the cache memory may be stored in a cache state directory in the cache subsystem. Each coherency granule may have one of four coherency states: modified (M), exclusive (E), shared (S), or invalid (I), which may be indicated by two or more bits in the cache state directory. The modified state indicates that a coherency granule is valid only in the cache memory containing the modified or updated coherency granule and that the value of the updated coherency granule has not been written to system memory. When a coherency granule is indicated as exclusive, the coherency granule is resident in only the cache memory having the coherency granule in the exclusive state. However, the data in the exclusive state is consistent with system memory. If a coherency granule is marked as shared, the coherency granule is resident in the associated cache memory and may be in one or more cache memories in addition to the system memory. If the coherency granule is marked as shared, all of the copies of the coherency granule in all the cache memories so marked are consistent with the system memory. Finally, the invalid state may indicate that the data and the address tag associated with the coherency granule are both invalid and thus are not contained within that cache memory.
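As a minimal sketch, the four coherency states and the properties attributed to them above may be summarized in code. The names `Mesi`, `is_valid`, `consistent_with_memory`, and `may_reside_in_other_caches` are hypothetical and chosen for illustration only; no particular bit encoding of the cache state directory is implied.

```python
from enum import Enum


class Mesi(Enum):
    """The four coherency states of a coherency granule."""
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def is_valid(state):
    # Only the invalid state means the granule is not held in this cache.
    return state is not Mesi.INVALID


def consistent_with_memory(state):
    # Exclusive and shared copies match system memory; a modified copy
    # has not yet been written back to system memory.
    return state in (Mesi.EXCLUSIVE, Mesi.SHARED)


def may_reside_in_other_caches(state):
    # Modified and exclusive granules are resident in one cache only;
    # a shared granule may also be held by other caches.
    return state is Mesi.SHARED
```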
A processor or other bus agent may generate a “transfer request” to be received by a unit commonly referred to as a “bus macro”. A “transfer request” may refer to a request to read an address not within the processor's or bus agent's associated cache memory(ies), a request to write to an address not exclusively owned by the processor's or bus agent's associated cache memory(ies), a synchronization command, an address-only request, e.g., a request to update the state of a coherency granule, or a translation lookaside buffer invalidation request. The bus macro may be configured to determine if the received transfer request is snoopable. That is, the bus macro may be configured to determine if the received transfer request is to be broadcast to the other snoop controllers not associated with the requesting processor or bus agent in order to determine if a copy of the requested snoopable address, i.e., a copy of the requested coherency granule, is within their associated cache memories. The broadcast transfer request may commonly be referred to as a “snoop request.”
Based on the resulting responses from each of the snoop controllers, the bus macro may decide the proper action to take on the snoop request. For example, if the snoop request was a request to read from an address with an intent-to-modify and a response to the snoop request was a “hit” to a modified line, i.e., a snoop controller detected that the requested coherency granule was in the modified state, then the bus macro may wait for the responding snoop controller to write out (referred to as “castout” or “push”) the line before reading the requested information from system memory. If, however, each response to the snoop request was a “hit” to a shared line or the line was invalidated, then the bus macro may be free to read the requested information from system memory.
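The decision described above for a read with intent-to-modify may be sketched as follows. This is an illustrative model only: the function name `decide_action`, the single-letter response codes, and the returned action strings are hypothetical. The sketch further assumes that an exclusive response may be handled like a shared one, on the ground that exclusive data is consistent with system memory; the description above addresses only the modified, shared, and invalid cases.

```python
def decide_action(responses):
    """Choose the bus macro's action from per-controller snoop responses.

    `responses` is an iterable of hypothetical single-letter codes giving
    the state each snoop controller found: "M", "E", "S", or "I".
    """
    responses = list(responses)
    unknown = set(responses) - {"M", "E", "S", "I"}
    if unknown:
        raise ValueError(f"unrecognized snoop responses: {sorted(unknown)}")
    if "M" in responses:
        # A hit to a modified line: system memory is stale until the
        # responding snoop controller casts out (pushes) the line.
        return "wait_for_castout_then_read_memory"
    # Shared or invalid everywhere (and, by assumption, exclusive):
    # system memory holds current data and may be read directly.
    return "read_memory"
```

For instance, the responses `["I", "M", "I"]` would cause the sketch to wait for a castout, whereas `["S", "I"]` would allow the read from system memory to proceed.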
The snoop request may be said to be completed once the bus macro has received a signal indicating completion of the snoop operation from each of the snoop controllers, at which point the bus macro is able to complete the transaction. That is, upon receiving the signal indicating completion of the snoop operation from each of the snoop controllers, the bus macro may be able to service the processor's or other bus agent's transfer request, e.g., read from or write to system memory. The duration of time from broadcasting the snoop request until completing the snoop request may be referred to as a “snoop tenure”.
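Because the snoop request completes only when every snoop controller has signaled completion, the snoop tenure is bounded by the slowest controller. A minimal sketch follows, assuming that all snoop controllers begin snooping when the request is broadcast (cycle 0) and operate in parallel; the function name `snoop_tenure`, the controller names, and the cycle counts are hypothetical.

```python
def snoop_tenure(completion_cycles):
    """Return the snoop tenure in cycles.

    `completion_cycles` maps each snoop controller (hypothetical names)
    to the cycle, counted from the broadcast of the snoop request, at
    which that controller signals completion of its snoop operation.
    """
    if not completion_cycles:
        raise ValueError("at least one snoop controller is required")
    # The bus macro must hear from every controller, so the tenure
    # ends when the last completion signal arrives.
    return max(completion_cycles.values())
```

Under these assumptions, `snoop_tenure({"ctrl_a": 3, "ctrl_b": 7, "ctrl_c": 5})` evaluates to 7, so shortening the response of only the fastest controllers would not shorten the tenure.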
Since the bus macro must wait until each snoop controller snoops its associated cache contents and responds prior to servicing the processor's or bus agent's transfer request, there is a latency associated with servicing the processor's or bus agent's transfer request. By reducing the latency of the snoop tenure, i.e., by reducing the latency associated with servicing the processor's or bus agent's transfer request, bus performance may be improved.
Therefore, there is a need in the art to reduce the latency of snoop tenures, thereby improving bus performance.