The present invention relates to a cache coherency technique in an agent using a pipelined bus.
As is known, many modern computing systems employ a multi-agent architecture. A typical system is shown in FIG. 1. There, a plurality of agents 10-50 communicate over an external bus 60 according to a predetermined bus protocol. “Agents” may include general purpose processors, chipsets for memory and/or input/output devices, or other integrated circuits that process data requests. The bus 60 may be a “pipelined” bus on which several transactions may be in progress at once. Each transaction progresses through a plurality of stages, but no two transactions are in the same stage at the same time. The transactions complete in order. With some exceptions, transactions generally do not “pass” one another as they progress on the external bus 60.
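The in-order, one-transaction-per-stage behavior of a pipelined bus can be illustrated with a minimal Python sketch. It assumes four hypothetical stage names and a bus with no stalls, in which one new transaction enters per clock and every in-flight transaction advances one stage per clock. It is an illustration of the property described above, not a model of any particular bus protocol.

```python
# Hypothetical stage names for illustration only; the text states just that
# each transaction passes through "a plurality of stages".
STAGES = ["request", "snoop", "response", "data"]


def run_pipeline(transactions):
    """Simulate an in-order pipelined bus with no stalls.

    One new transaction enters the first stage per clock, and every
    in-flight transaction advances one stage per clock. Returns a list of
    per-clock snapshots (stage -> transaction) and the completion order.
    """
    pending = list(transactions)
    in_flight = []  # [name, stage_index] pairs, oldest first
    snapshots, completed = [], []
    while pending or in_flight:
        # Every in-flight transaction advances by exactly one stage.
        for entry in in_flight:
            entry[1] += 1
        # Transactions that have left the last stage complete, oldest first.
        while in_flight and in_flight[0][1] == len(STAGES):
            completed.append(in_flight.pop(0)[0])
        # At most one new transaction enters the first stage per clock.
        if pending:
            in_flight.append([pending.pop(0), 0])
        snapshot = {}
        for name, stage in in_flight:
            # Lockstep advance guarantees no two transactions share a stage.
            assert STAGES[stage] not in snapshot
            snapshot[STAGES[stage]] = name
        snapshots.append(snapshot)
    return snapshots, completed


snapshots, completed = run_pipeline(["A", "B", "C"])
print(completed)  # ['A', 'B', 'C']
```

Because every transaction advances in lockstep, no transaction can overtake another, so completion order equals issue order. A real bus adds stalls, such as the snoop stall discussed below, which this sketch deliberately omits.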
In a multiple-agent system, two or more agents may need data at the same memory location at the same time. The agents 10-50 operate according to cache coherency rules to ensure that each agent uses the most current copy of the data available to the system. Under many cache coherency schemes, each time an agent 10 stores a copy of data, it assigns to the copy a state indicating the agent's rights to read and/or modify the data.
For example, the Pentium® Pro processor, commercially available from Intel Corporation, operates according to the “MESI” cache coherency scheme. Each copy of data stored in an agent 10 is assigned one of four states:
Invalid: Although an agent 10 may have cached a copy of the data, the copy is unavailable to the agent. The agent 10 may neither read nor modify an invalid copy of data.
Shared: The agent 10 stores a copy of data that is valid and possesses the same value as is stored in external memory. An agent 10 may only read data in shared state. Copies of the data may also be stored by other agents in shared state. An agent 10 may not modify data in shared state without first performing an external bus transaction to gain exclusive ownership of the data.
Exclusive: The agent 10 stores a copy of data that is valid and may possess the same value as is stored in external memory. When an agent 10 caches data in exclusive state, it may read and modify the data without an external cache coherency check.
Modified: The agent 10 stores a copy of data that is valid and “dirty.” A copy cached by the agent 10 is more current than the copy stored in external memory. When an agent 10 stores data in modified state, no other agents possess a valid copy of the data.
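The read and modify rights of the four MESI states can be summarized in a minimal Python sketch. The enum and function names are hypothetical. The transition of an exclusive copy to modified on a local write follows from the definitions above: writing a clean, exclusively held copy makes it dirty.

```python
from enum import Enum


class MESI(Enum):
    INVALID = "I"
    SHARED = "S"
    EXCLUSIVE = "E"
    MODIFIED = "M"


def can_read(state: MESI) -> bool:
    """Shared, exclusive and modified copies are valid and may be read."""
    return state is not MESI.INVALID


def can_modify_locally(state: MESI) -> bool:
    """Only exclusive and modified copies may be written without first
    performing an external bus transaction."""
    return state in (MESI.EXCLUSIVE, MESI.MODIFIED)


def write_locally(state: MESI) -> MESI:
    """Return the state after the agent writes its own copy.

    Writing an exclusive copy makes it dirty (modified); a modified copy
    stays modified. Invalid and shared copies must first gain exclusive
    ownership through a bus transaction.
    """
    if not can_modify_locally(state):
        raise ValueError(f"{state.name} copy requires a bus transaction first")
    return MESI.MODIFIED


print(write_locally(MESI.EXCLUSIVE))  # MESI.MODIFIED
```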
Agents 10-50 exchange cache coherency messages, called “snoop responses,” during external bus transactions. The snoop responses identify whether other agents possess copies of the requested data and, if so, the states in which those copies are held. Ordinarily, the external memory 50 provides data to a requesting agent 10. However, when an agent 10 requests data held in modified state by another agent 20, the other agent 20 may provide the data to the requesting agent in an implicit writeback. The modified data is the most current copy of the data available to the system and should therefore be transferred to the requesting agent 10 in response to its data request.
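The choice of data source from the snoop responses can be sketched in Python as follows. The `SnoopResult` values are an assumed, illustrative encoding of each agent's response, not the signal names of any specific bus. The sketch returns the agent holding a modified copy, which supplies the data by implicit writeback, and otherwise falls back to external memory.

```python
from enum import Enum


class SnoopResult(Enum):
    # Illustrative encoding of one agent's snoop response (assumed names).
    MISS = "no valid copy"
    HIT = "clean copy (shared or exclusive)"
    HIT_MODIFIED = "modified copy"


def data_source(responses):
    """Decide who supplies data for a read request.

    `responses` maps each snooping agent to its SnoopResult. Ordinarily
    external memory supplies the data; if another agent holds the line in
    modified state, that agent supplies it via an implicit writeback.
    """
    owners = [agent for agent, r in responses.items() if r is SnoopResult.HIT_MODIFIED]
    # Under MESI at most one agent can hold a modified copy.
    assert len(owners) <= 1, "coherency violation: several modified copies"
    return owners[0] if owners else "external memory"


print(data_source({"agent 20": SnoopResult.HIT_MODIFIED, "agent 30": SnoopResult.MISS}))
# agent 20
```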
When external bus transactions cause an agent to change the state assigned to a copy of data, state changes occur after snoop responses are globally observed.
As an example, consider a “read for ownership” request issued by an agent 10. Initially, the agent 10 may store the requested data in invalid state. The agent 10 needs the data and issues a bus transaction requesting it. The agent 10 then receives snoop responses from the other agents 20-40. When the snoop responses are received, the transaction is globally observed, and the agent 10 marks the requested data as held in exclusive state. The agent 10 may mark the data even though it has not yet received it. In known processors, for example, data is transferred in a data phase of a transaction that follows the snoop phase. Before the data is received, an entry of an internal cache (not shown) is reserved for the data. A state field in the agent's external transaction queue is marked exclusive when the transaction is globally observed and before the requested data is received, but the state field in the reserved cache entry is not marked exclusive until the data is filled into the cache.
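The two separate state fields in this read-for-ownership flow can be traced with a minimal Python sketch. The class and method names are hypothetical. The sketch assumes one state field in the external transaction queue (ETQ) and one in the reserved cache entry, which is the distinction the paragraph above draws.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CacheEntry:
    address: int
    state: str = "I"               # reserved but not yet filled
    data: Optional[bytes] = None


@dataclass
class ReadForOwnership:
    """Hypothetical model of one read-for-ownership tracked in the agent's
    external transaction queue (ETQ)."""
    address: int
    etq_state: str = "I"
    entry: CacheEntry = field(init=False)

    def __post_init__(self):
        # A cache entry is reserved before any data arrives.
        self.entry = CacheEntry(self.address)

    def snoop_results_received(self):
        # Global observation: the ETQ state field is marked exclusive even
        # though the data phase has not happened yet.
        self.etq_state = "E"

    def data_phase(self, data: bytes):
        # The data phase follows the snoop phase in this model.
        assert self.etq_state == "E", "data arrives only after global observation"
        # Only now is the reserved cache entry filled and marked exclusive.
        self.entry.data = data
        self.entry.state = self.etq_state


rfo = ReadForOwnership(0x1000)
rfo.snoop_results_received()
print(rfo.etq_state, rfo.entry.state)   # E I
rfo.data_phase(bytes(32))
print(rfo.etq_state, rfo.entry.state)   # E E
```

The window between the two `print` calls, where the ETQ already says exclusive but the cache entry does not, is the period the boundary condition below depends on.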
Certain boundary conditions arise when state transitions are triggered by the receipt of snoop responses. An example is shown in the following table using the Pentium® Pro bus protocol:
In this boundary condition, absent some preventive measure, two different agents 10 and 20 in the system could each mark a copy of the same data in exclusive state, which would violate cache coherency. Assume that two agents 10 and 20 post read requests for the same data. The first agent 10 posts its request as explained above. When the first transaction concludes its request phase, the second agent 20 posts a second transaction for the same data.
Assume further that the snoop phase of the first transaction is delayed by a snoop stall. A snoop stall occurs when an agent (say, agent 30) requires additional time to generate snoop results. Although the first agent 10 may reserve a cache entry for the requested data, it does not mark the requested data as exclusive until snoop results for its transaction are received. When snoop results for the first transaction eventually are received (in clock 8), the first agent 10 marks the data as held in exclusive state. However, the first agent 10 observes the second transaction in clock 3. If it performs internal snoop inquiries for the second transaction before the first transaction is globally observed, its snoop response will indicate that it does not possess a valid copy of the data, and the second agent 20 could then also mark the data as exclusive. Having two agents 10, 20 each store the data in exclusive state violates the MESI cache coherency rules, because each agent could modify its copy of the data without notifying the other via a bus transaction.
The coherency violation can arise if an agent 10 begins internal snoop inquiries for a new transaction before its own previous transaction to the same data is globally observed. Thus, the error can be avoided if snoop inquiries related to the second transaction are blocked until a prior conflicting transaction to the same data is globally observed.
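The boundary condition and its remedy can be replayed with a deliberately simplified Python sketch. It assumes both requests are reads for ownership to the same line, that a read for ownership invalidates any other copy, and it ignores data transfer timing. The function name is hypothetical. Snooping early yields two exclusive copies. Deferring agent 10's probe of the second transaction until its own transaction is globally observed yields a coherent result.

```python
def run_boundary_case(block_conflicting_snoops: bool):
    """Replay the boundary condition and return the final states
    (agent 10, agent 20).

    Simplified, hypothetical model: both requests are reads for ownership
    to the same line, and a read for ownership invalidates any other copy.
    """
    state = {10: "I", 20: "I"}
    deferred_probes = []

    def agent10_snoops_agent20_request():
        # Agent 10 answers from whatever state it holds at probe time;
        # any copy it holds is invalidated by agent 20's request.
        state[10] = "I"
        state[20] = "E"       # agent 20's request completes with ownership

    # Clock 3: agent 10 observes agent 20's transaction while its own
    # transaction to the same data is still waiting for snoop results.
    if block_conflicting_snoops:
        deferred_probes.append(agent10_snoops_agent20_request)
    else:
        agent10_snoops_agent20_request()

    # Clock 8: snoop results for agent 10's transaction arrive, so it is
    # globally observed and agent 10 marks the data exclusive.
    state[10] = "E"

    # Any blocked probe is released only after global observation.
    for probe in deferred_probes:
        probe()
    return state[10], state[20]


print(run_boundary_case(block_conflicting_snoops=False))  # ('E', 'E')
print(run_boundary_case(block_conflicting_snoops=True))   # ('I', 'E')
```

The first result shows two exclusive copies, which is the coherency violation. The second shows that blocking the probe leaves only one.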
The Pentium® Pro processor includes a snoop queue to manage cache coherency and generate snoop responses. The snoop queue buffers every transaction posted on the external bus. For each new transaction, the snoop queue compares the address of the new transaction with the addresses of the transactions it previously stored to determine whether they match. If an address matches and the previous transaction has not been globally observed, the snoop queue blocks a snoop probe for the new transaction. The block remains until snoop results for the prior pending transaction are received.
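The address-compare-and-block behavior just described can be sketched in Python. This is a minimal sketch with hypothetical class and method names. The queue holds one entry per transaction on the bus, so its capacity must equal the bus pipeline depth, which is the cost criticized in the following paragraphs.

```python
from dataclasses import dataclass
from typing import List


@dataclass(eq=False)  # entries compare by identity
class SnoopQueueEntry:
    address: int
    globally_observed: bool = False
    probe_blocked: bool = False


class FullDepthSnoopQueue:
    """Sketch of a snoop queue that buffers every bus transaction and so
    needs one entry per transaction that can be pending on the bus."""

    def __init__(self, bus_depth: int):
        self.bus_depth = bus_depth
        self.entries: List[SnoopQueueEntry] = []

    def _older_conflict(self, index: int) -> bool:
        # True if an older entry to the same address is not yet globally observed.
        new = self.entries[index]
        return any(old.address == new.address and not old.globally_observed
                   for old in self.entries[:index])

    def post(self, address: int) -> SnoopQueueEntry:
        """Buffer a newly posted transaction (from any agent, including the owner)."""
        if len(self.entries) == self.bus_depth:
            raise OverflowError("more pending transactions than the bus pipeline allows")
        entry = SnoopQueueEntry(address)
        self.entries.append(entry)
        entry.probe_blocked = self._older_conflict(len(self.entries) - 1)
        return entry

    def snoop_results(self, entry: SnoopQueueEntry) -> None:
        """Snoop results received: the transaction is globally observed,
        which may release probes blocked behind it."""
        entry.globally_observed = True
        for i, e in enumerate(self.entries):
            e.probe_blocked = self._older_conflict(i)

    def retire(self, entry: SnoopQueueEntry) -> None:
        self.entries.remove(entry)


q = FullDepthSnoopQueue(bus_depth=8)
first, second, other = q.post(0x40), q.post(0x40), q.post(0x80)
print(second.probe_blocked, other.probe_blocked)  # True False
q.snoop_results(first)
print(second.probe_blocked)  # False
```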
The Pentium® Pro processor's snoop queue is large: it has one entry for each transaction that can be pending simultaneously on the external bus, and it consumes a large area when the processor is manufactured as an integrated circuit. In future processors, it will be desirable to increase the pipeline depth of the external bus so that more transactions may proceed on it simultaneously. However, increasing the depth of the external bus becomes expensive if it also requires increasing the depth of the snoop queue.
The Pentium® Pro processor's snoop queue also fills quickly during operation, because it buffers not only requests from other agents but also requests posted by the agent to which it belongs. Because the Pentium® Pro also includes an external transaction queue that monitors transactions issued by the processor, the agent's own transactions are tracked in two places, and the snoop queue's design is considered sub-optimal.
Accordingly, the inventors perceived a need in the art for a snoop queue in an agent whose depth is independent of the pipeline depth of the agent's external bus. Such a snoop queue must nevertheless maintain cache coherency and ensure that, when two bus transactions related to the same address are pending on the external bus at the same time, snoop inquiries related to the second transaction are not generated until the first transaction has been globally observed.
Embodiments of the present invention provide a method of processing a bus transaction in which an address is retrieved from the bus transaction and referred to a queue of pending transactions. A match indicator signal is returned from the queue. If the match indicator signal indicates a match, a snoop probe for the bus transaction is blocked.
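The summarized method can be traced step by step in a minimal Python sketch. The names `BusTransaction`, `PendingTransactionQueue` and `process` are hypothetical. The queue of pending transactions is reduced to a set of addresses whose transactions are not yet globally observed, and this section does not say which hardware queue plays that role.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass(frozen=True)
class BusTransaction:
    address: int


@dataclass
class PendingTransactionQueue:
    """Hypothetical queue of pending transactions, reduced here to the set
    of addresses whose transactions are not yet globally observed."""
    addresses: Set[int] = field(default_factory=set)

    def match(self, address: int) -> bool:
        # Plays the role of the match indicator signal.
        return address in self.addresses


def process(transaction: BusTransaction, queue: PendingTransactionQueue) -> bool:
    """Return True if a snoop probe may be issued, False if it is blocked."""
    address = transaction.address           # 1. retrieve the address
    match_indicator = queue.match(address)  # 2. refer it to the queue
    return not match_indicator              # 3. block the probe on a match


queue = PendingTransactionQueue({0x40})
print(process(BusTransaction(0x40), queue))  # False (probe blocked)
print(process(BusTransaction(0x80), queue))  # True
```

In this sketch the lookup consults only the queue of pending transactions, so nothing in it depends on how many transactions the bus itself can hold.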