1. Field of the Invention
This invention generally relates to an architecture, system and method for maintaining cache coherency and processor consistency within a multi-processor computer system. More particularly, this invention relates to maximizing throughput within multiple processor buses by minimizing snoop stall cycles or out-of-order transactions on at least one of the processor buses.
2. Description of the Related Art
Multi-processor systems are generally well known, whereby a set of processors is interconnected across a local or distributed network. The local network can be confined to a single computer, and the distributed network can be one involving, for example, a LAN or WAN. Each processor may either be interconnected with other processors on a single processor bus or connected on its own processor bus separate from other processor buses. In the former instance, the processors are said to be grouped as "clusters." For example, a Pentium® Pro processor bus can support up to four Pentium® Pro processors. Each cluster can thereby be connected to a processor bus and routed to a system memory bus via a system memory controller or bus bridge.
Most modern-day processor buses use a pipelined architecture. More specifically, different stages (or phases) of one transaction can overlap with phases of other transactions, so that multiple phases of multiple transactions are serviced on a single processor bus. In the Pentium® Pro example, each transaction can employ some or all of the following phases: arbitration phase, request phase, error phase, snoop phase, response phase, and data phase.
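To illustrate how phases of different transactions overlap on one bus, the sketch below reports which phase each transaction occupies on each bus clock. It assumes, purely for illustration, that every phase lasts exactly one clock and that a new transaction enters arbitration every `issue_interval` clocks; actual Pentium® Pro phases have differing and variable lengths. The function names are hypothetical.

```python
# Phase names as listed for the Pentium Pro bus. The one-clock-per-phase
# timing below is a simplifying assumption, not bus-accurate timing.
PHASES = ["arbitration", "request", "error", "snoop", "response", "data"]


def phase_at(txn_index, clock, issue_interval=1):
    """Return the phase transaction `txn_index` occupies at `clock`, or None."""
    start = txn_index * issue_interval  # clock at which the transaction enters arbitration
    offset = clock - start
    if 0 <= offset < len(PHASES):
        return PHASES[offset]
    return None


def pipeline_table(num_txns, num_clocks, issue_interval=1):
    """Map each clock to the {transaction: phase} pairs active on that clock."""
    table = {}
    for clock in range(num_clocks):
        active = {}
        for t in range(num_txns):
            phase = phase_at(t, clock, issue_interval)
            if phase is not None:
                active[t] = phase
        table[clock] = active
    return table


if __name__ == "__main__":
    for clock, active in pipeline_table(3, 8).items():
        print(clock, active)
```

On clock 3 of this model, transaction 0 is in its snoop phase while transaction 1 is in its error phase and transaction 2 is in its request phase: three transactions share the bus in different phases.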
During an arbitration phase, the requesting agent on the processor bus seeks mastership of its respective bus, as granted by an arbiter. A processor is deemed a bus agent and, if multiple processors are arranged in a cluster, arbitration among those processors is resolved using, for example, a prioritized or symmetric bus arbitration mechanism. Once a requesting agent is granted ownership or mastership of the bus, it drives an address within a request phase. Provided no errors are discovered for that transaction, as recorded in the error phase, a snoop phase is initiated. In the snoop phase, cache coherency is enforced: all bus agents that receive a snoop cycle route a hit or a hit-to-modified signal to the bus agent that requested the snoop cycle. The responding bus agent then drives the response of the transaction during the response phase. Thereafter, a data phase occurs for that transaction, provided the transaction has not been deferred.
The snoop results are driven by the processor bus agents during the snoop phase. Those results indicate whether the corresponding snoop request address references a valid or dirty cache line in the internal cache of a bus agent coupled to the processor bus. A dirty cache line is often referred to as a "modified" cache line. The values of HIT# and HITM# indicate whether the line is valid or invalid in the agent being snooped, whether the line is dirty (modified) in that caching agent, or whether the snoop phase needs to be extended. The bus agent being snooped (i.e., the "caching" agent) asserts HIT# and de-asserts HITM# in the snoop phase if it plans to retain the cache line in its cache after the snoop is completed. The caching agent asserts HITM# if its cache line is in the modified state (i.e., the caching agent contains the most recent cache line data for that address). After asserting HITM#, the caching agent assumes responsibility for writing back the modified cache line, often referred to as "implicit write back." If the caching agent asserts HIT# and HITM# together in the snoop phase, a snoop stall occurs, stretching the snoop phase for as long as needed until the caching agent is ready to indicate snoop status.
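The HIT#/HITM# combinations described above can be summarized as a small decode function. This is a sketch: the enum and function names are hypothetical, the arguments are logical assertion states (True means asserted) so the active-low electrical levels are abstracted away, and the case with neither signal asserted is read as the line being invalid in the snooped agent.

```python
from enum import Enum


class SnoopResult(Enum):
    MISS = "line invalid in the snooped agent"
    CLEAN_HIT = "line valid and unmodified; agent retains it"
    MODIFIED_HIT = "line modified; agent performs implicit write back"
    STALL = "snoop stall; snoop phase is extended"


def decode_snoop(hit_asserted: bool, hitm_asserted: bool) -> SnoopResult:
    """Decode the HIT#/HITM# pair sampled during the snoop phase."""
    if hit_asserted and hitm_asserted:
        # Both asserted together: the caching agent is not ready yet.
        return SnoopResult.STALL
    if hitm_asserted:
        # Caching agent holds the most recent data and will write it back.
        return SnoopResult.MODIFIED_HIT
    if hit_asserted:
        # Caching agent keeps a valid, clean copy after the snoop.
        return SnoopResult.CLEAN_HIT
    return SnoopResult.MISS
```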
If a DEFER# signal is forwarded from the caching agent during the snoop phase, that agent will effect removal of that particular transaction from the in-order queue, often referred to as the "IOQ." During the response phase, the response to a DEFER# signal forwarded during the snoop phase is one of three valid responses: deferred response, retry response, or hard error response. If DEFER# is asserted during a snoop cycle and the response is either a deferred response or a retry response, the deferred transaction will be serviced out of order relative to its original request. According to one example, the deferred request may occur after a subsequent snoop request cycle, indicating an out-of-order sequence or, alternatively, a split transaction.
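The handling of DEFER# described above can be sketched with the IOQ modeled as a `deque`. This is a minimal sketch under stated assumptions: the three responses the text lists as valid after DEFER# are enforced, `NORMAL` is a hypothetical label for an ordinary (non-deferred) completion, and the model simply removes the head transaction at its response phase.

```python
from collections import deque
from enum import Enum


class Response(Enum):
    DEFERRED = "deferred response"
    RETRY = "retry response"
    HARD_ERROR = "hard error response"
    NORMAL = "normal response"  # hypothetical label for a non-deferred completion


# Responses listed as valid when DEFER# was forwarded in the snoop phase.
VALID_AFTER_DEFER = {Response.DEFERRED, Response.RETRY, Response.HARD_ERROR}


def respond(ioq: deque, defer_asserted: bool, response: Response):
    """Run the response phase for the transaction at the head of the IOQ.

    Returns (transaction, completes_out_of_order). The head leaves the IOQ in
    this model; a deferred or retried transaction is serviced later, out of
    order with respect to its original request.
    """
    if defer_asserted and response not in VALID_AFTER_DEFER:
        raise ValueError(f"{response.value} is not valid after DEFER#")
    txn = ioq.popleft()
    out_of_order = defer_asserted and response in (Response.DEFERRED, Response.RETRY)
    return txn, out_of_order


if __name__ == "__main__":
    queue = deque(["A", "B"])
    print(respond(queue, True, Response.DEFERRED))  # A leaves the IOQ, completes later
    print(respond(queue, False, Response.NORMAL))   # B completes in order
```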
Most modern-day processor buses rely on procedures or operations that appear somewhat atomic, in that processor buses generally retire data in the order in which transactions begin. Moreover, the data transfers of one transaction are often dependent upon the data transfers of another transaction. For example, completion of a request from memory may require an implicit write back of data from a caching agent if the request is to a modified line in that caching agent.
FIG. 1 illustrates the atomic nature of transactions, and the dependency between those transactions, within a pair of processor buses of a multi-processor system. For example, a first processor on a first processor bus "1" requests a transaction A on the first bus. Following transaction A, a snoop request As is forwarded to the second processor on the second processor bus, specifically to the cache within the second processor. Meanwhile, the second processor dispatches transaction B on the second processor bus, eventually yielding a snoop transaction Bs on the first processor bus, directed to the cache within the first processor. If both snoop requests result in an asserted hit-to-modified signal (HITM#), a live-lock condition may result in which both buses are locked and unable to forward the requested data, since that data is contingent upon receiving the modified cache line from the opposing bus's caching agent. More specifically, on the first bus, the modified data for transaction B cannot be driven until transaction A receives its data. On the second bus, the modified data for transaction A cannot be driven until transaction B receives its data. The pipelined transfer of responses and data is thereby held in a locked condition, preventing further transfers of data across Bus 1 and Bus 2.
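The FIG. 1 dependency cycle can be reproduced with a small in-order retirement model. This is a sketch under stated assumptions: each bus is a list in IOQ order, only the head of a queue may retire, and `needs` maps a transaction to the snoop whose implicit write back supplies its data. The function name is hypothetical.

```python
def run_buses(ioqs, needs):
    """Retire transactions in order on each bus until no bus can make progress.

    ioqs:  {bus name: [transaction names in in-order-queue order]}
    needs: {transaction: transaction whose completion supplies its data}
    Returns (retired transactions in order, True if the buses are locked).
    """
    queues = {bus: list(q) for bus, q in ioqs.items()}
    retired = []
    progress = True
    while progress and any(queues.values()):
        progress = False
        for q in queues.values():
            # Only the head of an in-order queue may retire, and only once
            # the transaction it depends on (if any) has retired.
            if q and (needs.get(q[0]) is None or needs[q[0]] in retired):
                retired.append(q.pop(0))
                progress = True
    return retired, any(bool(q) for q in queues.values())


if __name__ == "__main__":
    # FIG. 1: A needs the write back from snoop As on bus 2, and
    # B needs the write back from snoop Bs on bus 1.
    print(run_buses({"bus1": ["A", "Bs"], "bus2": ["B", "As"]},
                    {"A": "As", "B": "Bs"}))
    # Deferring A removes it from bus 1's IOQ, which breaks the cycle.
    print(run_buses({"bus1": ["Bs"], "bus2": ["B", "As"]}, {"B": "Bs"}))
```

The first call returns `([], True)`: each head waits on an entry stuck behind the other head. The second returns `(['Bs', 'B', 'As'], False)`, showing why deferring just one of the two transactions is enough to break the lock.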
To prevent a live-lock condition, a DEFER# signal may need to be forwarded during the snoop phase. The DEFER# signal is forwarded across the first and second buses as DEFER1# and DEFER2#, as shown by reference numerals 10 and 12, respectively. Asserting the defer signals whenever a hit-to-modified signal HITMx# occurs (where x is either 1 or 2) ensures that the transactions on the respective buses are deferred. Even if a hit-to-modified signal is present on only one bus, transactions on both buses may be deferred. Even though the probability of a hit-to-modified signal occurring on both buses is relatively small, deferring transactions on both buses not only may be unnecessary, but also consumes substantial bus bandwidth, since each deferred transaction must later be completed with a deferred reply.
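The bandwidth cost of the conservative approach can be tallied per snoop window. This sketch assumes the worst case described above, in which the transactions on both buses are deferred whenever either HITMx# is asserted, and a model in which each window has one transaction in its snoop phase on each bus. The function name and the window encoding are hypothetical.

```python
def conservative_deferrals(windows):
    """Count deferred transactions when DEFER1# and DEFER2# are both asserted
    whenever HITM1# or HITM2# is asserted.

    windows: list of (hitm1, hitm2) booleans, one pair per snoop window.
    Each deferred transaction later needs its own deferred reply, so the
    returned count is also the number of extra deferred-reply transactions.
    """
    deferred = 0
    for hitm1, hitm2 in windows:
        if hitm1 or hitm2:
            deferred += 2  # the transactions on both buses are deferred
    return deferred


if __name__ == "__main__":
    # Only the last window has a hit-to-modified on both buses, yet the
    # first window also costs two deferrals.
    print(conservative_deferrals([(True, False), (False, False), (True, True)]))
```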
Alternatively, the multi-processor system may utilize a central tag controller that looks up all coherent transaction addresses in a tag filter to see whether the remotely located processor bus agent may own the snooped address. If there is a hit to the tag filter, the snooping agent maintains ownership of its respective processor bus. This allows the local transaction to complete on the local processor bus. If the transaction on the remote bus hits a modified address noted in the tag filter, the remote processor is required to defer its transaction. An unfortunate aspect of using tag filters, look-up tables, and other temporary memory devices adjacent the snooping agent is the time required to access the tags and determine whether a hit-to-modified result occurred. As a result, the HITM# signal must be delayed by stalling the snoop cycle until access of the tag filter has completed. As shown in FIG. 1, delaying snoop cycles 14 and 16 by one or more cycles before asserting HITM# unfortunately reduces the overall pipeline throughput on those buses.
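The cost of a tag-filter lookup can be expressed as the number of clocks by which the snoop phase must be stretched. In this minimal sketch, the `TagFilter` class, its lookup latency, and the snoop-window length are all hypothetical parameters. Note that the stall applies to every snoop, hit or miss, because HITM# cannot be driven until the lookup finishes.

```python
def snoop_stall_clocks(lookup_latency, snoop_window):
    """Clocks by which the snoop phase must be stretched so that HITM# is
    driven only after the tag-filter lookup has completed."""
    return max(0, lookup_latency - snoop_window)


class TagFilter:
    """Hypothetical central tag filter for coherent transaction addresses."""

    def __init__(self, modified_addresses, lookup_latency):
        # Addresses that a remote bus agent may hold in the modified state.
        self.modified = set(modified_addresses)
        self.lookup_latency = lookup_latency  # clocks per lookup (assumed value)

    def snoop(self, address, snoop_window):
        """Return (hitm, stall_clocks) for a snoop to `address`."""
        hitm = address in self.modified
        # The stall does not depend on the outcome: the result is unknown
        # until the lookup completes.
        return hitm, snoop_stall_clocks(self.lookup_latency, snoop_window)


if __name__ == "__main__":
    tags = TagFilter({0x1000}, lookup_latency=3)
    print(tags.snoop(0x1000, snoop_window=1))  # modified hit, stretched 2 clocks
    print(tags.snoop(0x2000, snoop_window=1))  # miss, still stretched 2 clocks
```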
Referring to FIG. 2, a two-transaction example is shown. Deferring the first transaction 8 may occur at the snoop phase by tagging transaction 8 and allowing the second transaction 9 to proceed as cycles 9e and 9f within the respective response and data transfer phases. In this manner, priority is given to, e.g., a snoop-initiated transaction on a particular bus over the normal request transaction that preceded the snoop request. The first transaction is therefore said to be deferred, the deferral removing transaction 8 from the IOQ by generating an appropriate response. If a tag filter is used, snoop stalling, rather than deferral of each transaction, may be required during the snoop phase. A snoop stall would extend the pipeline by, for example, stalling cycle 8d from its normal pipeline position at clock 10 to clock 12. This would force the snoop phase of the second transaction (cycle 9d) to be delayed as well, along with any other snoop cycles that occur later in time.
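The ripple effect of a snoop stall can be sketched by assuming that snoop phases complete in order and that consecutive snoop phases are separated by at least `min_gap` clocks. Index 0 below plays the role of cycle 8d, stretched from clock 10 to clock 12. The nominal clocks of later cycles and the value of `min_gap` are hypothetical, because the text gives only the 8d positions.

```python
def snoop_clocks(nominal, stall, min_gap=1):
    """Return the actual snoop-phase clocks for pipelined transactions.

    nominal: nominal snoop-phase clock of each transaction, in bus order.
    stall:   {index: extra clocks} by which that snoop phase is stretched.
    min_gap: assumed minimum clocks between consecutive snoop phases.
    Snoop phases complete in order, so a stalled phase pushes later ones
    unless there is enough slack to absorb the stall.
    """
    actual = []
    for i, clock in enumerate(nominal):
        start = clock if not actual else max(clock, actual[-1] + min_gap)
        actual.append(start + stall.get(i, 0))
    return actual


if __name__ == "__main__":
    # Stretching the first snoop phase from clock 10 to clock 12 delays
    # every later snoop phase that was packed right behind it.
    print(snoop_clocks([10, 11, 12], {0: 2}))
```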
Deferring a transaction and stalling cycles in the snoop phase are but two techniques used to avoid live-lock conditions, and each carries a throughput penalty. Defer cycles and snoop stall cycles should therefore be avoided to enhance the overall throughput and bandwidth of two or more processor buses within a multi-processor system. An architecture, system and method are needed that overcome these throughput issues while preventing the occurrence of a live-lock condition.
The problems outlined above are in large part solved by an improved architecture, system and method for reducing the need to defer or retry bus transactions in a system having multiple processor buses. The improvement also minimizes the need to snoop stall transactions within those buses. The overall benefit is improved bus throughput and overall system speed.
A guaranteed access controller is provided between the multiple processor buses. The controller includes an arbiter that maintains mastership of a first bus while guaranteeing initiation of a snoop cycle upon a second bus, both buses being ones that couple to separate processors or processor clusters within a multi-processor computer system. Access is maintained so that only one transaction within the pair of buses is deferred. The other transaction is assured to continue in order, thereby preventing a split transaction on that bus. For example, if the first bus employs transactions that occur in order (i.e., are not taken out of the IOQ) and are not split, throughput on that bus is enhanced. A benefit is the avoidance of deferring transactions on both buses whenever a hit-to-modified signal occurs. Even when a hit-to-modified signal occurs, a transaction within one bus is assured of completing in order (i.e., without deferral of that transaction).
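The guaranteed-access policy can be sketched as a per-window decision. Under this policy the bus holding mastership keeps its transaction in order, and at most one transaction in the pair is deferred. The sketch conservatively defers the non-master bus's transaction whenever any hit-to-modified occurs in the window. That rule, the `(hitm1, hitm2)` window encoding, and the function name are assumptions; the text guarantees only that no more than one transaction is deferred.

```python
def guaranteed_access_deferrals(windows, master_bus=1):
    """Return, per snoop window, the bus whose transaction is deferred (or None).

    The arbiter keeps mastership of `master_bus`, so that bus's transaction
    completes in order; when a hit-to-modified occurs in a window, only the
    other bus's transaction is deferred (assumed rule).
    windows: list of (hitm1, hitm2) booleans, one pair per window.
    """
    if master_bus not in (1, 2):
        raise ValueError("master_bus must be 1 or 2")
    other_bus = 2 if master_bus == 1 else 1
    return [other_bus if (hitm1 or hitm2) else None for hitm1, hitm2 in windows]


if __name__ == "__main__":
    sample = [(True, False), (False, False), (True, True)]
    plan = guaranteed_access_deferrals(sample)
    print(plan, "deferrals:", sum(1 for bus in plan if bus is not None))
```

With these three sample windows, the policy produces two deferrals, one per conflicting window. A policy that defers both buses on any hit-to-modified would produce four for the same windows.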
According to one embodiment, the control logic and/or arbiter is maintained within a bus interface unit. The bus interface unit is linked between two or more processor buses. Each bus may have a single processor or a cluster of processors linked thereto. The arbiter may be coupled to allow completion of a first transaction within the first bus and also to initiate a snoop request cycle to a modified cache line within a first bus agent coupled to the first bus before granting mastership to the second bus. The snoop request cycle may originate from the bus interface unit or from a second bus agent coupled to the second bus. Preferably, the processor bus agents (i.e., the first and second bus agents in the example provided) each comprise memory. The memory can be cache memory configured to store cache lines of data. The bus interface unit and, more particularly, the processor or cluster control logic of the bus interface unit, may be granted mastership of the second bus after completion of the first transaction and initiation of the snoop request cycle, in order for the bus interface unit to initiate another snoop request cycle to a modified cache line within the second bus agent.
According to another embodiment, the bus interface unit may comprise a first bus controller and a second bus controller. The first bus controller is coupled to receive a first request across a first bus from a first bus agent, whereas the second bus controller may be coupled to receive a second request across a second bus from a second bus agent. The first request is preferably to a modified cache line within the second bus agent, while the second request is preferably to a modified cache line within the first bus agent. The arbiter may be coupled to the first bus controller and the second bus controller to allow completion of a transaction originating from the first request, and to allow completion of a snoop request cycle originating from the second bus agent before granting mastership to the second bus.
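The ordering that the arbiter enforces between the two bus controllers can be sketched as an event log. This sketch is limited to the sequence described above: the first request's transaction completes and the snoop cycle originating from the second request completes on the first bus, and only then is the second bus granted. The `BusController` and `Arbiter` classes are hypothetical, and other bus activity (for example, how the first request's data is obtained) is deliberately omitted.

```python
from collections import deque


class BusController:
    """Hypothetical per-bus controller holding requests from its bus agent."""

    def __init__(self, name):
        self.name = name
        self.pending = deque()


class Arbiter:
    """Hypothetical arbiter giving the first bus priority of mastership."""

    def __init__(self, first, second):
        self.first, self.second = first, second
        self.log = []

    def service_conflict(self):
        first_req = self.first.pending.popleft()
        second_req = self.second.pending.popleft()
        self.log.append((self.first.name, "grant", None))
        self.log.append((self.first.name, "complete", first_req))
        # The second request targets a modified line in the first agent, so
        # its snoop cycle runs on the first bus while that bus is still held.
        self.log.append((self.first.name, "snoop", second_req))
        # Only now is the second bus granted; its request completes afterwards.
        self.log.append((self.second.name, "grant", None))
        self.log.append((self.second.name, "complete", second_req))
        return self.log


if __name__ == "__main__":
    bus1, bus2 = BusController("bus1"), BusController("bus2")
    bus1.pending.append("R1")
    bus2.pending.append("R2")
    for event in Arbiter(bus1, bus2).service_conflict():
        print(event)
```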
The arbiter may include a state machine that gives priority of mastership to the first bus over that of the second bus. The priority can be either fixed or variable. If the priority varies, the algorithm used to change it can include, for example, a round-robin scheme.
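A fixed or round-robin priority state machine of the kind mentioned above can be sketched in a few lines. The class name, the `mode` strings, and the convention that priority passes to the other bus after every grant are assumptions for illustration.

```python
class PriorityArbiter:
    """Choose which of two buses receives mastership when both request it.

    mode="fixed":       the `preferred` bus always wins a simultaneous request.
    mode="round_robin": after each grant, priority passes to the other bus.
    """

    def __init__(self, mode="fixed", preferred=1):
        if mode not in ("fixed", "round_robin"):
            raise ValueError("mode must be 'fixed' or 'round_robin'")
        if preferred not in (1, 2):
            raise ValueError("preferred must be 1 or 2")
        self.mode = mode
        self.priority = preferred  # bus currently holding priority

    def grant(self, req1, req2):
        """Return the bus granted mastership this round, or None if idle."""
        if req1 and req2:
            winner = self.priority
        elif req1:
            winner = 1
        elif req2:
            winner = 2
        else:
            return None
        if self.mode == "round_robin":
            # Rotate priority away from the bus that was just served.
            self.priority = 2 if winner == 1 else 1
        return winner


if __name__ == "__main__":
    rr = PriorityArbiter("round_robin")
    print([rr.grant(True, True) for _ in range(4)])
```

With both buses requesting continuously, the fixed mode always grants bus 1, while the round-robin mode alternates between the two buses.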
According to yet another embodiment, a computer may be provided. The computer includes an arbiter coupled to maintain mastership of the first bus such that the first bus can complete a response to the first request and can complete a snoop request issued from a second bus agent to the cache memory of the first bus agent, allowing the second request to be issued across the second bus. A peripheral device may be included with the computer. The peripheral device is adapted for communication with the first and second buses, and is coupled upon a printed circuit board separate from another printed circuit board upon which the arbiter and/or processor is coupled. The arbiter may issue the snoop request across the first bus to the cache memory of the first bus agent before a snoop request is issued across the second bus to the cache memory of the second bus agent. The arbiter may issue a defer signal to the second bus agent to defer the second request across the second bus, such that the second request is serviced after a snoop request is issued across the second bus. The peripheral device can include any device connected to a peripheral bus separate and distinct from the processor bus. Examples of such peripheral devices include a hard drive controller, a keyboard controller, a display controller, or a printer controller. The hard drive controller, keyboard controller, display controller, and printer controller operably link the peripheral bus and signals therein to a mass storage device, a keyboard, a display, and a printer, respectively.
According to another embodiment, a computer is provided having a peripheral controller means arranged upon a first printed circuit board. First and second bus agent means are coupled to a second printed circuit board separate from the first printed circuit board and are further coupled to respective first and second buses. Means are provided for granting mastership of the first bus to allow a transaction from the first bus agent and a snoop cycle to occur across the first bus before granting mastership to the second bus.
According to yet another embodiment, a method is provided for orchestrating transactions across a bus within a multi-processor system. The method includes dispatching a request cycle from a first bus agent across the first bus to a modified address within a second bus agent. Thereafter, a request cycle from the second bus agent across a second bus is deferred. The deferred request cycle is preferably one which addresses a modified address within the first bus agent. A first snoop cycle dispatched across the first bus to the first bus agent is serviced before retrying the deferred request cycle.
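The ordering constraints of this method can be expressed as a check over a recorded bus trace. In this sketch, a trace is a list of `(bus, action, request)` tuples. The action strings, the request labels, and the reading that the first snoop cycle on the first bus is the one generated by the second request (which addresses a modified line in the first agent) are assumptions.

```python
def follows_method(trace, first_req, second_req):
    """Check that a trace obeys the method's order:
    1. the first agent's request is dispatched on bus 1;
    2. thereafter the second agent's request on bus 2 is deferred;
    3. the snoop cycle on bus 1 is serviced before the deferred request is retried.
    """
    required = [
        ("bus1", "dispatch", first_req),
        ("bus2", "defer", second_req),
        ("bus1", "snoop", second_req),
        ("bus2", "retry", second_req),
    ]
    positions = []
    for event in required:
        if event not in trace:
            return False
        positions.append(trace.index(event))
    # Distinct events occupy distinct indices, so sorted means strictly ordered.
    return positions == sorted(positions)


if __name__ == "__main__":
    trace = [
        ("bus1", "dispatch", "R1"),
        ("bus2", "defer", "R2"),
        ("bus1", "snoop", "R2"),
        ("bus2", "retry", "R2"),
    ]
    print(follows_method(trace, "R1", "R2"))
```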