1. Technical Field
The present invention relates in general to data processing systems with multiple processors and in particular to system bus arbitration between processors. Still more particularly, the present invention relates to bus arbitration involving a Previous Adjacent Address Match ("PAAM") detection function and early retrying of PAAM address collisions.
2. Description of the Related Art
Data processing systems are continually being improved to raise efficiency and transaction throughput. Development aimed at increased throughput generally focuses on three areas: using multiple processors within a data processing system; using Reduced Instruction Set Computer ("RISC") processors; and incorporating high speed memory caches in the processors.
A RISC processor operates by using only a small set of basic commands for fundamental computer operations. The objective of a RISC processor design is to use only the most frequently used instructions of the overall instruction set. By reducing the number of instructions to an optimal few, the circuitry of the processor is simplified and the speed of execution of each instruction is increased.
Performance capabilities in data processing systems using RISC processors are enhanced further by installing additional RISC processors in the system. The RISC processors can be configured to multi-task or to parallelize transactions and run several applications simultaneously, dramatically improving the throughput and performance of the system.
RISC processors further improve throughput by employing multiple instruction execution units within the processor, allowing out of order execution. Cache memories within the RISC processor, usually two, are generally very fast. When accessing memory for data or instructions, the processor usually queries the first level of cache memory (L1), which is very fast but has a small capacity. A second level of cache memory (L2) is the processor's second choice; it is larger than L1 but slower. L1 and L2 caches are usually located on or very near the processor. If the required instruction or data is not found in either L1 or L2, the main memory, which has a large capacity, is accessed via a system bus. Data is usually pre-fetched from the main memory and stored in the relatively fast L1 or L2 cache memory to save processing time. As long as the processor accesses data from cache memory, the overall operating speed of the processor is maintained at a higher level than would be possible if the processor were required to access the main memory for required information. Further, the processors would have to arbitrate for control of the system bus, which connects the processors to the main memory, for each data access, thereby consuming more processing time.
Data needed by a particular processor in a multiple processor data processing system usually has a low probability of being in that particular processor's cache memory. There will therefore be a high level of data traffic on the system bus between system memory and processor caches. Because all the processors are connected to the system memory by the system bus, many different operations, or transactions, are present on the bus at the same time.
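The lookup order described above (L1, then L2, then main memory over the system bus) can be sketched as follows. This is a minimal illustration only: the class name, the latency figures, and the fill policy are assumptions chosen for the example and do not come from any particular processor, and cache capacity and eviction are not modeled.

```python
from dataclasses import dataclass, field

# Hypothetical latencies in processor cycles; the text gives no figures.
L1_LATENCY = 1
L2_LATENCY = 10
MEMORY_LATENCY = 100  # stands in for bus arbitration plus the memory access


@dataclass
class MemoryHierarchy:
    """Toy model of the L1 -> L2 -> main memory lookup order."""
    l1: dict = field(default_factory=dict)
    l2: dict = field(default_factory=dict)
    main_memory: dict = field(default_factory=dict)

    def load(self, address):
        """Return (value, cycles_spent, level_that_supplied_the_data)."""
        if address in self.l1:
            return self.l1[address], L1_LATENCY, "L1"
        if address in self.l2:
            value = self.l2[address]
            self.l1[address] = value  # assumed fill policy: copy into L1
            return value, L1_LATENCY + L2_LATENCY, "L2"
        # Miss in both caches: the access goes out on the system bus.
        value = self.main_memory[address]
        self.l2[address] = value
        self.l1[address] = value
        return value, L1_LATENCY + L2_LATENCY + MEMORY_LATENCY, "memory"
```

In this sketch, the first load of an address pays the full bus and memory cost, while a repeated load of the same address is served from L1, which is the effect the paragraph above attributes to cache memory.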
In a multiple processor data processing system, several system bus arbitration and timing problems may arise. In data processing systems where the processors are using a pipelined system bus, one transaction (also referred to as an "operation") may have the same address as that of a "previously immediate adjacent operation"--that is, the two operations are close enough in time when attempting to access the same address that they could affect each other. If the address of the previously immediate adjacent operation matches that of the current operation and the two transactions have collidable types, then the two operations will collide. In these situations, the processor issuing the current operation will cancel its own operation on the bus by asserting retry signals.
A PAAM window is measured, in cycles, from the point in time that a processor first snoops an address on the system bus, to the point the processor knows that the master of the snooped address has received a response. The window can vary depending on the response time of responding devices on the system bus. The longer the response time, the higher the probability that a processor will retry its own transaction due to a PAAM collision, and the more retries a processor operating on the system bus will require to pass the PAAM window and avoid a collision.
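The collision conditions just described (an open PAAM window, matching addresses, and collidable transaction types) can be sketched as a predicate. This is a minimal sketch under stated assumptions: the names `Operation`, `paam_window_open`, and `paam_collision` are hypothetical, and the `COLLIDABLE` table is an illustrative guess, since the text does not list which type pairs collide beyond the read/read case shown in FIG. 6C. Treating the window as closed in the cycle the response becomes known is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed set of collidable (current, previous) type pairs.
COLLIDABLE = {
    ("read", "read"), ("read", "store"),
    ("store", "read"), ("store", "store"),
}


@dataclass
class Operation:
    master: str                  # processor that issued the operation
    address: int
    kind: str                    # transaction type, e.g. "read"
    snooped_cycle: int           # cycle the address was first snooped
    response_known_cycle: Optional[int] = None  # None: not yet known


def paam_window_open(previous, cycle):
    """True while `previous` is still inside its PAAM window at `cycle`."""
    if cycle < previous.snooped_cycle:
        return False
    return (previous.response_known_cycle is None
            or cycle < previous.response_known_cycle)


def paam_collision(current, previous, cycle):
    """True if `current` must be retried because of `previous`."""
    return (paam_window_open(previous, cycle)
            and current.address == previous.address
            and (current.kind, previous.kind) in COLLIDABLE)
```

With the cycle numbers used for FIG. 6C (address A on the bus at cycle 4, its response known to processor A at cycle 9, address B on the bus at cycle 6), this predicate reports a collision when the two addresses match. A longer response time moves `response_known_cycle` later and so widens the window.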
FIGS. 6A-C are timing wave form diagrams of a problem caused by a PAAM window. Terms indicated in FIGS. 6A-C are defined as follows: ABG_A/ABG_B--Address Bus Grant for processor A and Address Bus Grant for processor B (the signals illustrated are arbitrating signals used by the System Controller to inform a processor that it is granted mastery of the system bus; for example, processor B is granted bus mastery in clock cycle 5, the clock cycles being indicated in the boxes above the columns); EATS--Early Address Transfer Start, which informs the slave devices on the system bus to read the address on the system bus; Address--the targeted address of a particular processor; Transaction Type--the class of transaction such as a general load, store, or cache operation; AStatOut--Address Status Out, a signal issued from a slave device on the bus that carries early status information such as a busy or early retry signal; AStatIn--Address Status In, a signal into the master indicating early status information coming in, such as busy or early retry; RDA--a potentially interfering transaction driven by processor A; RDB--a read operation driven by processor B.
In these diagrams, two processors, processor A and processor B, are using a pipelined system bus. An operation initiated by processor B may collide with a previously immediate adjacent operation by processor A. A collision, in this instance, is detected when processor B snoops the target address of processor A and finds that the two addresses are the same. If the two operations collide in a PAAM window, processor B will cancel its own operation by asserting retry signals in the appropriate window.
The timing wave form diagrams illustrated in FIGS. 6A-6C show three different sequences for the relationship in time between RDA and RDB. The timing wave form in FIG. 6A illustrates non-colliding transactions. In this sequence, processor A receives the response to RDA in cycle 7 and is able to feed it into the response to transaction B (AStatOut at cycle 8).
The wave forms in FIGS. 6B and 6C illustrate similar collision sequences. A discussion of FIG. 6C is illustrative of FIG. 6B. The timing wave form diagram in FIG. 6C illustrates a PAAM collision and is interpreted as follows: at cycle 4, processor A outputs address A. At cycle 6, processor B outputs address B. Processor B checks the signals (RDB) on the bus three cycles back, compares target addresses with processor A and determines that processor B's target address matches that of processor A. The response has not yet been returned to processor A, so the status of address A is not known by either processor. Processor B has to retry because the assumption is that processor A's snoop, or "read" in this case, will get a positive response and the address will not be available for processor B. The address response, AStatIn, in cycle 8 is the response by the slave, or bus device, for processor A. The AStatIn information is not available to processor A until cycle 9. The collision occurs because processor A should have transmitted the new response information in its response to processor B's RDB at cycle 8. The collision is shown by the head of the curved arrow.
FIG. 5 depicts a high level flow diagram of the PAAM window problem as described above and in FIGS. 6A-C. The process begins in step 500, which depicts processor A issuing an operation to the system bus. Next the process proceeds to step 502, which illustrates processor B being snooped by the system bus for transaction A. The process then proceeds to step 504, which depicts processor B issuing its operation. The process next passes to step 506, which illustrates determining whether processor A's target address is the same as the target address for processor B. If the addresses are not the same, the process proceeds to step 508, which illustrates processor B completing its operation. If the targeted addresses of processor A and processor B are the same, the process instead proceeds to step 510, which depicts determining whether processor A has received a response from a device (slave) on the system bus in time to factor the response into the response to processor B. If no response has been received, the process passes to step 512, which illustrates the PAAM window being open and processor B retrying the operation. Processor B is still trying to gain access to the targeted address, but at this point confirmation that processor A has received the address has not arrived. Therefore, processor B may be able to access the address, and the process proceeds next to step 504 and repeats the cycle. If processor A receives a response in step 510, the process proceeds instead to step 514, which depicts the PAAM window closing and allowing processor A to gain access to the target address, thus freeing processor B to output its address.
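The loop formed by steps 504 through 514 can be sketched as a small simulation. This is a hedged illustration rather than a description of any actual hardware: the function name `run_paam_flow`, its parameters, and the assumption that each self retry costs a fixed number of bus cycles (five is used as a default, echoing the five bus cycle latency discussed for PAAM retried operations) are all hypothetical.

```python
def run_paam_flow(addr_a, addr_b, a_response_cycle, b_issue_cycle,
                  retry_interval=5, max_retries=100):
    """Walk the FIG. 5 flow and return (retries, cycle_b_proceeds).

    Steps 500 and 502 (processor A issues, processor B snoops) are
    assumed to have happened before `b_issue_cycle`.
    """
    cycle = b_issue_cycle
    retries = 0
    while True:
        # Step 504: processor B issues its operation at `cycle`.
        # Step 506: compare the two target addresses.
        if addr_a != addr_b:
            return retries, cycle  # step 508: B completes its operation
        # Step 510: has processor A's response arrived in time?
        if cycle >= a_response_cycle:
            return retries, cycle  # step 514: PAAM window closed
        # Step 512: window still open, so processor B retries itself.
        if retries == max_retries:
            raise RuntimeError("processor B never passed the PAAM window")
        retries += 1
        cycle += retry_interval  # assumed fixed cost per retry
```

For example, with matching addresses, processor A's response known at cycle 9, and processor B issuing at cycle 6, the sketch reports one retry and processor B proceeding at cycle 11. Pushing `a_response_cycle` later increases the retry count, which mirrors the dependence of the PAAM window on slave response time.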
In processors using a pipelined system bus, such as the PowerPC™ processors (620 and 630FP) available from IBM Corporation of Armonk, New York, the operation detecting PAAM, in this case that of processor B, waits for its AStat window before it can re-arbitrate on the bus. The wait for the AStat window, in which processor B retries itself, incurs two disadvantages. The first disadvantage is the number of cycles spent waiting for the AStat window to cancel the operation. Generally, the processors connected to the system bus are much faster than the system bus. As the ratio between the processor speed and bus speed grows, this delay becomes very significant. In the present example, processor B is stalled for five bus cycles, or ten processor cycles in a system with two processor cycles per bus cycle. Ten processor cycles is significant, especially when the processor is capable of two operations per cycle.
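The arithmetic behind this example can be made explicit with a short sketch. The function name `stall_cost` and its parameter names are illustrative; the input figures (five bus cycles, two processor cycles per bus cycle, two operations per processor cycle) are those given above.

```python
def stall_cost(bus_cycles_stalled, processor_cycles_per_bus_cycle,
               ops_per_processor_cycle):
    """Return (processor_cycles_lost, operation_slots_lost) for a stall."""
    processor_cycles = bus_cycles_stalled * processor_cycles_per_bus_cycle
    operation_slots = processor_cycles * ops_per_processor_cycle
    return processor_cycles, operation_slots


# Figures from the example above: 5 bus cycles, 2:1 clock ratio,
# 2 operations per processor cycle.
print(stall_cost(5, 2, 2))  # (10, 20)
```

Under these figures the stall corresponds to ten processor cycles, or up to twenty operation slots that processor B could otherwise have used. Doubling the clock ratio to 4:1 doubles both numbers, which is why the delay grows in significance as processors outpace the bus.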
A second disadvantage caused by the wait for self retry involves a possible "livelock" condition, where multiple processors try to access the same address on the bus all at the same time. For instance, FIG. 4 illustrates multiple processors in arbitration for the system bus, causing a livelock condition. In this example, processor A receives an AStatOut (early status information from a slave) early retry for some reason other than PAAM, such as a full snooper. Processor B detects PAAM (AStatIn not yet known by processor A or B) on processor A's address, so processor B will (AStatOut) early retry itself. Processor A comes back with its address but detects PAAM (AStatIn not yet received by processor C) because of processor C, and processor A will (AStatOut) early retry itself.
In a livelock condition, all three processors, A, B and C, are trying to gain access to the same address at the same time, and they continue to retry in a ping pong fashion, first one, then the other. All are trying to access the address, but because of the PAAM window and the need to retry, livelock occurs, and though the processors are working, no useful work is done. The PowerPC™ 620 attempts to solve a livelock condition by delaying early retried operations to push past the PAAM window, either for a random period or for a fixed number of bus cycles before retrying. The extra delay clears up the livelock condition but adds to the five bus cycle latency already experienced by PAAM retried operations. So, even though the livelock condition is terminated, the solution adds delay and inefficiency.
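The ping pong behavior and the effect of an added delay can be illustrated with a deliberately simplified simulation. Every detail of the model is an assumption made for illustration: the function name `simulate`, the rule that any issue (even one later retried) holds a PAAM window open for the following `window` cycles, the forced non-PAAM retry of the very first issue, the staggered start cycles, and the use of a fixed rather than random extra delay. It does not reproduce the actual PowerPC 620 mechanism.

```python
def simulate(start_cycles, window, retry_interval, extra_delay=0,
             max_cycles=200):
    """Return {processor: completion_cycle, or None if never completed}.

    Assumed model: all processors target the same address; an issue in
    cycle t is PAAM-retried if another processor issued in any of the
    cycles t-window .. t-1; the first issue of the run is retried for
    a non-PAAM reason (for instance a full snooper).
    """
    next_issue = dict(start_cycles)
    done = {name: None for name in start_cycles}
    issue_log = []  # (cycle, processor) for every issue, retried or not
    first_issue = True
    for cycle in range(max_cycles):
        for name in sorted(next_issue):
            if next_issue[name] != cycle:
                continue
            paam = any(other != name and cycle - window <= c < cycle
                       for c, other in issue_log)
            forced = first_issue
            first_issue = False
            issue_log.append((cycle, name))
            if paam or forced:
                # Retry later; extra_delay models the added push-past delay.
                next_issue[name] = cycle + retry_interval + extra_delay
            else:
                done[name] = cycle
                del next_issue[name]
    return done


starts = {"A": 0, "B": 2, "C": 4}
print(simulate(starts, window=3, retry_interval=6))
# {'A': None, 'B': None, 'C': None}
print(simulate(starts, window=3, retry_interval=6, extra_delay=2))
# {'A': 8, 'B': 18, 'C': 28}
```

In this toy model, without extra delay each processor always reissues inside the window opened by another processor's most recent attempt, so none completes within the simulated period. Adding two cycles of delay lets one operation per round fall outside every open window, which breaks the livelock at the cost of later completion times, consistent with the tradeoff described above.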
It would be desirable, therefore, to eliminate early retrying of PAAM window collisions, reduce bus cycle latency and prevent or break up a possible livelock condition.