1. Field of the Invention
The invention generally relates to computer microprocessors and in particular to memory arbitration schemes for use in microprocessors having multiple cycle memory coprocessor units.
2. Description of Related Art
FIG. 1 illustrates a portion of a microprocessor having a set of memory coprocessors utilizing a single data bus for transmitting data, instructions and the like retrieved from a memory (not shown). Specifically, FIG. 1 illustrates an address generation unit (AGU) 10, a bus control logic (BCL) unit 12, and a stack RAM (SR) unit 14 all connected to a LdData bus 16. Access to data bus 16 is arbitrated by a memory arbitration unit 18. Bus 16 is also connected to a register file (RF) 20 which receives and stores data, instructions, and the like transmitted by AGU 10, BCL 12 and SR 14. AGU 10, BCL 12 and SR 14 together comprise a set of memory coprocessors which operate, often in parallel, to calculate effective addresses and retrieve data from the main memory based on <load> and <lda> op-codes provided by an instruction decoder (ID) 24 and to transmit the retrieved data to register file 20. Specifically, AGU 10 calculates effective addresses based on <lda> commands. BCL 12 retrieves data from the main memory based on <load> commands. SR 14 retrieves data from internal memory stacks within the SR in response to <load> commands.
The coprocessor architecture illustrated in FIG. 1 is exemplary of a number of microprocessor memory coprocessor architectures but, in particular, is illustrative of the Intel 960CA microprocessor provided by Intel Corporation of Santa Clara, Calif., the assignee of rights to the invention of the present application.
Although not critical for understanding the memory arbitration scheme of the memory coprocessor architecture of FIG. 1, a brief summary of the functions of AGU 10, BCL 12, SR 14 and RF 20 is provided. AGU 10 is a single cycle unit which receives <lda> memory access commands from ID 24, determines an effective address for the <lda> command and provides the effective address to BCL 12 or SR 14, which performs a memory access using the effective address. The effective address is provided over data bus 16. It should be noted that bus 16 transmits information of various types including instructions, RAM data, etc. The term "data" as used herein is not limited to merely one type of internal computer information.
BCL 12 is a multiple cycle memory coprocessor which receives <load> commands from ID 24 and executes them by accessing the main memory (not shown). SR 14 responds to <load> commands from ID 24 to retrieve data from internal stack memory and transmit the data to RF 20. The effective address of the data, which determines whether BCL 12 or SR 14 is to retrieve the data, is calculated by the AGU. RF 20 provides registers for storing information retrieved by AGU 10, BCL 12 or SR 14 for subsequent access by a core processing unit (not shown). RF 20 is not the only receiver on bus 16; depending upon the type of data placed on the bus and the entity driving the bus, other units may also act as bus receivers. Although FIG. 1 illustrates only a single bus interconnecting the various memory coprocessors, other pathways between the memory coprocessors may also be employed.
Memory arbitration unit 18 controls access to bus 16 to prevent conflicts if more than one of SR 14, AGU 10 and BCL 12 attempts to assert data onto bus 16 simultaneously. SR 14 and AGU 10 are single cycle memory coprocessors which must return data immediately. Because SR 14 responds only to <load> commands and AGU 10 responds only to mutually exclusive <lda> commands, arbitration between the AGU and the SR is not required. However, BCL 12 is a multiple cycle memory coprocessor with arbitrary return latency, such that the number of clock cycles required before BCL 12 is ready to assert data onto bus 16 is unknown. BCL 12 can attempt to return data simultaneously with either of SR 14 and AGU 10. Accordingly, conflicts can arise in accessing data bus 16 if BCL 12 seeks to assert data onto bus 16 simultaneously with either SR 14 or AGU 10.
To avoid such conflicts, memory arbitration unit 18 assigns varying priorities to SR 14, BCL 12 and AGU 10. SR 14 and AGU 10 are assigned equal priorities, greater than that of BCL 12. As noted, SR 14 and AGU 10 cannot return data simultaneously and, hence, the equal priority is not a problem. BCL 12 is assigned a bus request priority less than that of AGU 10 and SR 14, and, if conflicts arise, SR 14 and AGU 10 are ordinarily granted access to bus 16 immediately, with BCL access deferred for at least one clock cycle.
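By way of illustration only, the fixed-priority ordering described above may be sketched in Python. The `PRIORITY` table, its numeric values and the `arbitrate` function are hypothetical names introduced here as assumptions; the sketch models only the stated ordering (SR and AGU above BCL) and not the actual circuitry of memory arbitration unit 18.

```python
# Hypothetical priority encoding (an assumption): a larger number wins.
PRIORITY = {"SR": 2, "AGU": 2, "BCL": 1}


def arbitrate(requests):
    """Return the unit granted the bus this cycle, or None if nobody asks."""
    requesters = set(requests)
    # SR and AGU respond to mutually exclusive commands, so they are never
    # expected to request the bus in the same cycle.
    if {"SR", "AGU"} <= requesters:
        raise ValueError("SR and AGU requests are mutually exclusive")
    if not requesters:
        return None
    # Among the remaining requesters, the highest priority is granted;
    # a losing BCL request is deferred for at least one clock cycle.
    return max(requesters, key=PRIORITY.__getitem__)
```

Under these assumptions, a simultaneous BCL and SR request resolves in favor of the SR, while a lone BCL request is granted at once.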
BCL 12 is provided with a four-entry input queue 26 for storing data if the BCL is not granted immediate bus access because of a simultaneous bus request by SR 14 or AGU 10. After the SR or AGU request is accommodated, BCL 12 is then granted bus access and BCL 12 retrieves the data from its four-entry queue 26 on a first-in first-out basis and asserts the data onto the bus. If several SR or AGU requests follow one after the other, BCL access may be further deferred. Should queue 26 of BCL 12 become full, BCL 12 issues a signal causing SR 14 and AGU 10 to defer access to bus 16 to allow the BCL to output one element of the queue. The signal issued by BCL 12 is a memory scoreboard signal. Any data otherwise asserted onto queue 26 by AGU 10 or SR 14 is disqualified.
Thus, circumstances arise when AGU 10 and SR 14 cannot return data immediately. If the memory scoreboard is pulled by BCL 12, AGU 10 or SR 14 must reissue the bus request which was deferred by the scoreboard mechanism.
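The interaction between the four-entry queue and the memory scoreboard signal may be illustrated by the following minimal Python sketch. The `step` function, its arguments and the choice to evaluate the scoreboard before the current cycle's arrival is enqueued are all assumptions made for illustration; the sketch is a simplified behavioral model, not a description of the actual BCL logic.

```python
from collections import deque

QUEUE_DEPTH = 4  # the four-entry queue described in the text


def step(queue, arriving, requester):
    """Advance the model by one clock cycle.

    queue     -- deque of BCL data still waiting for the bus (FIFO order)
    arriving  -- datum returned to the BCL this cycle, or None
    requester -- "SR", "AGU" or None: single cycle unit requesting the bus
    Returns (driver, must_reissue).
    """
    # Assumption: the scoreboard reflects the queue state at cycle start.
    scoreboard = len(queue) >= QUEUE_DEPTH
    driver, must_reissue = None, False
    if scoreboard:
        # A full queue overrides the normal priority: the BCL outputs one
        # element and any SR or AGU request is deferred and reissued later.
        queue.popleft()
        driver = "BCL"
        must_reissue = requester is not None
    elif requester is not None:
        # Normal case: the single cycle unit wins the bus.
        driver = requester
    if arriving is not None:
        queue.append(arriving)
    if driver is None and queue:
        # Bus otherwise idle: the BCL drains its oldest entry.
        queue.popleft()
        driver = "BCL"
    return driver, must_reissue
```

In this model, four consecutive SR requests that coincide with BCL arrivals fill the queue, and the next SR or AGU request is deferred while the BCL outputs its oldest entry.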
However, an additional problem occurs even when the queue of the BCL is not full. Upon receipt of a <load> command from ID 24, SR 14 cannot immediately determine whether it (the SR) is capable of responding to the <load> command. Such a determination must wait until AGU 10 has calculated the effective address of the memory load via the <lda> command. Only after the effective address for the <load> is calculated can the SR determine whether it, rather than the BCL, is the coprocessor entity capable of retrieving the data corresponding to the effective address. If SR 14 issues a bus request upon the assumption that it, and not the BCL, will be able to drive the bus, and that assumption proves false, a clock cycle is wasted before the BCL can drive the bus.
To resolve the ambiguity that SR 14 may or may not require bus access, an additional arbitration priority level is required and SR 14 is provided with a single entry queue 28. When ID 24 issues a <load> command, SR 14 "guesses" whether the SR will require bus access on a following clock cycle, and the SR issues a "guess" signal to arbitration unit 18. The "guess" signal represents a lowest level of bus priority, below even that of BCL 12. Essentially, the "guess" signal notifies memory arbitration unit 18 that the SR may require bus access on the following clock cycle. Once the effective address has been calculated by AGU 10 and the SR knows whether it (the SR) requires bus access, the SR either asserts a bus "request" or does nothing. Should memory arbitration unit 18 receive a "guess", but no other conflicting bus requests, bus access is granted immediately. However, should the memory arbitration unit receive an SR "guess" signal simultaneously with a BCL request, then the BCL request is granted. If the SR becomes ready to transmit data in the following clock cycle, i.e., if the "guess" was correct, then the SR must store the retrieved data in single entry queue 28 and issue a bus request for execution in a subsequent clock cycle.
Thus, the SR "guess" mechanism is required because of an ambiguity in whether the SR will require immediate bus access. The single entry queue is required because the BCL may be granted bus access while the SR is awaiting a resolution of the ambiguity. Of course, the SR "guess" could simply be granted a higher priority than the BCL request. However, SR memory access is fairly rare, and numerous clock cycles would be wasted on the bus by always granting access to the SR, even before it has been determined whether the SR contains the requested data.
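The three-level scheme, with the SR "guess" ranked below the BCL request, may be sketched in Python as follows. The level constants, the `arbitrate` and `sr_follow_up` functions and the returned strings are hypothetical and introduced purely for illustration; the sketch assumes that an SR "request" and an AGU request are never asserted together.

```python
# Hypothetical level encoding (an assumption): a larger number wins.
GUESS, BCL_REQUEST, REQUEST = 0, 1, 2


def arbitrate(sr=None, agu=False, bcl=False):
    """Return the granted unit. sr is None, "guess" or "request"."""
    if sr not in (None, "guess", "request"):
        raise ValueError("sr must be None, 'guess' or 'request'")
    if sr == "request" and agu:
        raise ValueError("SR and AGU requests are mutually exclusive")
    candidates = []
    if sr == "request":
        candidates.append((REQUEST, "SR"))
    elif sr == "guess":
        candidates.append((GUESS, "SR"))  # lowest level, below the BCL
    if agu:
        candidates.append((REQUEST, "AGU"))
    if bcl:
        candidates.append((BCL_REQUEST, "BCL"))
    return max(candidates)[1] if candidates else None


def sr_follow_up(guess_granted, address_in_sr):
    """What the SR does once the AGU has resolved the effective address."""
    if not address_in_sr:
        return "idle"  # the guess was wrong; the BCL services the load
    if guess_granted:
        return "drive bus"
    # Correct guess, but the bus went to the BCL: park the data in the
    # single entry queue and raise a full request in a later cycle.
    return "hold in queue, then request"
```

In this sketch an uncontested "guess" is granted immediately, a "guess" that collides with a BCL request loses, and an SR whose guess proves correct after losing must hold its data and raise a full request.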
A timing diagram illustrating a circumstance when memory arbitration unit 18 simultaneously receives an SR "guess" and a BCL request is provided in FIG. 2. Two pipe stages, each having two phases, are illustrated in FIG. 2, with each phase identified by an index q_xy, where x represents the pipe stage and y represents the phase. In general, three pipe stages are provided (with only the first two illustrated in FIG. 2). The three stages are stage 0, stage 1 and stage 2. A new instruction address is issued and instruction words are read out at stage 0, the "instruction fetch" stage. Instructions are decoded and operands are read out of RF 20 at stage 1, the "instruction issue" stage. Finally, in stage 2, the instructions are executed and results are returned to register file 20.
In the example of FIG. 2, at q_11 an SR "guess" is asserted by SR 14 in response to a load <ld1> issued by ID 24. Simultaneously, a BCL request is asserted. As the BCL request has a higher priority than the SR "guess", a BCL grant is issued by memory arbitration unit 18 at q_12. If SR 14 is, in fact, ready to assert data onto the bus at q_12, SR 14 must place the data into single entry queue 28 during q_12 pending a subsequent SR bus request. At q_21, BCL 12 responds to the BCL bus grant issued in q_12 by driving bus 32, as illustrated in FIG. 2 by a LdValid signal and a LdData bus signal.
If the "guess" was correct, at q_21, SR 14 upgrades its bus access to a "request", rather than a "guess". The SR request is granted at q_21. In a subsequent clock cycle (not shown) SR 14 drives bus 16. If a second BCL request is asserted at q_21, the SR request, having a higher priority, is granted at q_21. BCL 12 must then store its data in four-entry queue 26 and re-request bus access at a subsequent clock cycle (not shown).
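The sequence of grants in the foregoing example may be replayed with a short Python sketch. The signal names, the `grant` and `replay` functions and the `FIG2_LIKE` list are hypothetical; the sketch further assumes, as a simplification valid only for the case of a correct "guess", that the unit granted the bus in one stage drives the bus in the next.

```python
# Hypothetical signal names and levels; a larger level wins arbitration.
LEVEL = {"sr_guess": 0, "bcl_request": 1, "sr_request": 2}
OWNER = {"sr_guess": "SR", "bcl_request": "BCL", "sr_request": "SR"}


def grant(signals):
    """Return the unit granted the bus for the signals seen in one stage."""
    if not signals:
        return None
    return OWNER[max(signals, key=LEVEL.__getitem__)]


def replay(stages):
    """Return one (granted, driving) pair per stage."""
    trace = []
    driving = None  # nobody drives the bus before the first grant
    for signals in stages:
        granted = grant(signals)
        trace.append((granted, driving))
        driving = granted  # simplification: the grantee drives next stage
    return trace


# Stage 1: SR guess collides with a BCL request.
# Stage 2: SR upgrades to a request while a second BCL request arrives.
# Following stage: the deferred BCL re-requests the bus.
FIG2_LIKE = [
    {"sr_guess", "bcl_request"},
    {"sr_request", "bcl_request"},
    {"bcl_request"},
]
```

Calling `replay(FIG2_LIKE)` yields a BCL grant in the first stage, an SR grant while the BCL drives the bus in the second, and a BCL grant while the SR drives the bus in the third.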
Thus, SR 14 must be provided with a single entry queue, and an additional level of priority, namely the SR "guess", is required, resulting in added circuitry and complexity. Although this architecture and arbitration scheme adequately prevents a deadlock condition, the added circuitry and complexity are undesirable. Further, in circumstances where a tight loop of <load> and <lda> commands is issued by ID 24, the queue of BCL 12 is frequently filled, necessitating frequent scoreboarding and resulting in frequent disqualification of data previously retrieved by either the SR or AGU.
Accordingly, an improved architecture and arbitration scheme is desired for resolving the aforementioned bus contention problems which does not require a "guess" mechanism and which does not require that the SR be provided with a single entry queue.