Certain processors (such as the PowerPC processor) bus-serialize blocking instructions such as EIEIO (enforce in-order execution of I/O), which itself serializes storage accesses at their outgoing queues. Typically, when an EIEIO instruction is executed, all storage access operations posted prior to the execution of the EIEIO instruction are marked for performance on the bus before any storage accesses that may be posted subsequent to the execution of the EIEIO instruction. Although the processor will not necessarily perform these transactions on the bus immediately, the programmer is assured that they will be performed on the bus before any subsequently posted storage accesses. In other words, the EIEIO instruction forces all EIEIO-ordered storage accesses ahead of it to finish on the bus before the EIEIO instruction itself is released to the bus. Completion of the EIEIO instruction on the bus then allows the EIEIO-ordered storage accesses behind it to access the bus. In general, this applies to any instruction that orders some, but not all, subsequent instructions.
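The ordering guarantee described above can be stated as a simple check. The following is a minimal sketch in C rather than a description of any actual hardware: it assumes every access in the list belongs to the EIEIO-ordered class, and the function name bus_order_respects_eieio and the arrays epoch and bus_order are hypothetical names introduced only for illustration. Each access is tagged with the number of EIEIO instructions executed before it was posted, and a bus order is acceptable when those tags never decrease.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model (not taken from the source):
 *   epoch[i]     = number of EIEIO instructions executed before storage
 *                  access i was posted
 *   bus_order[k] = index of the access performed k-th on the bus
 * Returns true when no access posted before an EIEIO is performed on the
 * bus after an access posted behind that EIEIO. Accesses within the same
 * epoch may still be performed in any order.
 */
bool bus_order_respects_eieio(const unsigned *epoch,
                              const size_t *bus_order,
                              size_t n)
{
    unsigned highest_seen = 0;

    for (size_t k = 0; k < n; k++) {
        size_t idx = bus_order[k];
        assert(idx < n);

        if (epoch[idx] < highest_seen)
            return false; /* an older-epoch access was overtaken */
        highest_seen = epoch[idx];
    }
    return true;
}
```

Under this model, two accesses posted ahead of an EIEIO may be performed in either order relative to each other, but neither may be performed after an access posted behind the EIEIO.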
As an example of the benefit of such an instruction, assume that the programmer must write two parameter words, read a status register, and then write one command word to a fixed-disk controller, and that the controller's ports are implemented as memory-mapped I/O ports. If the programmer executes the three stores and one load in order, the processor will post the writes but not perform them immediately. In addition, when it does acquire the external bus and perform the memory write or read transactions, it may not perform them in the same order as that specified by the programmer. This might result in improper operation of the disk controller (because it might receive the command word before the parameters and proceed to execute the command using old parameters).
To ensure that the first two stores (to write the parameter words to the disk controller) are performed prior to the store of the command word, the programmer should follow the first two stores with an EIEIO instruction. This would mark these two stores for performance on the bus prior to any subsequently posted writes. The third store (to the command register) would be executed after the EIEIO instruction and posted in the write queue. When the processor's system interface performs the three memory write transactions, the first two stores will be performed before the third one.
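The sequence just described might be coded as follows. This is a sketch under stated assumptions, not a real driver: the register layout struct disk_ctrl_regs, the macro IO_BARRIER and the function disk_issue_command are hypothetical names, and a GCC-compatible compiler is assumed for the inline assembly. On non-PowerPC builds the macro falls back to a compiler-only barrier so that the sketch still compiles, although that fallback does not order bus transactions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register layout of a memory-mapped disk controller. */
struct disk_ctrl_regs {
    volatile uint32_t param0;
    volatile uint32_t param1;
    volatile uint32_t status;
    volatile uint32_t command;
};

#if defined(__powerpc__) || defined(__powerpc64__) || defined(__PPC__)
/* PowerPC: emit the EIEIO instruction itself. */
#define IO_BARRIER() __asm__ __volatile__("eieio" : : : "memory")
#else
/* Assumption: compiler-only barrier so the sketch builds elsewhere. */
#define IO_BARRIER() __asm__ __volatile__("" : : : "memory")
#endif

/* Writes both parameter words, reads status, then writes the command. */
uint32_t disk_issue_command(struct disk_ctrl_regs *regs,
                            uint32_t p0, uint32_t p1, uint32_t cmd)
{
    uint32_t status;

    assert(regs != NULL);

    regs->param0 = p0;      /* first parameter store  */
    regs->param1 = p1;      /* second parameter store */

    /* Both parameter stores are marked for performance on the bus
     * before any storage access posted after this point. */
    IO_BARRIER();

    status = regs->status;  /* status register load */
    regs->command = cmd;    /* command store, posted behind the barrier */

    return status;
}
```

A caller would pass the base address at which the controller's ports are mapped; that address is platform-specific and is deliberately left out of the sketch.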
The problem with such typical EIEIO instructions is that they execute serially above the bus interface, as illustrated in FIG. 2. The EIEIO instruction blocks all subsequent instructions from executing until the EIEIO instruction completes its bus activity. As a result, cache-hit loads (e.g., LD3) that are not ordered by the EIEIO instruction wait unnecessarily behind the serially executed EIEIO instruction.
FIG. 3 provides a simple illustration of that portion of a microprocessor pertaining to storage accesses. Instructions arriving at the execution unit(s) 301 may require storage accesses through the load/store unit 28, which contains a load queue 302 and a store queue 303. The load and store instructions are queued for transfer to the bus interface unit 12 coupled to the bus 11, which provides access to the main memory system 39 (see FIG. 1).
As discussed above, prior art EIEIO-type instructions block all subsequent instructions from executing at the execution stage. Once the EIEIO instruction has been sent out of the execution stage, no other storage access instructions, including further EIEIO instructions, can be sent to data cache 16. Consequently, storage access instructions that could be satisfied by access to data cache 16, and that do not require the considerably longer access to main memory 39, are also blocked by the EIEIO instruction at the execution stage. As an example, in FIG. 2, Group 1 illustrates load instructions LD1 and LD2, followed by an EIEIO instruction EIEIO1, serially programmed in three consecutive clock cycles. The typical EIEIO instruction then blocks subsequent storage access instructions at the execute stage. Store instructions ST1 and ST2 and load instructions LD3 and LD4, along with the second EIEIO instruction, EIEIO2, are not permitted to execute until some indeterminate number of clock cycles m has elapsed, at which point the instructions LD1 and LD2 have been fully executed and completed over the bus 11.
In this example, load instruction LD3 is a cacheable load that can execute and hit on data cache 16. However, with the prior art EIEIO instruction configuration, the execution of instruction LD3 will also have to wait the indeterminate number of clock cycles m.
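The cost of this blocking can be illustrated with a toy timing model. The sketch below is an assumption-laden simplification, not a description of the actual pipeline: it assumes one instruction enters the execute stage per clock cycle and that a blocking EIEIO holds every younger instruction for a fixed bus_latency, which stands in for the indeterminate m. The names op_kind, fig2_group and execute_cycle are hypothetical. The non-blocking case is an idealized baseline included only for comparison.

```c
#include <assert.h>
#include <stddef.h>

enum op_kind { OP_LOAD, OP_STORE, OP_EIEIO };

/* Instruction stream of FIG. 2, in program order:
 * LD1, LD2, EIEIO1, ST1, ST2, LD3, LD4, EIEIO2. */
const enum op_kind fig2_group[8] = {
    OP_LOAD, OP_LOAD, OP_EIEIO, OP_STORE,
    OP_STORE, OP_LOAD, OP_LOAD, OP_EIEIO
};

/*
 * Returns the clock cycle (counting from 1) in which stream[target]
 * executes. Toy assumptions: one instruction per cycle, and, when
 * eieio_blocks is nonzero, each EIEIO ahead of the target stalls all
 * younger instructions for bus_latency additional cycles.
 */
unsigned execute_cycle(const enum op_kind *stream, size_t n,
                       size_t target, unsigned bus_latency,
                       int eieio_blocks)
{
    unsigned cycle = 0;

    assert(target < n);

    for (size_t i = 0; i <= target; i++) {
        cycle++; /* instruction i executes in this cycle */
        if (i < target && eieio_blocks && stream[i] == OP_EIEIO)
            cycle += bus_latency; /* younger instructions wait */
    }
    return cycle;
}
```

With an assumed bus_latency of 20 cycles, LD3 (index 5 in fig2_group) executes in cycle 26 when EIEIO1 blocks, compared with cycle 6 in the idealized baseline, even though LD3 could be satisfied from data cache 16.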
As a result, there is a need in the art for an improvement over the above scenario.