1. Technical Field
The present invention relates in general to data transfers within data processing systems and in particular to ordered data stores on a pipelined bus within a data processing system. Still more particularly, the present invention relates to optimization of ordered stores on a pipelined bus within a data processing system to improve system performance.
2. Description of the Related Art
Integrated circuits which transfer data within data processing systems, particularly processors, must adhere to certain requirements for such data transfers. One possible requirement is ordering, where data transfer transactions must be completed on a system bus in the order in which an execution unit generated the transactions. An ordering requirement between multiple data transfer transactions may occur, for example, in the context of multiple stores to the same address.
Other reasons exist for ordering requirements besides matching addresses for multiple stores. Typical processor architectures associate attributes with certain address ranges, such as the WIMG bits associated with page table entries within the PowerPC™ architecture. These attributes may specify the ordering of transactions to addresses within the associated address range. Moreover, depending on the type of memory model implemented, a processor architecture may support instructions which dictate specific transaction ordering, such as the EIEIO, SYNC, and TLBSYNC instructions within the PowerPC™ family of processors. These and other processor features may impose ordering requirements on data transfer transactions within a data processing system.
Contemporary high performance processors typically utilize a high frequency, pipelined bus interface. The pipelined nature allows multiple transactions to be active on the bus simultaneously. In general, a bus transaction may be broken down into the following segments:
arbitration for the bus;
presentation of the address and transaction type on the bus (to be decoded for slave selection and snooped for memory coherency);
response by slaves and snooping masters to the transaction address, which may be an acknowledge (Ack) that the transfer may proceed as requested or a retry (Retry) which aborts the transaction and causes it to be initiated again later starting with arbitration; and
transfer of data, which may occur before or after the address response, or not at all for address-only transactions.
A data transfer transaction is complete after the later of the last datum being presented and accepted or receiving a satisfactory address response.
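The completion rule above may be illustrated by the following minimal sketch. The function name, its parameters, and the treatment of an address-only transaction as complete upon its acknowledge response are assumptions made for illustration and do not describe any particular bus protocol.

```python
from typing import Optional


def completion_cycle(ack_cycle: int, last_data_cycle: Optional[int] = None) -> int:
    """Return the clock cycle at which a transaction counts as complete.

    ack_cycle: cycle in which a satisfactory (Ack) address response is received.
    last_data_cycle: cycle in which the last datum is presented and accepted,
        or None for an address-only transaction (ASSUMPTION: such a
        transaction completes when its Ack response is received).
    """
    if last_data_cycle is None:
        return ack_cycle
    # Complete after the later of the two events.
    return max(ack_cycle, last_data_cycle)
```

For example, a transfer acknowledged in cycle 6 whose last datum is accepted in cycle 9 would be complete in cycle 9 under this sketch, while the same transfer with data finishing in cycle 4 would be complete in cycle 6.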
To enforce strict ordering between data transfers, current processors completely serialize the transactions. Any pending transfer subject to an ordering requirement with a previous transfer is delayed from the beginning (bus arbitration) if the previous transfer is not complete or at least guaranteed to complete in the present bus tenure (i.e., the transfer has passed the point where the bus protocol permits retry of the transfer).
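The serialization condition just described may be sketched as a simple predicate. The class and field names below are hypothetical and are introduced only to illustrate the rule; they are not taken from any processor implementation.

```python
from dataclasses import dataclass


@dataclass
class Transfer:
    """Hypothetical status record for a previously issued, ordered transfer."""
    complete: bool = False           # transaction has completed on the bus
    past_retry_window: bool = False  # bus protocol no longer permits a retry


def may_request_bus(previous: Transfer) -> bool:
    """True if a succeeding, ordered transfer may begin bus arbitration.

    Under full serialization, the succeeding transfer is held off until the
    previous transfer is complete or is guaranteed to complete in the
    present bus tenure.
    """
    return previous.complete or previous.past_retry_window
```

Under this sketch, a succeeding transfer is held off for as long as `may_request_bus` returns false for the transfer it must follow.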
FIGS. 4A and 4B are bus timing diagrams showing the general serialization of two ordered transfers. FIG. 4A is a timing diagram for non-retried transfers, while FIG. 4B is a timing diagram for the same operation when the first transfer receives an address response indicating a retry. In both figures, the address response is valid three clock cycles after the address is valid on the bus. In both cases, the bus request for data transfer B is not initiated until an acknowledge response is received for data transfer A. Consequently, data transfer B cannot be initiated until at least six clock cycles have elapsed in the best case, or twelve clock cycles if a retry is asserted. As shown in the figures, the total latency--from start of the first transaction to completion of the second--is twelve clock cycles with no retry response and eighteen clock cycles if a retry response to the first transaction is asserted.
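The cycle counts stated above can be reproduced with a small arithmetic model. The sketch assumes that each attempt takes three clock cycles from bus request to a valid address (an assumption chosen so that the totals match; only the three-cycle address response delay is stated in the text), that data transfer B is never retried, and that the data for each transfer completes no later than its address response.

```python
RESPONSE_DELAY = 3      # from the text: response valid 3 cycles after the address
REQUEST_TO_ADDRESS = 3  # ASSUMPTION: cycles from bus request to a valid address


def serialized_latency(retries_of_a: int = 0) -> tuple:
    """Return (cycle at which B may be initiated, total latency in cycles).

    retries_of_a: number of Retry responses received by transfer A before
        it is finally acknowledged. Transfer B is assumed never to be retried.
    """
    attempt = REQUEST_TO_ADDRESS + RESPONSE_DELAY
    # B cannot request the bus until A's last attempt has been acknowledged.
    b_start = attempt * (1 + retries_of_a)
    # B then needs one full attempt of its own.
    total = b_start + attempt
    return b_start, total
```

Under these assumptions, `serialized_latency(0)` gives `(6, 12)` and `serialized_latency(1)` gives `(12, 18)`, matching the figures cited for the non-retried and retried cases respectively.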
In the example depicted, an address response three clock cycles after the address is valid is the only mechanism for retrying a data transfer. Some bus protocols may support multiple windows for responses--one for slave responses and one for snoop responses--with a much higher latency between the address and the final response. The higher the latency, the longer the delay before starting a succeeding, ordered transfer and the lower the overall performance for serialized data transfers.
It would be desirable, therefore, to provide a mechanism for reducing the latency of ordered transactions on a pipelined bus within a data processing system. It would further be advantageous for the mechanism to take full advantage of the pipelined nature of the bus while preserving the ordering requirement.