1. Technical Field
The present invention relates in general to processor bus interfaces within data processing systems and in particular to gathering logic within processor bus interfaces in data processing systems. Still more particularly, the present invention relates to implementing full data gathering in a processor bus interface without degrading performance.
2. Description of the Related Art
Integrated circuits which move data within a data processing system, particularly processors, typically move data of varying sizes and varying addresses. Processors also typically operate at a frequency which is a multiple of the frequency of the bus over which data is transferred. System buses are typically designed to provide optimal performance when handling large blocks of data, while processors typically perform smaller block accesses during instruction execution and generally have very limited large block data movement capability. To maximize the efficiency of data movement and minimize the impact of processor transfers on the bus, it is advantageous to transfer the maximum amount of data possible during a bus transaction. Data coalescing or gathering provides a hardware mechanism to combine pending bus transactions to maximize the amount of data transferred in one bus tenure once the bus is available.
Gathering is typically performed in the processor's bus interface unit as a transaction is placed into a queue of pending bus transactions. An example of transfers being written into a queue which supports gathering is shown in FIG. 5. Transactions 502 input into queue 504 for transfer on the bus (not shown) are 1, 2, 3, or 4 byte unaligned transfers (data sizes not address aligned to their natural boundary). Queue 504 in the depicted example is four bytes wide. In the example shown, entry 0 of the queue already contains a four byte store which is active on the bus. This entry is not allowed to participate in gathering since the address and size cannot change during a transfer. Any transfer added to the queue behind the four byte store in entry 0 will be available for gathering.
In the example shown, a one byte store transaction is to be added to entry 1 of queue 504 during transaction 1. Next, another one byte store to the adjacent address is to be added to queue 504 in transaction 2. The gathering logic utilizes the address and size of the previous store and the address and size of the incoming store to detect that the two entries are gatherable. Instead of transaction 2 being written into entry 2, therefore, the incoming store is written into entry 1 and is combined with the previous transaction into one bus transaction. The process is repeated with another one byte store to the adjacent address to be added to queue 504 in transaction 3 and a fourth one byte store to the adjacent address to be added to queue 504 in transaction 4, both of which are combined with the previous transactions for a total of four consecutive byte stores that gather into one four byte bus transaction.
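The gathering sequence described for FIG. 5 can be sketched in software as follows. This is only an illustrative model, not the circuit itself: the names Store, gather, and enqueue are hypothetical, the addresses in the usage lines are invented for the illustration, and the model assumes that a gathered store must fit within one aligned four byte window and that only the most recently loaded entry behind entry 0 is a gathering candidate.

```python
from dataclasses import dataclass
from typing import List, Optional

QUEUE_WIDTH = 4  # bytes per queue entry, as in the FIG. 5 example


@dataclass
class Store:
    addr: int  # byte address of the first byte stored
    size: int  # number of bytes, 1..QUEUE_WIDTH


def gather(resident: Store, incoming: Store) -> Optional[Store]:
    """Return the combined store if the two are gatherable, else None."""
    lo = min(resident.addr, incoming.addr)
    hi = max(resident.addr + resident.size, incoming.addr + incoming.size)
    # The incoming store must begin where the resident ends, or end where it begins.
    adjacent = (resident.addr + resident.size == incoming.addr
                or incoming.addr + incoming.size == resident.addr)
    # Assumption: the combined store may not cross an aligned four byte window.
    same_window = lo // QUEUE_WIDTH == (hi - 1) // QUEUE_WIDTH
    if adjacent and same_window:
        return Store(lo, hi - lo)
    return None


def enqueue(queue: List[Store], incoming: Store) -> int:
    """Load incoming into the queue and return the index of the entry written."""
    last = len(queue) - 1
    if last >= 1:  # entry 0 is assumed active on the bus and never gathers
        merged = gather(queue[last], incoming)
        if merged is not None:
            queue[last] = merged
            return last
    queue.append(incoming)
    return len(queue) - 1


# Entry 0 holds a four byte store that is active on the bus (address invented).
queue = [Store(0x10, 4)]
# Transactions 1 through 4: one byte stores to adjacent addresses (invented).
written = [enqueue(queue, Store(0x20 + i, 1)) for i in range(4)]
# All four stores land in entry 1, which ends up as one four byte store.
```

In this model the index returned by enqueue plays the role of the entry selection that, in hardware, cannot be made until the gatherability comparison has completed.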
A typical transaction queue without gathering logic within a bus interface unit is shown in FIG. 6A. Transaction queue 602 is controlled by write enable logic 604 which utilizes the sizes and addresses of available entries and selects the entry to be loaded with an incoming transaction, forming the appropriate write enable signals when there is an incoming transaction to be placed into a queue entry. Entry control logic 606 controls other queue manipulations, such as entry movement for a first-in, first-out (FIFO) implementation.
FIG. 6B shows the same transaction queue as FIG. 6A, but with the addition of standard gathering detection logic 608 to the other components. The gathering logic compares existing transactions within queue entries to incoming transactions to determine whether they are gatherable. Write enable logic 604 is delayed until gathering logic 608 completes this comparison, since whether the incoming transaction is gathered with an existing queue entry determines the entry into which the transaction will be written and therefore directly affects write enable generation. Thus, the standard approach of gathering as the queue entries are loaded, described in connection with FIG. 5, may add many logic levels between the incoming transactions and write enable logic 604. The number of levels added depends on the depth of the gathering logic, which is a function of the number of gatherable combinations possible.
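The dependency described above can be illustrated with a small behavioral sketch. The function write_enables and its parameters are hypothetical, and the model assumes a queue in which a non-gathered transaction is loaded into the first free entry and at least one entry is free. The point is only that the one-hot write enable vector cannot be formed until the gathering result is known.

```python
from typing import List, Optional


def write_enables(valid: List[bool], gather_hit: Optional[int]) -> List[bool]:
    """Form one-hot write enables for the queue entries.

    valid[i] is True when entry i already holds a transaction.
    gather_hit is the index of the entry the incoming transaction gathers
    with, or None when the gathering comparison found no match.
    """
    if gather_hit is not None:
        target = gather_hit           # overwrite the gathered entry
    else:
        target = valid.index(False)   # first free entry (assumed to exist)
    return [i == target for i in range(len(valid))]


occupied = [True, True, False, False]
no_gather = write_enables(occupied, None)  # selects entry 2
gathered = write_enables(occupied, 1)      # selects entry 1
```

Because gather_hit is an input to the selection, every logic level spent computing it appears in series ahead of the write enable signals.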
Generally, incoming transactions must be written into queue 602 during the same clock cycle as they are sent to queue 602. This requires selection of an appropriate queue entry for an incoming transaction with ample time remaining in the clock cycle to perform the write of the incoming transaction into the selected queue entry.
TABLE I
______________________________________________
Last Input/Resident        Incoming
Transaction                Transaction
Size       Address         Size       Address
______________________________________________
1 byte     0x00            1 byte     0x01
1 byte     0x00            2 bytes    0x01
1 byte     0x00            3 bytes    0x01
1 byte     0x01            1 byte     0x02
1 byte     0x01            2 bytes    0x02
1 byte     0x02            1 byte     0x03
2 bytes    0x00            1 byte     0x02
2 bytes    0x00            2 bytes    0x02
2 bytes    0x01            1 byte     0x03
3 bytes    0x00            1 byte     0x03
1 byte     0x03            1 byte     0x02
1 byte     0x03            2 bytes    0x01
1 byte     0x03            3 bytes    0x00
1 byte     0x02            1 byte     0x01
1 byte     0x02            2 bytes    0x00
1 byte     0x01            1 byte     0x00
2 bytes    0x02            1 byte     0x01
2 bytes    0x02            2 bytes    0x00
2 bytes    0x01            1 byte     0x00
3 bytes    0x01            1 byte     0x00
______________________________________________
As the number of gatherable combinations increases, more clock cycle time is required by gathering logic 608 to perform the necessary comparisons. Queue 504 in the example depicted in FIG. 5, for instance, is connected to a simple, four byte wide bus which may service unaligned transactions of 1, 2, 3, or 4 bytes. The gatherable transaction pairs for this configuration are listed above in Table I. Even for this simple configuration, there are 20 gatherable transaction pairs. Gathering logic 608 must perform the necessary comparisons for all gatherable pairs, adding many levels of logic directly to the entry write enable logic 604 for transaction queue 602.
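The count of 20 pairs can be reproduced by enumeration. The following sketch assumes, consistent with the addresses listed in Table I, that both transactions lie within a single four byte window and that a pair is gatherable when the incoming transaction is immediately adjacent to the resident transaction on either side. The function name gatherable_pairs is hypothetical.

```python
WIDTH = 4  # bus and queue entry width in bytes, as in the FIG. 5 example


def gatherable_pairs(width=WIDTH):
    """List ((resident size, resident addr), (incoming size, incoming addr))."""
    pairs = []
    for r_addr in range(width):
        for r_size in range(1, width - r_addr + 1):
            for i_addr in range(width):
                for i_size in range(1, width - i_addr + 1):
                    ascending = r_addr + r_size == i_addr   # incoming follows resident
                    descending = i_addr + i_size == r_addr  # incoming precedes resident
                    if ascending or descending:
                        pairs.append(((r_size, r_addr), (i_size, i_addr)))
    return pairs


pairs = gatherable_pairs()
print(len(pairs))  # prints 20
```

Ten of the pairs are ascending (the incoming transaction follows the resident one) and ten are descending, matching the two halves of Table I.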
As bus widths and bus transaction sizes increase, the number of gatherable combinations increases substantially. Contemporary processors employ 16 byte wide system buses and support transfers of between 1 and 256 bytes per single bus transaction. The number of gatherable transfer combinations in such a configuration is many times larger than in the simple four byte example depicted in FIG. 5 and listed in Table I. With such a large number of gatherable combinations, gathering logic 608 for detecting whether entries are gatherable becomes so large that generally either only a very small subset of complete gathering is implemented or the processor operating frequency is reduced to allow the logic depth necessary for full gathering. Both alternatives degrade processor and/or system performance.
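The growth can be made concrete under a simplifying assumption that is not taken from the configuration described above: both transactions lie within a single window one bus width wide, as in Table I. Under that assumption each gatherable pair corresponds to a choice of three distinct byte boundaries out of width + 1, taken in either order, giving 2 * C(width + 1, 3) pairs. The sketch below (count_pairs is a hypothetical name) checks that closed form by brute force. It does not model transfers of up to 256 bytes, so it understates the count for the processors described.

```python
from math import comb


def count_pairs(width: int) -> int:
    """Count adjacent (resident, incoming) pairs inside one window of `width` bytes."""
    count = 0
    for r_addr in range(width):
        for r_end in range(r_addr + 1, width + 1):       # exclusive end of resident
            for i_addr in range(width):
                for i_end in range(i_addr + 1, width + 1):  # exclusive end of incoming
                    if r_end == i_addr or i_end == r_addr:
                        count += 1
    return count


counts = {width: count_pairs(width) for width in (4, 8, 16)}
for width, n in counts.items():
    assert n == 2 * comb(width + 1, 3)  # closed form under the stated assumption
print(counts)  # prints {4: 20, 8: 168, 16: 1360}
```

Even in this restricted model, widening the window from 4 to 16 bytes raises the number of pairs to compare from 20 to 1360.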
It would be desirable, therefore, to provide a mechanism supporting full gathering in a data processing system without adding many levels of logic to the write enable logic for a transaction queue. It would further be advantageous for the gathering mechanism to not reduce processor operating frequencies.