As transistors shrink and die sizes grow, more and more digital-logic system components traditionally implemented as separate chips in discrete packages are being implemented together on a single chip (a so-called “system on a chip” or “SoC”). Internal buses on the SoC connect the various internal components; unlike traditional, off-chip buses, the on-chip buses need not be the bandwidth-limiting factor in communication between components. For example, while it may be expensive in resources, area, and power to double the bandwidth of an off-chip bus (e.g., a printed-circuit-board bus), it may be comparatively cheap to do so for an on-chip bus. Furthermore, because on-chip buses are exposed to less-severe crosstalk, reflections, and/or other noise, it may be easier to run them at higher frequencies (e.g., at the same clock frequencies at which the SoC components themselves run). Special care must be taken, however, to take full advantage of the benefits that on-chip buses present.
Out-of-order execution of transactions received by a shared resource is one way to increase the efficiency of on-chip buses. For example, a memory (an example of a bus “slave”) may be shared by two on-chip processors (examples of bus “masters”). The throughput to and from one master may be relatively high and the throughput to the other master may be relatively low (due to any one of many design factors and considerations). In this case, a long series of transactions between the slave and the slow master may disadvantageously delay a later-received transaction between the slave and the fast master (the “fast” transaction may be received after all the “slow” transactions have been received but while they are still executing, or may be received during receipt of, or “interleaved” with, the slow transactions). If transactions are allowed to execute out of order, the slow transactions may be temporarily suspended so that the fast transaction may execute. The increase in total execution time for the slow transactions may be negligible, while the fast transaction avoids a potentially significant delay.
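The trade-off described above may be illustrated with a small, hypothetical timing model. The sketch below assumes a slave that services one transaction at a time and, in the out-of-order case, reorders only at transaction boundaries (a simplification of “suspending” the slow transactions); the function name, tuple layout, priorities, and cycle counts are all illustrative assumptions and are not taken from any bus specification.

```python
def completion_times(transactions, out_of_order):
    """Return {name: finish_time} for a slave serving one transaction at a time.

    transactions: list of (name, arrival, duration, priority) tuples
                  (hypothetical layout; a lower priority number is more urgent).
    out_of_order: if False, serve strictly in arrival order;
                  if True, serve the most urgent transaction that has arrived.
    """
    pending = sorted(transactions, key=lambda t: t[1])  # stable sort by arrival
    time = 0
    done = {}
    while pending:
        ready = [t for t in pending if t[1] <= time]
        if not ready:
            # Slave is idle: jump ahead to the next arrival.
            time = pending[0][1]
            continue
        if out_of_order:
            nxt = min(ready, key=lambda t: (t[3], t[1]))
        else:
            nxt = ready[0]
        pending.remove(nxt)
        time += nxt[2]
        done[nxt[0]] = time
    return done


# Four slow transactions (10 cycles each) arrive at time 0;
# one fast transaction (2 cycles) arrives at time 5.
workload = [(f"slow{i}", 0, 10, 1) for i in range(4)] + [("fast", 5, 2, 0)]

in_order = completion_times(workload, out_of_order=False)
reordered = completion_times(workload, out_of_order=True)
print(in_order["fast"], reordered["fast"])    # fast transaction finish times
print(in_order["slow3"], reordered["slow3"])  # last slow transaction finish times
```

Under these assumed numbers, the fast transaction finishes at cycle 12 rather than cycle 42, while the last slow transaction finishes only two cycles later (cycle 42 rather than 40).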
One example of a protocol that supports out-of-order execution is the Advanced Microcontroller Bus Architecture (“AMBA”), and specifically the aspect of it called multi-layer Advanced eXtensible Interface, or “multi-layer AXI.” Multi-layer AXI is an architecture capable of providing the maximum bandwidth between each of the masters and the slaves in a system while requiring only a routing density comparable to that of the SoC components. Every connection in a multi-layer AXI system looks like, and behaves like, a direct master-slave connection; existing peripherals and sub-systems (e.g., those not programmed for the advanced features of multi-layer AXI) may thus be compatibly connected via the architecture. One aspect of multi-layer AXI that enables these features is the association of an identification (“ID”) tag with each bus transaction; transactions having the same ID have internal dependencies and must be completed in order, while transactions having different IDs may be completed in any order. Multi-layer AXI also supports write-data interleaving, in which groups of write data transactions from two or more masters are received, at a slave, interspersed with each other; the slave tracks and maintains the original sources of the transactions and honors any dependencies therebetween.
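The ID-based ordering rule and the de-interleaving of write data may be sketched as follows. This is a minimal behavioral model and not an implementation of the AXI signaling itself; the representation of a transaction as an `(id, sequence_number)` pair, the representation of a write beat as an `(id, data)` pair, and both function names are assumptions made purely for illustration.

```python
from collections import defaultdict


def respects_id_ordering(issued, completed):
    """Check a completion order against the same-ID-in-order rule.

    issued, completed: lists of (txn_id, seq) pairs (hypothetical layout).
    Returns True if `completed` contains exactly the issued transactions and
    transactions sharing an ID complete in the order they were issued.
    Transactions with different IDs may complete in any relative order.
    """
    if sorted(issued) != sorted(completed):
        return False
    issue_order = defaultdict(list)
    for txn_id, seq in issued:
        issue_order[txn_id].append(seq)
    done_order = defaultdict(list)
    for txn_id, seq in completed:
        done_order[txn_id].append(seq)
    return issue_order == done_order


def deinterleave(beats):
    """Group interleaved (txn_id, data) write beats by ID, keeping per-ID order."""
    streams = defaultdict(list)
    for txn_id, data in beats:
        streams[txn_id].append(data)
    return dict(streams)


issued = [(0, 0), (0, 1), (1, 2)]
print(respects_id_ordering(issued, [(1, 2), (0, 0), (0, 1)]))  # ID 1 overtakes ID 0
print(respects_id_ordering(issued, [(0, 1), (0, 0), (1, 2)]))  # ID 0 reordered
print(deinterleave([(0, "a"), (1, "x"), (0, "b"), (1, "y")]))
```

In this model, the first completion order is accepted because only transactions with different IDs change places, the second is rejected because two transactions with the same ID are swapped, and the interleaved beats are separated back into one ordered stream per ID.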
Any efficient implementation of an SoC bus protocol like multi-layer AXI, if it accommodates out-of-order execution, must therefore account for the design challenges that groups of in-order transactions and/or data interleaving present. Existing designs may use first-in-first-out (“FIFO”) and/or simple buffers to capture bus transaction requests as they are received at a slave, but these designs require sophisticated control logic to account for, and properly deal with, the mixture of in-order and out-of-order transactions as well as control logic to de-interleave received data. These implementations are thus large, inefficient, and power-hungry; a need therefore exists for a small, elegant, low-power implementation.