The Advanced Microcontroller Bus Architecture (AMBA) and Advanced eXtensible Interface (AXI) protocols are described in the AMBA AXI and ACE Protocol Specification, Issue E (incorporated in its entirety by reference). That specification describes the following: the AMBA 3 AXI protocol (referred to as AXI3); the AMBA 4 AXI protocol (referred to as AXI4 and AXI4-Lite); and the AMBA 4 AXI Coherency Extensions protocol (referred to as ACE and ACE-Lite).
The AMBA and AXI protocols are used in many modern communication devices such as smartphones and tablets. The AMBA AXI protocol supports high-performance, high-frequency system designs.
The AXI protocol: a) is suitable for high-bandwidth and low-latency designs; b) provides high-frequency operation without using complex bridges; c) meets the interface requirements of a wide range of components; d) is suitable for memory controllers with high initial access latency; e) provides flexibility in the implementation of interconnect architectures; and f) is backward-compatible with existing AHB and APB interfaces. The key features of the AXI protocol are: a) separate address/control and data phases; b) support for unaligned data transfers, using byte strobes; c) burst-based transactions with only the start address issued; d) separate read and write data channels, which can provide low-cost Direct Memory Access (DMA); e) support for issuing multiple outstanding addresses; f) support for out-of-order transaction completion; and g) easy addition of register stages to provide timing closure. The AXI protocol also includes optional extensions that cover signaling for low-power operation. The AXI protocol includes the AXI4-Lite specification, a subset of AXI4 for communication with simpler control register style interfaces within components.
The AXI protocol is burst-based and defines the following independent transaction channels: read address; read data; write address; write data; and write response. An address channel carries control information that describes the nature of the data to be transferred. The data is transferred between master and slave using either: a write data channel to transfer data from the master to the slave (here, the slave uses the write response channel to signal the completion of the transfer to the master); or a read data channel to transfer data from the slave to the master. The AXI protocol: permits address information to be issued ahead of the actual data transfer; supports multiple outstanding transactions; and supports out-of-order completion of transactions.
FIG. 1A shows how a read transaction uses the read address and read data channels. Here, a Master Interface (101) sends address and control information to Slave Interface (102) via a read address channel (103). Corresponding responses (read data) are sent from the Slave Interface (102) to the Master Interface (101) via read data channel (104).
FIG. 1B shows how a write transaction uses the write address, write data, and write response channels. Here, a Master Interface (201) sends address and control information to Slave Interface (202) via a write address channel (203). Thereafter, corresponding write data is sent from the Master Interface (201) to the Slave Interface (202) via a write data channel (204). A corresponding write response is then sent from the Slave Interface (202) to the Master Interface (201) via a write response channel (205).
In FIGS. 1A and 1B, each of the independent channels consists of a set of information signals and VALID and READY signals that provide a two-way handshake mechanism. The information source uses the VALID signal to show when valid address, data or control information is available on the channel. The destination uses the READY signal to show when it can accept the information. Both the read data channel and the write data channel also include a LAST signal to indicate the transfer of the final data item in a transaction. Read and write transactions each have their own address channel. The appropriate address channel carries all of the required address and control information for a transaction.
The read data channel carries both the read data and the read response information from the slave to the master, and includes: a) the data bus, that can be 8, 16, 32, 64, 128, 256, 512, or 1024 bits wide; and b) a read response signal indicating the completion status of the read transaction.
The write data channel carries the write data from the master to the slave and includes: a) the data bus, that can be 8, 16, 32, 64, 128, 256, 512, or 1024 bits wide; and b) a byte lane strobe signal for every eight data bits, indicating which bytes of the data are valid. Write data channel information is always treated as buffered, so that the master can perform write transactions without slave acknowledgement of previous write transactions.
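For illustration only, the byte-lane strobe behavior can be sketched in Python. The sketch assumes a single beat on a bus of `bus_bytes` bytes, with strobe bit n covering data bits [8n+7:8n]; the function names `wstrb_for_transfer` and `apply_wstrb` are hypothetical and are not defined by the specification.

```python
def wstrb_for_transfer(addr: int, nbytes: int, bus_bytes: int) -> int:
    """Return a strobe mask (one bit per byte lane) for writing nbytes at addr.

    Assumes the write fits inside a single bus-width beat.
    """
    lane = addr % bus_bytes                      # first byte lane touched
    if lane + nbytes > bus_bytes:
        raise ValueError("write crosses a beat boundary")
    return ((1 << nbytes) - 1) << lane


def apply_wstrb(old: int, new: int, strobe: int, bus_bytes: int) -> int:
    """Merge write data into existing contents, updating only strobed byte lanes."""
    result = old
    for lane in range(bus_bytes):
        if strobe & (1 << lane):
            mask = 0xFF << (8 * lane)            # data bits covered by this strobe bit
            result = (result & ~mask) | (new & mask)
    return result
```

For example, a 2-byte write to address 0x1001 on a 32-bit bus enables byte lanes 1 and 2 (strobe 0b0110), and only those two bytes of the stored word change.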
A slave uses the write response channel to respond to write transactions. All write transactions require completion signaling on the write response channel. As shown in FIG. 1B, completion is signaled only for a complete transaction, not for each data transfer in a transaction.
FIG. 1C shows an AXI system that includes a number of master devices (401) and slave devices (402) connected together through some form of interconnect (403). Here, the AXI protocol provides a single interface definition for the interfaces: a) between a master and the interconnect; b) between a slave and the interconnect; and c) between a master and a slave. This interface definition supports a variety of different interconnect implementations. (Note: An interconnect between devices is equivalent to another device with symmetrical master and slave ports to which real master and slave devices can be connected.)
Most systems use one of three interconnect topologies: a) shared address and data buses; b) a shared address bus and multiple data buses; or c) multilayer, with multiple address and data buses. In most systems, the address channel bandwidth requirement is significantly less than the data channel bandwidth requirement. Such systems can achieve a good balance between system performance and interconnect complexity by using a shared address bus with multiple data buses to enable parallel data transfers.
Each AXI channel transfers information in only one direction, and the architecture does not require any fixed relationship between the channels. This means a register slice can be inserted at almost any point in any channel, at the cost of an additional cycle of latency. This makes possible: a) a trade-off between cycles of latency and maximum frequency of operation; and b) a direct, fast connection between a processor and high performance memory.
All AXI transaction channels use a common VALID/READY handshake process to transfer address, data, and control information. This two-way flow control mechanism means both the master and slave can control the rate at which the information moves between master and slave. The source generates the VALID signal to indicate when the address, data or control information is available. The destination generates the READY signal to indicate that it can accept the information. Transfer occurs only when both the VALID and READY signals are HIGH.
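The handshake rule can be illustrated with a minimal cycle-based Python model of a single channel. The model is a simplification under stated assumptions: `source_offers` and `dest_ready` are hypothetical per-cycle inputs, VALID is held HIGH with stable data once raised, and the LAST flag travels with each beat as described for the data channels.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Beat:
    data: int
    last: bool  # LAST: marks the final data item of the transaction


def simulate_channel(beats: List[Beat], source_offers: List[bool],
                     dest_ready: List[bool]) -> List[Tuple[int, int, bool]]:
    """Return (cycle, data, last) for every beat transferred on one channel.

    source_offers[c]: the source is able to present the next beat in cycle c.
    dest_ready[c]:    the destination drives READY HIGH in cycle c.
    """
    transfers = []
    idx = 0
    valid = False
    for cycle, (offer, ready) in enumerate(zip(source_offers, dest_ready)):
        if not valid and offer and idx < len(beats):
            valid = True  # once raised, VALID stays HIGH until the handshake
        if valid and ready:  # transfer only when VALID and READY are both HIGH
            beat = beats[idx]
            transfers.append((cycle, beat.data, beat.last))
            idx += 1
            valid = False
    return transfers
```

Because either side can hold the transfer off (the source by not raising VALID, the destination by holding READY LOW), both the master and the slave control the rate at which information moves.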
The AXI protocol requires the following relationships to be maintained: a) a write response must always follow the last write transfer in the write transaction of which it is a part; b) read data must always follow the address to which the data relates; c) channel handshakes must conform to the pre-defined dependencies. Otherwise, the protocol does not define any relationship between the channels. This means, for example, that the write data can appear at an interface before the write address for the transaction. This can occur if the write address channel contains more register stages than the write data channel. Similarly, the write data might appear in the same cycle as the address.
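The effect of unequal register stages on arrival order can be sketched as follows. This hypothetical Python model assumes each register stage adds exactly one cycle of latency and ignores back-pressure (READY is assumed always HIGH); `ChannelPipeline` and `interface_arrival` are illustrative names only.

```python
from collections import deque
from typing import Dict, Optional


class ChannelPipeline:
    """One AXI channel with a fixed number of register stages (no back-pressure modeled)."""

    def __init__(self, stages: int):
        self.regs = deque([None] * stages)

    def clock(self, item: Optional[str]) -> Optional[str]:
        """Advance one cycle: accept item at the input, return what reaches the output."""
        if not self.regs:
            return item  # zero stages: the item passes straight through
        self.regs.append(item)
        return self.regs.popleft()


def interface_arrival(aw_stages: int, w_stages: int, cycles: int = 8) -> Dict[str, int]:
    """Master issues a write address and its write data in cycle 0; report arrival cycles."""
    aw, w = ChannelPipeline(aw_stages), ChannelPipeline(w_stages)
    arrival = {}
    for cycle in range(cycles):
        out_aw = aw.clock("AW" if cycle == 0 else None)
        out_w = w.clock("W" if cycle == 0 else None)
        if out_aw is not None:
            arrival["AW"] = cycle
        if out_w is not None:
            arrival["W"] = cycle
    return arrival
```

With two stages on the write address channel and none on the write data channel, the write data reaches the slave interface two cycles before its address; with equal stage counts, both arrive in the same cycle.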
When an AXI master initiates an AXI operation, targeting an AXI slave: a) the complete set of required operations on the AXI bus form the AXI Transaction; b) any required payload data is transferred as an AXI Burst; and c) a burst can comprise multiple data transfers, or AXI Beats.
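Because only the start address of a burst is issued, the address of each beat is derived from it. The following Python sketch assumes an incrementing (INCR) burst, where AxSIZE encodes 2^AxSIZE bytes per beat and AxLEN encodes AxLEN + 1 beats; it does not check the rule that a burst must not cross a 4KB address boundary. The function name is hypothetical.

```python
from typing import List


def incr_burst_addresses(start_address: int, axsize: int, axlen: int) -> List[int]:
    """Return the address of each beat in an INCR burst."""
    number_bytes = 1 << axsize                                  # bytes per beat
    aligned = (start_address // number_bytes) * number_bytes   # aligned start
    addresses = [start_address]                                # beat 1 may be unaligned
    for n in range(2, axlen + 2):                              # beats 2 .. AxLEN+1
        addresses.append(aligned + (n - 1) * number_bytes)
    return addresses
```

For example, `incr_burst_addresses(0x1002, 2, 3)` yields a 4-beat burst of 4-byte beats at 0x1002, 0x1004, 0x1008, and 0x100C; an unaligned start address affects only the first beat.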
In the protocol, there are 12 memory types: 1) Device Non-bufferable; 2) Device Bufferable; 3) Normal Non-cacheable Non-bufferable; 4) Normal Non-cacheable Bufferable; 5) Write-through No-allocate; 6) Write-through Read-allocate; 7) Write-through Write-allocate; 8) Write-through Read and Write-allocate; 9) Write-back No-allocate; 10) Write-back Read-allocate; 11) Write-back Write-allocate; and 12) Write-back Read and Write-allocate. Each memory type operates according to rules defined in the specification. Also, the same memory type can have different encodings on the read channel and write channel.
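The memory types are signaled on the ARCACHE and AWCACHE signals. The table in the following Python sketch is believed to reproduce the preferred encodings from the memory-type table of the specification and should be verified against it; alternative encodings that the specification permits for some allocate variants are omitted. `AXCACHE`, `axcache`, and `is_modifiable` are hypothetical names.

```python
# Preferred (ARCACHE[3:0], AWCACHE[3:0]) per memory type; alternates omitted.
AXCACHE = {
    "Device Non-bufferable":                 (0b0000, 0b0000),
    "Device Bufferable":                     (0b0001, 0b0001),
    "Normal Non-cacheable Non-bufferable":   (0b0010, 0b0010),
    "Normal Non-cacheable Bufferable":       (0b0011, 0b0011),
    "Write-through No-allocate":             (0b1010, 0b0110),
    "Write-through Read-allocate":           (0b1110, 0b0110),
    "Write-through Write-allocate":          (0b1010, 0b1110),
    "Write-through Read and Write-allocate": (0b1110, 0b1110),
    "Write-back No-allocate":                (0b1011, 0b0111),
    "Write-back Read-allocate":              (0b1111, 0b0111),
    "Write-back Write-allocate":             (0b1011, 0b1111),
    "Write-back Read and Write-allocate":    (0b1111, 0b1111),
}


def axcache(memory_type: str, channel: str) -> int:
    """Return the cache encoding for a memory type on the 'read' or 'write' channel."""
    ar, aw = AXCACHE[memory_type]
    if channel == "read":
        return ar
    if channel == "write":
        return aw
    raise ValueError("channel must be 'read' or 'write'")


def is_modifiable(cache: int) -> bool:
    """Bit 1 (Modifiable) is clear only for the two Device memory types."""
    return bool(cache & 0b0010)
```

For example, Write-back No-allocate uses 0b1011 on the read channel but 0b0111 on the write channel.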
Write accesses to the following memory types do not require a transaction response from the final destination, but do require that write transactions are made visible at the final destination in a timely manner: a) Device Bufferable; b) Normal Non-cacheable Bufferable; and c) Write-through. For write transactions, all three memory types require the same behavior. For read transactions, the required behavior is as follows: a) for Device Bufferable memory, read data must be obtained from the final destination; b) for Normal Non-cacheable Bufferable memory, read data must be obtained either from the final destination or from a write transaction that is progressing to its final destination; and c) for Write-through memory, read data can be obtained from an intermediate cached copy. In addition to ensuring that write transactions progress towards their final destination in a timely manner, intermediate buffers must behave as follows: a) an intermediate buffer that can respond to a transaction must ensure that, over time, any read transaction to Normal Non-cacheable Bufferable memory propagates towards its destination. This means that, when forwarding a read transaction, the attempted forwarding must not continue indefinitely, and any data used for forwarding must not persist indefinitely. The protocol does not define any mechanism for determining how long data used for forwarding a read transaction can persist. However, in such a mechanism, the act of reading the data must not reset the data timeout period; and b) an intermediate buffer that can hold and merge write transactions must ensure that transactions do not remain in its buffer indefinitely. For example, merging write transactions must not reset the mechanism that determines when a write is drained towards its final destination.
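Rule b) for intermediate buffers can be illustrated with a hypothetical merging write buffer in Python. The assumptions are that each buffered line receives a fixed drain deadline (`max_residency` cycles) when it is first allocated, and that merging later writes into the line leaves that deadline unchanged, so a steady stream of merges cannot hold data in the buffer indefinitely. Names and parameters are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class _Line:
    data: Dict[int, int]  # byte address -> byte value merged so far
    deadline: int         # cycle by which this line must be drained


class MergingWriteBuffer:
    def __init__(self, line_bytes: int, max_residency: int):
        self.line_bytes = line_bytes
        self.max_residency = max_residency
        self.lines: Dict[int, _Line] = {}

    def write(self, now: int, addr: int, value: int) -> None:
        base = addr - addr % self.line_bytes
        line = self.lines.get(base)
        if line is None:
            # First write to this line: the drain deadline is fixed here.
            line = _Line({}, now + self.max_residency)
            self.lines[base] = line
        # Merge the write; the deadline is deliberately NOT extended.
        line.data[addr] = value

    def drain_due(self, now: int) -> List[Tuple[int, Dict[int, int]]]:
        """Remove and return every line whose deadline has been reached."""
        due = [base for base, line in self.lines.items() if line.deadline <= now]
        return [(base, self.lines.pop(base).data) for base in due]
```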
Regarding buffers for data transactions, the specification supports the combined use of Device Non-bufferable and Device Bufferable memory types to force write transactions to reach their final destination and to ensure that the issuing master knows when the transaction is visible to all other masters. A write transaction that is marked as Device Bufferable is required to reach its final destination in a timely manner. However, the write response for the transaction can be signaled by an intermediate buffer. Therefore, the issuing master cannot know when the write is visible to all other masters. If a master issues a Device Bufferable write transaction, or stream of write transactions, followed by a Device Non-bufferable write transaction, and all transactions use the same AXI ID, the AXI ordering requirements force all of the Device Bufferable write transactions to reach the final destination before a response is given to the Device Non-bufferable transaction. Therefore, the response to the Device Non-bufferable transaction indicates that all the transactions are visible to all masters.
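This ordering effect can be sketched with a simplified Python model of an intermediate buffer. Assumptions: Device Bufferable writes are acknowledged by the buffer itself; a Device Non-bufferable write is forwarded only after all earlier writes with the same AXI ID have been drained to the final destination, after which the response comes from that destination; writes with other IDs are unaffected; and timely background draining is not otherwise modeled. The class and its return labels are hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple


class IntermediateBuffer:
    """Hypothetical buffer between a master and the final destination."""

    def __init__(self) -> None:
        self.pending: Deque[Tuple[int, int]] = deque()  # (AWID, address) not yet delivered
        self.destination: List[Tuple[int, int]] = []   # writes that reached the destination

    def write(self, awid: int, addr: int, bufferable: bool) -> str:
        """Accept a write; return which agent signals its write response."""
        if bufferable:
            # Device Bufferable: the buffer may respond before the data is delivered.
            self.pending.append((awid, addr))
            return "buffer"
        # Device Non-bufferable: same-ID ordering forces every earlier write with
        # this AWID to reach the destination first, in issue order.
        remaining: Deque[Tuple[int, int]] = deque()
        for entry in self.pending:
            if entry[0] == awid:
                self.destination.append(entry)
            else:
                remaining.append(entry)
        self.pending = remaining
        self.destination.append((awid, addr))
        return "destination"
```

When the final write returns "destination", every earlier same-ID write is already at the final destination, which is what lets the master treat that response as a visibility point.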
Regarding transaction ordering, a master can use the AWID (write address ID) and ARID (read address ID) transaction identifiers to indicate its ordering requirements. The rules for the ordering of transactions are as follows: a) Transactions from different masters have no ordering restrictions. They can complete in any order. b) Transactions from the same master, but with different ID values, have no ordering restrictions. They can complete in any order. c) The data transfers for a sequence of read transactions with the same ARID value must be returned in the order in which the master issued the addresses. d) The data transfers for a sequence of write transactions with the same AWID value must complete in the order in which the master issued the addresses. e) There are no ordering restrictions between read and write transactions using a common value for AWID and ARID. f) Interconnect use of transaction identifiers corresponds to how the AXI fabric extends the transaction ID values issued by AXI masters and slaves.
At a master interface, read data from transactions with the same ARID value must arrive in the order in which the master issued the addresses. Data from read transactions with different ARID values can arrive in any order. Read data of transactions with different ARID values can be interleaved. A slave must return read data for a sequence of transactions with the same ARID value in the order in which it received the addresses. In a sequence of read transactions with different ARID values, the slave can return the read data in any order, regardless of the order in which the transactions arrived. The slave must ensure that the RID value of any returned data matches the ARID value of the address to which it is responding. The interconnect must ensure that the read data from a sequence of transactions with the same ARID value targeting different slaves is received by the master in the order in which it issued the addresses. The read data re-ordering depth is the number of addresses pending in the slave that can be reordered. A slave that processes all transactions in order has a read data re-ordering depth of one. The read data re-ordering depth is a static value that must be specified by the designer of the slave.
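The same-ID ordering rule at a master interface can be expressed as a simple checker. This Python sketch treats each read transaction as a single unit identified by a hypothetical `tag` (beat-level interleaving is not modeled) and reports whether a returned sequence is legal.

```python
from collections import defaultdict, deque
from typing import Deque, DefaultDict, Iterable, Tuple


def check_read_ordering(issued: Iterable[Tuple[int, str]],
                        returned: Iterable[Tuple[int, str]]) -> bool:
    """Return True if returned read data respects same-ID ordering.

    issued:   (ARID, tag) in the order the master issued addresses.
    returned: (RID, tag) in the order read data arrived at the master.
    """
    outstanding: DefaultDict[int, Deque[str]] = defaultdict(deque)
    for arid, tag in issued:
        outstanding[arid].append(tag)
    for rid, tag in returned:
        # RID must match an outstanding ARID, and must be the oldest one for that ID.
        if not outstanding[rid] or outstanding[rid][0] != tag:
            return False
        outstanding[rid].popleft()
    return True
```

Data with different IDs may be returned in any relative order, so only the per-ID queues are checked.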
FIGS. 2A-2B show conceptual diagrams of how conventional memory retrieval is performed in an AXI-compliant environment. Here, a master device (1) has a master request port (1A) and a master response port (1B). Data is stored in a memory (7) (here, Dynamic Random Access Memory (DRAM)). Memory access is cooperatively managed by a Last Level Cache (LLC) (5) and a Re-ordering Buffer (ROB) (3). In FIG. 2A, requests and responses are routed via the ROB (3). In FIG. 2B, only responses are routed via the ROB (3). However, as seen in FIG. 2B, copies of the requests are sent to the ROB so that the ROB may perform bookkeeping operations to ensure that the responses are properly ordered. Also, FIGS. 2A and 2B show optional bypasses from the DRAM directly to the ROB. The optional bypass path is used for requests and/or responses that the system does not intend to place in the LLC. However, these requests/responses are still subject to the above-described ordering.
The term Last Level Cache (LLC) denotes that the cache is the last caching agent in the system before memory (DRAM). In the current art, most systems have L1/L2/L3 caches, where the "L-number" denotes the proximity to the master, which can be either a CPU or a GPU. In any system, the LLC is always the last caching agent and has the largest L-number.
As seen in FIG. 2C, the conventional ROB receives data requests from the Master device (S1). These requests are forwarded to the DRAM via the LLC. The ROB then receives unordered responses from the DRAM via the LLC (S3). The ROB determines whether or not the response(s) can be sent to the Master in the correct order (S5). Here, the ROB may use one or more criteria to make this determination. In general, the AXI ordering requirement of the Master is used to determine whether a response can be sent to the Master. However, other criteria may be used to choose among multiple responses that each satisfy the AXI ordering requirement of the Master. These criteria may be based on age of request, priority of request, ROB buffer capacity, and/or other parameters.
If the ROB can send the response(s) in the correct order, the ROB sends them (S7). However, if the ROB cannot send the response(s) in the correct order, the ROB internally buffers the response(s) until responses can be properly ordered (S9). Later, after the ROB determines that specific unordered response(s) within the ROB can now be correctly ordered and sent to Master (S11), the ROB sends the specific response(s) to Master in the proper order (S13).
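The ROB flow of FIG. 2C (S1 to S13) can be sketched in Python under simplifying assumptions: ordering is enforced only per AXI ID, each request carries a hypothetical unique `tag`, and the additional selection criteria (age, priority, buffer capacity) are not modeled. This is an illustrative sketch, not the conventional ROB implementation itself.

```python
from collections import defaultdict, deque
from typing import Any, Deque, DefaultDict, Dict, List, Tuple


class ReorderBuffer:
    """Releases responses with the same ID in request order; IDs are independent."""

    def __init__(self) -> None:
        self.order: DefaultDict[int, Deque[str]] = defaultdict(deque)  # per-ID tags, oldest first
        self.held: Dict[Tuple[int, str], Any] = {}                     # responses buffered (S9)

    def request(self, axi_id: int, tag: str) -> None:
        """S1: record a request from the Master before it is forwarded downstream."""
        self.order[axi_id].append(tag)

    def response(self, axi_id: int, tag: str, data: Any) -> List[Tuple[int, str, Any]]:
        """S3 to S13: accept an unordered response; return what may be sent, in order."""
        self.held[(axi_id, tag)] = data
        released = []
        queue = self.order[axi_id]
        # S5/S11: release while the oldest outstanding request for this ID has its data.
        while queue and (axi_id, queue[0]) in self.held:
            head = queue.popleft()
            released.append((axi_id, head, self.held.pop((axi_id, head))))
        return released  # S7/S13: empty list means the response stays buffered (S9)
```

A response that arrives ahead of an older same-ID response is held (S9); a response with no older outstanding same-ID request is sent immediately (S7); and when the missing older response arrives, both are released in order (S11, S13). The `held` dictionary is the storage whose growth drives the buffer size concerns discussed below.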
In other conventional approaches, as seen in FIG. 3A, the Last Level Cache (LLC) may include multiple banks that manage data requests and responses. Here, request(s) from the Master are routed to individual LLC banks via a predefined protocol (1). The banks then send the requests to the DRAM (2). The LLC banks then receive the responses from the DRAM (3). The LLC banks then send unordered responses to the Master via the ROB for subsequent ordering (4).
Thus, as seen in FIG. 3B, the conventional LLC receives request(s) from the Master directly or via the Re-ordering Buffer (S31). Each request is assigned to a specific LLC bank according to predetermined criteria/protocol (S33). An exemplary criterion is the request address, whereby each bank owns a portion of the total address space. Each bank forwards requests to the DRAM in a predetermined (e.g., FIFO) order without coordination between banks (S35). Each bank receives the corresponding response from the DRAM at an unpredictable time (S37), because, to optimize performance, the requests are processed in an order unrelated to the send or receive order; modern DRAM requires intelligent out-of-order processing to maintain bandwidth and latency. Each bank forwards responses to the Re-ordering Buffer in a predetermined order without coordination between banks (S39).
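The address-based bank assignment of S33 and the independent per-bank FIFOs of S35 can be sketched as follows. The 64-byte line size, the modulo interleaving function, and the names `llc_bank` and `assign_requests` are assumptions for illustration; actual LLC designs may use other address hash functions.

```python
from collections import defaultdict, deque
from typing import DefaultDict, Deque, Iterable


def llc_bank(addr: int, num_banks: int, line_bytes: int = 64) -> int:
    """Hypothetical interleaving: consecutive cache lines map to consecutive banks."""
    return (addr // line_bytes) % num_banks


def assign_requests(addresses: Iterable[int], num_banks: int,
                    line_bytes: int = 64) -> DefaultDict[int, Deque[int]]:
    """S33/S35: queue each request in its bank's FIFO; banks drain independently."""
    banks: DefaultDict[int, Deque[int]] = defaultdict(deque)
    for addr in addresses:
        banks[llc_bank(addr, num_banks, line_bytes)].append(addr)
    return banks
```

Because each bank drains its own FIFO without coordination, responses for requests spread over several banks can reach the Re-ordering Buffer in any relative order.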
With ever-increasing smartphone and tablet complexity, the size, speed, complexity, and number of memory accesses continue to grow. This growth has led to increased demands on (and growth in the size of) the respective buffers.
Modern caches tend to be based on static random access memory (SRAM) technology, whereas AXI-compliant buffers tend to be based on flip-flop or other non-SRAM technology. Thus, buffers tend to require more transistors per stored bit, and therefore tend to consume more power and generate more heat than corresponding memory units. Larger, faster, and more complex data demands have resulted in growth in re-ordering buffer size (i.e., more transistors), and therefore in increased buffer power consumption and cooling requirements.