A computer network is a geographically distributed collection of interconnected subnetworks for transporting data between nodes, such as computers. A local area network (LAN) is an example of such a subnetwork; a plurality of LANs may be further interconnected by an intermediate network node, such as a router or switch, to extend the effective “size” of the computer network and increase the number of communicating nodes. The nodes typically communicate by exchanging discrete frames or packets of data according to predefined network communication protocols. In this context, a network communication protocol consists of a set of rules defining how the nodes interact with each other.
Each node typically comprises a number of basic systems including a processor, a main memory and an input/output (I/O) system. Data is transferred between the main memory, processor and I/O system over a system bus, while data transactions within the I/O system occur over an external bus, such as an I/O bus. Each bus typically consists of either address, data and control lines, with the control lines carrying control signals specifying the direction and type of transfer, or a pair of unidirectional communication lines for passing I/O packets containing address, data and control information, such as in the case of a HyperTransport bus. For example, the processor (i.e., a source) may issue a read transaction over the system bus to request the transfer of data from an addressed location on an I/O device (i.e., a target) that is coupled to the I/O bus. The processor then processes the retrieved data in accordance with instructions that may have been obtained from main memory. The processor may thereafter issue a write transaction requesting that the results be stored in, e.g., another addressed location in the I/O device.
Some buses operate in an “atomic” manner such that the source device is granted exclusive access (i.e., control) to the bus until the data transfer is complete. However, an atomic bus may potentially waste bus cycles, particularly when waiting for data in response to, e.g., a read request. In a split transaction bus, on the other hand, the source relinquishes control over the bus once the request is sent to the target device. After processing the request, the target may independently acquire control of the bus and return a response to the source. The split transaction bus thus essentially enables each transaction over the split transaction bus to be divided into at least two separate communications: the request and the response. For example, a read transaction over the bus may comprise a read request and a separate read response. The split transaction bus may be configured to perform both “posted” and “non-posted” transactions. A posted transaction corresponds to a request that does not solicit a response over the bus; a non-posted transaction corresponds to a request for which a response is required.
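By way of illustration only, the distinction between posted and non-posted requests on a split transaction bus may be sketched as follows. The sketch assumes a source that tracks each non-posted request by a tag until its response arrives; the names Request and SplitBusSource and the tag mechanism are hypothetical and are not drawn from any particular bus specification.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Request:
    tag: int      # hypothetical identifier used to match a later response
    kind: str     # "read" or "write"
    posted: bool  # True: the request does not solicit a response


class SplitBusSource:
    """Source device that releases the bus as soon as a request is sent."""

    def __init__(self):
        self._tags = count()
        self.outstanding = {}  # non-posted requests still awaiting a response

    def issue(self, kind, posted=False):
        req = Request(next(self._tags), kind, posted)
        if not posted:
            # A non-posted request requires a response, so remember it.
            self.outstanding[req.tag] = req
        return req

    def complete(self, tag):
        # The target acquired the bus independently and returned a response.
        return self.outstanding.pop(tag)
```

In this sketch a read is always issued as a non-posted request, since it requires a response carrying the data, whereas a write may be issued as a posted request and is then never entered in the table of outstanding requests.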
In general, each request and response transmitted over the split transaction bus is formatted in accordance with the bus's protocol. The bus protocol defines a set of rules for transmitting data packets between source and target devices interconnected by the split transaction bus. For example, the bus protocol may specify, among other things, formatting and configuration information associated with the bus. An illustrative split transaction bus protocol is the conventional HyperTransport (HPT) bus protocol, which is set forth in HyperTransport I/O Link Specification, Revision 1.10, published August 2003, and is hereby incorporated by reference.
The HPT bus protocol is often used to manage communications over an HPT bus that couples a system controller (i.e., a source device) and a forwarding engine (i.e., a target device) in an intermediate network node. By way of example, assume data is transferred between a direct memory access (DMA) engine in the source device and a central processing unit (CPU) in the target device. In this scenario, network packet data may be received by the DMA engine and forwarded over the HPT bus to the CPU. The CPU makes a forwarding determination for the received packet, modifies the packet data if necessary, then returns the processed packet data across the bus to the DMA engine.
Traditionally, the CPU in the target device manages a “pool” of data buffers, where each buffer is typically a fixed-sized memory block. In practice, each data buffer is associated with a corresponding buffer descriptor. The buffer descriptor essentially “describes” the location and contents of its corresponding data buffer. For example, the descriptor may include, inter alia, the memory address of the buffer, the amount of data stored in the buffer, various flag values associated with the buffer, and so forth. As used herein, a “free” buffer descriptor references a data buffer that is currently not in use and is therefore available to store data.
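By way of illustration only, a buffer descriptor and a pool of fixed-size buffers may be sketched as follows. The field names, the buffer size of 2048 bytes, and the rule that a descriptor with no stored data is "free" are assumptions made for this sketch; the text above does not specify a descriptor layout.

```python
from dataclasses import dataclass
from typing import List

BUFFER_SIZE = 2048  # assumed fixed buffer size in bytes (not from the text)


@dataclass
class BufferDescriptor:
    """Describes the location and contents of one fixed-size data buffer."""
    address: int     # memory address of the referenced buffer
    length: int = 0  # amount of data currently stored, in bytes
    flags: int = 0   # assorted flag values (bit layout is hypothetical)

    def is_free(self) -> bool:
        # Assumption: a buffer holding no data is treated as not in use.
        return self.length == 0


def make_pool(base_address: int, num_buffers: int) -> List[BufferDescriptor]:
    """Build one descriptor per buffer for a contiguous pool of buffers."""
    return [
        BufferDescriptor(address=base_address + i * BUFFER_SIZE)
        for i in range(num_buffers)
    ]
```

For example, make_pool(0x1000, 4) yields four free descriptors whose addresses are spaced BUFFER_SIZE bytes apart.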
A data path protocol is usually employed when data is transferred between the source and target devices. Conventional data path protocols define a sequence of read and write transactions that collectively define a procedure for transferring the data over, e.g., the HPT bus. In accordance with these protocols, the target device is responsible for issuing buffer descriptors to the source device whenever data is transferred across the bus. Thus, when the source device desires to transfer data to the target device, the target device issues the source device a free buffer descriptor corresponding to a data buffer available to store the transferred data. After processing the transferred data, the target device then issues another buffer descriptor to the source device indicating which buffer(s) stores the processed data. The target device is also responsible for “recycling” (i.e., reusing) descriptors whose referenced buffers are no longer in use.
A conventional HPT data path protocol defines a first sequence of read and write transactions for the source device to transfer the data to the target device and a second sequence of read and write transactions to return the processed data back to the source device. For purposes of discussion, assume the conventional data path protocol is employed for transferring data between a source device DMA engine and a target device CPU in an intermediate network node.
Conventionally, the following steps are performed for transferring data from the DMA engine to the CPU, i.e., in the “To-CPU” direction. First, the DMA engine initiates a read transaction across the HPT bus to retrieve one or more free buffer descriptors corresponding to data buffers available to store data in the target device. The CPU maintains a list (or queue) of free buffer descriptors. The CPU initializes descriptors in this list to indicate that they are available for the DMA engine to access. To that end, the CPU may set “ownership” flag values in the descriptors to indicate that they are available to the DMA engine. Accordingly, in response to receiving the DMA engine's read request, the CPU acquires the requested free buffer descriptor(s) whose ownership flag values indicate that they are accessible to the DMA engine. The CPU then returns the requested descriptor(s) to the DMA engine. Having received the requested descriptors, the DMA engine writes the data into the target-device data buffers referenced by the received descriptors. Then, the DMA engine updates the contents of the descriptors to coincide with the transferred data, if necessary. Finally, the DMA engine performs a write transaction over the HPT bus to return the updated descriptors to the CPU.
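By way of illustration only, the To-CPU sequence described above may be sketched as follows. The class TargetCPU, the function to_cpu_transfer, the ownership flag encoding and the buffer size are hypothetical, and the toggling of the ownership flag by the DMA engine in this direction is an assumption of the sketch. Each entry appended to the log stands for one transaction over the bus.

```python
from collections import deque
from dataclasses import dataclass

OWNER_CPU, OWNER_DMA = 0, 1  # hypothetical ownership flag encoding


@dataclass
class Descriptor:
    address: int
    length: int = 0
    owner: int = OWNER_CPU


class TargetCPU:
    def __init__(self, num_buffers, buffer_size=256):
        self.memory = {}  # buffer address -> stored bytes
        # The CPU initializes free descriptors as accessible to the DMA engine.
        self.free_list = deque(
            Descriptor(i * buffer_size, owner=OWNER_DMA)
            for i in range(num_buffers)
        )
        self.received = deque()  # descriptors returned by the DMA engine

    def read_free_descriptors(self, n):
        """Service a read request for up to n DMA-accessible descriptors."""
        out = []
        while (self.free_list and len(out) < n
               and self.free_list[0].owner == OWNER_DMA):
            out.append(self.free_list.popleft())
        return out


def to_cpu_transfer(cpu, payload, log):
    """One conventional To-CPU transfer; returns False if no buffer is free."""
    log.append("read: free descriptor")
    descs = cpu.read_free_descriptors(1)
    if not descs:
        return False
    d = descs[0]
    log.append("write: packet data")
    cpu.memory[d.address] = bytes(payload)
    # Update the descriptor to coincide with the transferred data.
    d.length = len(payload)
    d.owner = OWNER_CPU  # assumption: hand the descriptor back to the CPU
    log.append("write: updated descriptor")
    cpu.received.append(d)
    return True
```

In this sketch every successful transfer records one read transaction followed by two write transactions, and the read must complete before any packet data can be written.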
After the CPU processes the transferred data, another set of read and write transactions is performed in the “From-CPU” direction. Specifically, the CPU maintains a list (or queue) of descriptors whose referenced data buffers contain processed data that may be returned to the DMA engine. The CPU sets the ownership flag values in these descriptors to indicate that they are accessible to the DMA engine. The DMA engine initiates the From-CPU data transfer by performing a read transaction across the HPT bus to retrieve one or more buffer descriptors from the head of this list. In response, the CPU forwards the requested descriptors to the DMA engine, which then retrieves the processed data from the descriptors' referenced data buffers. Alternatively, the DMA engine may retrieve the data by writing a control instruction to a data mover, e.g., tightly coupled to the CPU, that effectuates the data transfer to the source device. That is, in accordance with the control instruction, the CPU's data mover transfers the data referenced by the DMA engine's requested descriptor(s). In either case, the DMA engine updates the contents of the buffer descriptors, if necessary, and performs a write transaction over the HPT bus to return the updated descriptors to the CPU. For instance, the DMA engine may toggle the descriptors' ownership flag values to indicate that they are now available for use by the CPU. The target device then may reuse these descriptors as free buffer descriptors in a subsequent data transfer.
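By way of illustration only, the From-CPU sequence may be sketched as follows for the case in which the DMA engine itself retrieves the processed data; the data mover alternative is not modeled. The function from_cpu_transfer, the flag encoding and the clearing of the length field on return are hypothetical. Each entry in bus_ops stands for one transaction over the bus.

```python
from collections import deque
from dataclasses import dataclass

OWNER_CPU, OWNER_DMA = 0, 1  # hypothetical ownership flag encoding


@dataclass
class Descriptor:
    address: int
    length: int
    owner: int


def from_cpu_transfer(done_queue, memory, free_list):
    """One conventional From-CPU transfer.

    done_queue: CPU's queue of descriptors referencing processed data
    memory:     dict mapping buffer address -> bytes held in the target
    free_list:  CPU's queue that receives recycled descriptors
    Returns (data, bus_ops); data is None if nothing is ready.
    """
    bus_ops = ["read: descriptor from head of list"]
    if not done_queue or done_queue[0].owner != OWNER_DMA:
        return None, bus_ops
    d = done_queue.popleft()

    bus_ops.append("read: processed data")
    data = memory[d.address][:d.length]

    # Toggle ownership so the CPU may reuse the descriptor as a free one.
    d.owner = OWNER_CPU
    d.length = 0  # assumption: the returned descriptor is marked empty
    bus_ops.append("write: updated descriptor")
    free_list.append(d)
    return data, bus_ops
```

As in the To-CPU direction, the descriptor read at the start of the sequence gates everything that follows it.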
The conventional data path protocol described above suffers from the disadvantage of having to perform read and write transactions in both the To-CPU and From-CPU directions. Specifically, before each data transfer over the HPT bus, the source device must perform a read transaction to obtain a buffer descriptor from the target device. The source device must subsequently perform a write transaction to return the descriptor to the target device. As such, this process of exchanging buffer descriptors between the source and target devices may consume an excessive amount of the HPT bus's available bandwidth.
In addition, the conventional HPT data path protocol is limited by the inherent latencies of performing read transactions. For instance, when the target device receives a read request from the source device, the target device retrieves the requested buffer descriptor(s) and returns the requested descriptor(s) to the source device. This read transaction may consume an unreasonable amount of processing bandwidth within the target device. Moreover, because the data transfer cannot be performed over the HPT bus until the read transaction is completed, i.e., until the requested buffer descriptor(s) is forwarded to the source device, the latency of performing the read transaction is often a substantial portion of the overall latency of the data transfer. That is, in both the To-CPU and From-CPU directions, a substantial portion of the time consumed transferring data between the source and target devices is the time required to complete the read transaction.
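By way of illustration only, the contribution of the read transaction to the overall transfer latency may be expressed with a simple serial model. The function name, its parameters and the assumption that the three phases do not overlap are hypothetical; no measured latencies are implied.

```python
def transfer_latency(desc_read_ns, data_ns, desc_write_ns):
    """Serial latency model of one conventional transfer.

    The phases are assumed not to overlap: the descriptor read completes
    first, then the data moves, then the updated descriptor is written.
    Returns (total latency in ns, fraction of it spent on the read).
    """
    total_ns = desc_read_ns + data_ns + desc_write_ns
    return total_ns, desc_read_ns / total_ns
```

Under this model, if the descriptor read were to take as long as the data movement and the descriptor write combined, the read alone would account for half of the transfer latency.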
There is therefore a need in the art for a data path protocol that consumes less bandwidth over a split transaction bus and reduces the latency required to transfer data between source and target devices connected to the bus. The protocol should not only consume less bandwidth over the split transaction bus, but also improve the processing bandwidth within individual devices coupled to the bus.