The present invention relates to a method and/or architecture for input/output processing generally and, more particularly, to copying chain buffers from a system memory to a local memory to accommodate large scatter-gather lists.
Conventional computers perform input/output (I/O) processing by building request messages in a host or system memory. The messages are then sent to an intelligent I/O protocol controller that performs the actual I/O data transfers. The I/O data transfers are commonly made more efficient by implementing small block I/O messages. As a result, some request messages cannot contain all of the data to be transferred.
A request message may have an associated scatter-gather (SG) list to permit the request message to transfer one or more buffers of data. If the SG list does not fit into the request message, the SG list is conventionally stored in one or more chain buffers linked to the request message. Each chain buffer is an SG segment. Each SG segment contains one or more SG elements. Each SG element points to a data buffer in the system memory containing the data to be transferred. An SG element may contain an address and a length of the data buffer. An I/O protocol controller has two options in the event that chain buffers are required for an I/O operation. The I/O protocol controller may control direct memory access (DMA) operations based on the SG elements stored in the system memory. Alternatively, the I/O protocol controller may copy the entire chain buffer(s) into a local memory and execute DMA operations based on the copy.
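The relationship between a request message, its chain buffers, and the SG elements may be easier to follow as a small data-structure sketch. The following Python model is illustrative only and is not taken from any particular controller: the names `SGElement`, `ChainBuffer`, `RequestMessage` and `gather` are hypothetical, the system memory is modeled as a flat byte string, and the assumption that a request message may hold some SG elements inline before linking to a chain buffer is made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SGElement:
    address: int  # start of a data buffer in the (modeled) system memory
    length: int   # length of that data buffer in bytes


@dataclass
class ChainBuffer:
    elements: List[SGElement]             # one SG segment
    next: Optional["ChainBuffer"] = None  # link to the next chain buffer, if any


@dataclass
class RequestMessage:
    # Assumption: SG elements that fit are carried in the message itself.
    elements: List[SGElement] = field(default_factory=list)
    # Link to the first chain buffer; None when no chain buffer is required.
    chain: Optional[ChainBuffer] = None


def gather(memory: bytes, request: RequestMessage) -> bytes:
    """Collect the data described by the SG list, one SG element at a time."""
    out = bytearray()
    for element in request.elements:
        out += memory[element.address:element.address + element.length]
    segment = request.chain
    while segment is not None:  # follow the links from chain buffer to chain buffer
        for element in segment.elements:
            out += memory[element.address:element.address + element.length]
        segment = segment.next
    return bytes(out)
```

In this sketch, a request message whose `chain` field is `None` corresponds to a request message that does not require a chain buffer.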
Referring to FIG. 1, a depiction of request messages 10, 12, 14 and 16, associated chain buffers 18, 20, 22 and 24, and reply messages 26 and 28 is illustrated. The request messages 10, 14 and 16 may require use of chain buffers 18-24, while the request message 12 may not. If the request message 10 requires a chain buffer, then the request message 10 will contain a pointer 30 that identifies a particular chain buffer 18. The chain buffer 18 is shown having another pointer 32 that links the chain buffer 18 to the chain buffer 20.
Conventional I/O protocol controllers operate on a single SG element at a time. A conventional chain buffer can easily accommodate up to ten simple SG elements. To access an SG element within a chain buffer residing in the system memory, the I/O protocol controller must incur a latency associated with accessing data across a shared system bus. Furthermore, each access to an SG element reduces the available bandwidth of the shared system bus, reducing overall system performance.
To reduce the shared system bus utilization and reduce the latency associated with accessing the SG elements, entire chain buffers can be copied to the local memory using a single DMA operation. Copying the entire chain buffer is desirable because the shared system bus utilization is more efficient (i.e., approximately ten SG elements can be burst into the local memory using a single shared system bus transaction) and subsequent SG element access latency is reduced. However, copying the chain buffers to the local memory introduces new issues. First, the DMA operation is typically controlled by an on-chip I/O processor. The copy task reduces the bandwidth that the I/O processor has available for other operations. Second, the I/O operation involving the chain buffers cannot be initiated until the chain buffers have been copied locally.
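The bus-utilization argument can be illustrated with simple arithmetic. The sketch below is a rough counting model, not a measurement: it assumes one shared system bus transaction per SG element when the elements are read in place, one transaction per chain buffer when each buffer is copied whole, and ten SG elements per chain buffer. The function names are hypothetical, and the transfers of the I/O data itself are ignored.

```python
def bus_transactions_in_place(num_elements: int) -> int:
    """Assumed cost when every SG element is fetched across the shared bus."""
    return num_elements


def bus_transactions_copied(num_elements: int, elements_per_buffer: int = 10) -> int:
    """Assumed cost when each chain buffer is copied in a single transaction."""
    # Ceiling division: a partially filled chain buffer still costs one copy.
    return -(-num_elements // elements_per_buffer)
```

Under these assumptions, an SG list of 25 elements would take 25 shared system bus transactions when read in place and 3 when the chain buffers are copied to the local memory.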
The present invention concerns a controller generally comprising a DMA engine, a processor, and a circuit. The DMA engine may be configured to copy from a system memory to a local memory. The processor may be configured to process a message written in the local memory. The circuit may operate independently of the processor. The circuit may be configured to (i) monitor writes to the local memory for the message having a first pointer and (ii) program the DMA engine to copy a first buffer identified by the first pointer in response to the first pointer having a non-null value.
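The behavior of the circuit may be modeled in software as follows. This is a minimal behavioral sketch under stated assumptions, not the claimed hardware: the class names `Message`, `DMAEngine` and `ChainCopyCircuit` are hypothetical, a null pointer is assumed to be encoded as zero, and both memories are modeled as dictionaries keyed by system memory address. Only the first pointer carried in the message is handled; following links from one chain buffer to the next is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List

NULL = 0  # assumed encoding of a null pointer


@dataclass
class Message:
    first_pointer: int  # system memory address of the first buffer, or NULL


class DMAEngine:
    """Copies a buffer from the modeled system memory to the modeled local memory."""

    def __init__(self, system_memory: Dict[int, bytes]) -> None:
        self.system_memory = system_memory
        self.local_memory: Dict[int, bytes] = {}
        self.copies: List[int] = []  # addresses copied so far, in order

    def copy(self, address: int) -> None:
        self.local_memory[address] = self.system_memory[address]
        self.copies.append(address)


class ChainCopyCircuit:
    """Watches writes to the local memory; no processor code is involved."""

    def __init__(self, dma: DMAEngine) -> None:
        self.dma = dma

    def on_local_write(self, message: Message) -> None:
        # (i) a message has been written to the local memory;
        # (ii) program the DMA engine only when the first pointer is non-null.
        if message.first_pointer != NULL:
            self.dma.copy(message.first_pointer)
```

In this model the processor never calls `DMAEngine.copy` itself, which reflects the circuit operating independently of the processor.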
The objects, features and advantages of the present invention include providing a circuit to direct copying of chain buffers to a local memory that may (i) save work required of an I/O processor, (ii) permit I/O processing to start before all of the chain buffers are copied and/or (iii) permit I/O data transfers to start before all of the chain buffers are copied.