Commonly assigned, copending U.S. patent application Ser. No. 07/818,566, now abandoned, of Chmielecki, Jr. et al., entitled "Apparatus and Method for Transferring Data To and From Host System," is incorporated herein by reference in its entirety.
In a data processing system, communication adapters and I/O controllers are provided to transfer data between a host system and a network or peripheral device. A memory and processor of the host system are coupled to the adapter or controller by a system bus.
As the central processing units in host systems have become faster, the difference between the speed of processor operations involving local registers or cache memory and the speed of accesses between the processor of the host system and main memory or peripheral devices has been magnified. As a result, in order for data processing systems to better utilize the faster processors, there is a need to provide more efficient methods of transferring data between the host system and networks or peripheral devices.
In order to improve performance, most high performance adapters and controllers make use of direct memory access (DMA) to transfer data. However, the use of DMA itself does not guarantee high performance. A high performance adapter or controller should minimize the amount of time a system bus is used during transfer of data between the adapter or controller and the host system, should reduce the amount of work that must be performed by the host system, and also should provide an efficient implementation of DMA.
One model for DMA transfers between host memory and I/O controllers is described by H. Michael Wenzel for the IEEE P1212 CSR Architecture Specification in "CSR Architecture (DMA Framework): Recommended DMA Architectures," Part III-A, Draft 1.38 (Jul. 5, 1990), which is herein incorporated by reference.
In this model, which may be used for transfers involving such system buses as Futurebus+, SCI, and SerialBus, circular queues are provided for communicating information between the adapter and the processor in the host system. A circular queue is a software array structure of message storage locations. The items in circular queues are accessed in first-in, first-out order, and a particular circular queue will be used to pass messages in only one direction, from a single producer of the queue item to a specific consumer of the queue item.
Each circular queue is associated with two separate indices, i.e., a producer index and a consumer index. The producer index points to a selected item in the circular queue that has been or will be written ("produced"). The consumer index points to a selected item in the circular queue that has been or will be read ("consumed"). As items are added to and subsequently removed from the queue, the consumer index will continually chase the producer index around the circular array of items.
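By way of illustration only, the producer and consumer indices described above may be sketched in software as follows. The class name `CircularQueue`, its method names, and the convention of leaving one location unused to distinguish a full queue from an empty one are assumptions made for this sketch; they are not taken from the referenced P1212 draft.

```python
class CircularQueue:
    """Illustrative single-producer, single-consumer circular queue."""

    def __init__(self, size):
        self.slots = [None] * size
        self.size = size
        self.producer = 0  # index of the next location to be written
        self.consumer = 0  # index of the next location to be read

    def produce(self, item):
        """Write one item; return False if the queue is full."""
        nxt = (self.producer + 1) % self.size
        if nxt == self.consumer:
            return False  # assumed convention: one location is kept empty
        self.slots[self.producer] = item
        self.producer = nxt  # wrap around the end of the array
        return True

    def consume(self):
        """Read one item in first-in, first-out order; None if empty."""
        if self.consumer == self.producer:
            return None  # consumer has caught up with the producer
        item = self.slots[self.consumer]
        self.consumer = (self.consumer + 1) % self.size
        return item
```

In this sketch only the producer writes the producer index and only the consumer writes the consumer index, which reflects the one-directional use of each queue.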
A circular queue may be used as a bus interface between a host memory in a host system and a communication adapter or I/O controller. Preferably, the entire circular queue is located in physically contiguous storage locations in the host memory. The communication adapter or I/O controller is provided with a set of control fields in its memory. The control fields describe the location of the circular queues located in host memory, and the value of the producer index and/or the consumer index.
A feature of the circular queue model is the ability to obtain access to and transfer items in blocks. For example, if the consumer is a communication adapter or I/O controller that falls far behind a processor that functions as the producer, a comparison of the producer and consumer indices will indicate how many queue items to transfer to memory in the communication adapter or I/O controller in one block read. Simply by reading the producer and consumer indices and comparing them, the producer determines how many empty locations are currently available in the queue, and the consumer determines how many full locations have not yet been read.
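The index comparison described above may be sketched as follows. The function names are hypothetical, and the sketch assumes that the producer index designates the next location to be written, the consumer index designates the next location to be read, and one location is always left empty; the referenced draft may define the indices differently.

```python
def full_count(producer, consumer, size):
    """Number of items written but not yet read."""
    return (producer - consumer) % size


def empty_count(producer, consumer, size):
    """Number of locations the producer may still fill (one is kept empty)."""
    return size - 1 - full_count(producer, consumer, size)


def block_read_spans(producer, consumer, size):
    """Contiguous (start, length) spans covering all unread items.

    A queue that has wrapped around needs two block reads: one to the
    end of the array and one from its beginning.
    """
    pending = full_count(producer, consumer, size)
    first = min(pending, size - consumer)
    spans = [(consumer, first)] if first else []
    if pending > first:
        spans.append((0, pending - first))
    return spans
```

For example, with eight locations, a producer index of 2, and a consumer index of 6, four items are pending and may be fetched as two block reads, one of two items starting at location 6 and one of two items starting at location 0.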
With respect to transmit operations in which data is read from the host memory for transmission on the network, the host memory data is transferred, under control of the host processor or by DMA, to a packet memory associated with the adapter. In order to facilitate the efficient transmission of the data, the data transferred from the host memory is stored in the packet memory in a predetermined format suitable for the particular type of network. For example, the data may be stored in the packet memory in longword format for subsequent formation of packets and transmission onto the network. As used herein, a longword format is one consisting of a predetermined number of bytes, e.g., four bytes. The predetermined number is not less than two.
In prior art systems, the adapter includes a first-in-first-out memory (FIFO) coupled between the host memory and packet memory to allow for differences in speed of access to the host and packet memory, and to allow for latency corresponding to delays in gaining access to the host or packet memories. Therefore, the host processor, or a DMA engine that drives the transfer of data from the host memory to the packet memory, first stores longwords of data from the host memory in the FIFO, and then transfers the stored longword data from the FIFO to the packet memory.
There are several constraints on the transfer from the host memory to the FIFO that result in added circuit complexity and reduced transfer speed in prior systems. First, while data in the host memory is typically byte addressable, the data is only retrievable in longword format. That is, although a byte address can be applied to the host memory, a read operation will only cause retrieval of the longword containing the addressed byte. Second, it is likely that the bytes of data stored in the host memory and identified by the host system for transmission are not stored contiguously in the host memory. As a result of these constraints on the accessibility of bytes identified for transmission, the four bytes of a longword actually retrieved from the host memory may not all be valid, i.e., some of them may not be intended for transmission.
It is noted, however, that in accordance with known practices in the art, valid bytes are contiguous in the retrieved longword, i.e., stored at successive memory locations, so that the retrieved longword contains one to four valid contiguous bytes. The locations of the valid bytes within the longword are identified by bits appended to the host memory address information provided by the host system.
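The retrieval constraint may be illustrated by the following sketch, which models the host memory as a byte string. The function names and the representation of the appended bits as an offset `first` and a length `count` are assumptions made for illustration; the actual encoding of those bits is not specified here.

```python
LONGWORD = 4  # bytes per longword, per the four-byte example above


def read_longword(memory, byte_addr):
    """Return the aligned longword that contains the addressed byte."""
    base = byte_addr - (byte_addr % LONGWORD)
    return memory[base:base + LONGWORD]


def valid_bytes(longword, first, count):
    """Return the contiguous valid bytes of a retrieved longword.

    first: offset (0..3) of the first valid byte (assumed encoding)
    count: number of valid bytes (1..4)
    """
    if count < 1 or first < 0 or first + count > LONGWORD:
        raise ValueError("valid bytes must lie within one longword")
    return longword[first:first + count]
```

Addressing byte 6 of such a memory therefore returns bytes 4 through 7, of which only a contiguous subset, e.g., bytes 6 and 7, may be intended for transmission.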
It is therefore necessary to align the valid bytes of data contained in the longwords retrieved from the host, in order to form longwords containing valid bytes for storage in the longword addressable FIFO. FIG. 1 illustrates a block diagram of a conventional circuit 100 for aligning and storing longwords composed entirely of valid bytes in a longword addressable FIFO 102. Circuit 100 includes a seven-byte shifter 104 coupled to receive a four-byte longword retrieved from the host memory, and to provide a seven-byte wide output in which the received four bytes have been shifted to a desired position. The shifting of data by shifter 104 is performed in accordance with a control signal received from a control circuit 106.
The shifted four bytes of the retrieved longword are stored in a seven-byte-wide holding register 108. A selector circuit 110, connected to the seven byte positions of register 108, provides to FIFO 102 four valid contiguous bytes forming a longword from the data stored in register 108. State machine 112 is connected to control shifter 104 (through control circuit 106), register 108, and selector 110. State machine 112 also controls the storage of longwords into FIFO 102 and the value of write pointer 114, which addresses the next longword location in FIFO 102.
State machine 112 in turn receives relevant control information for controlling the transfer of longword data from the host memory to the packet memory either directly from the host system or from a DMA engine. Control circuit 106 also receives address information from the host system identifying the location of valid bytes in the longword currently retrieved from the host memory. Based on the received address and control information, state machine 112 controls control circuit 106 and register 108 so that the valid contiguous bytes in each longword received from the host memory are aligned and stored in the next available positions in register 108. While invalid bytes included in a longword retrieved from the host memory may also be stored, they are overwritten in register 108 during subsequent storage operations. In this manner, state machine 112 controls the alignment and storage in register 108 of valid bytes successively retrieved from the host memory until register 108 holds at least four valid contiguous bytes, thereby forming a longword composed entirely of valid bytes.
Although a "valid" longword contains four valid bytes, it is necessary to provide register 108 with a width of seven bytes to accommodate the situation in which, after having accumulated three valid contiguous bytes in register 108, the next retrieved longword is entirely valid, i.e., it contains four valid bytes, thereby necessitating the simultaneous storage of seven valid bytes in register 108.
When register 108 holds at least four valid contiguous bytes, state machine 112 controls selector 110 to apply to FIFO 102 the first four valid bytes stored in register 108, which constitute a valid longword. The applied longword is stored in the next available longword position in FIFO 102.
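The behavior of conventional circuit 100 may be modeled in software roughly as follows. This is a behavioral sketch only, not a description of the hardware: the function name `align_to_fifo`, the use of a byte array to stand for register 108, and the `(longword, first, count)` form of each read are assumptions made for illustration.

```python
def align_to_fifo(reads):
    """Behavioral model of the shifter, holding register, and selector.

    reads: sequence of (longword, first, count) tuples, where first and
           count locate the valid contiguous bytes (assumed encoding).
    Returns (fifo, leftover, peak): the longwords stored in the FIFO, the
    valid bytes still held, and the largest number of bytes ever held.
    """
    holding = bytearray()  # stands for register 108
    fifo = []              # stands for FIFO 102
    peak = 0
    for longword, first, count in reads:
        # Shifter: place the valid bytes at the next free positions.
        holding += longword[first:first + count]
        peak = max(peak, len(holding))
        if len(holding) >= 4:
            # Selector: pass the first four valid bytes to the FIFO.
            fifo.append(bytes(holding[:4]))
            del holding[:4]
    return fifo, bytes(holding), peak
```

Because at most three bytes remain after each transfer to the FIFO and at most four valid bytes arrive per read, the modeled register never holds more than seven bytes, which corresponds to the seven-byte width discussed above.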
Conventional circuit 100 illustrated in FIG. 1 is disadvantageous in that it requires an additional storage element, i.e., register 108, to hold bytes in the interim until a valid four-byte longword is formed. Circuit 100 is also disadvantageous in that register 108 must be wider than the actual longword size, and controlling a register of excess width increases circuit complexity. Complexity is further increased because circuit 100 requires two interfaces with state machine 112: a first interface between state machine 112 and elements 102-110, and a second interface between the host system and state machine 112. Furthermore, the inclusion of the additional register 108 between shifter 104 and FIFO 102 to form a valid longword adds one or more clock cycles of delay in the data path.
While the above described conventional technique for storing valid longwords in the FIFO is implemented using hardware, another conventional technique is instead implemented in host software. In accordance with that technique, the host software causes the host to perform the additional steps of allocating dedicated valid longword buffers in host memory and copying valid bytes into those buffers to form valid longwords. These buffered valid longwords are then transferred to the FIFO. Disadvantageously, the additional storage steps performed by the host delay host operations.
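The host software technique may be sketched as follows. The function name `gather_longwords`, the description of each fragment as a `(byte_addr, length)` pair, and the handling of a trailing partial longword are assumptions made for illustration and are not taken from any particular host implementation.

```python
def gather_longwords(memory, fragments):
    """Copy scattered valid bytes into a staging buffer of whole longwords.

    memory:    byte string standing for host memory
    fragments: list of (byte_addr, length) pairs naming the valid bytes
    Returns (longwords, residual), where residual holds any trailing bytes
    that do not yet fill a complete longword.
    """
    staging = bytearray()  # dedicated valid longword buffer
    for addr, length in fragments:
        # The extra copy performed by the host for every fragment.
        staging += memory[addr:addr + length]
    whole = len(staging) - len(staging) % 4
    longwords = [bytes(staging[i:i + 4]) for i in range(0, whole, 4)]
    return longwords, bytes(staging[whole:])
```

Every valid byte is read and written once by the host before any transfer to the FIFO begins, which is the source of the delay noted above.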