Input/output processing concerns the movement of data to/from devices, e.g., nonvolatile storage devices such as an optical disk, fixed magnetic disk or floppy magnetic disk, that are external to a processor complex. Originally, input/output processing was handled by the processor complex. FIG. 1 is a hardware diagram corresponding to this situation.
In FIG. 1, a processor complex 102 is connected to input/output adapters 106 by an input/output bus 104. Load/store commands and interrupt signals 108 are exchanged between the processor complex 102 and the input/output adapters 106. An input/output adapter 106 connects an input/output device (not shown) to the input/output bus 104. The processor complex 102 typically includes a processor (not shown), a memory controller (not shown) and a bus controller (not shown). The bus controller typically generates and manages communication over the input/output bus. In particular, the bus controller handles interrupt management, e.g., by providing a mapping from a physical input/output bus slot to an interrupt bit.
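By way of illustration only, the mapping from a physical input/output bus slot to an interrupt bit can be sketched as follows. The slot numbers, the bit positions and the names `SLOT_TO_INTERRUPT_BIT` and `pending_slots` are assumptions made for this sketch; an actual bus controller defines its own mapping.

```python
from typing import List

# Hypothetical table: physical bus slot -> bit position in an interrupt register.
SLOT_TO_INTERRUPT_BIT = {0: 4, 1: 5, 2: 6, 3: 7}


def pending_slots(interrupt_register: int) -> List[int]:
    """Return the slots whose interrupt bit is set in the register value."""
    return [
        slot
        for slot, bit in SLOT_TO_INTERRUPT_BIT.items()
        if interrupt_register & (1 << bit)
    ]


# Bits 4 and 6 are set, so the adapters in slots 0 and 2 are requesting service.
requesting = pending_slots(0b01010000)
```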
FIG. 2 depicts a functionality diagram corresponding to FIG. 1. The device driver functionality 204, the protocol stack functionality 206, the application functionality 208 and the operating system services 210 are performed by the processor complex 102, as indicated by the dashed box 202. The input/output adaption functionality 212 is performed by an input/output adapter 106.
An input/output request processing data flow path 216 has been depicted between the application functionality 208 and the input/output adaption functionality 212. An input/output request is initiated, either directly or indirectly, by the application 208. This input/output request is processed by the protocol stack 206, which converts the generic input/output request of the application 208 into a specific command protocol for a peripheral device, such as disk memory, or for a communications link, such as TCP/IP. The protocol stack 206 may use various services that are provided by the operating system 210.
In a system with no external input/output processing such as in FIG. 1, the protocol stack 206 queries the operating system 210 for a linkage to the device driver 204. Once this linkage is obtained, the protocol stack 206 directly calls the services provided by the device driver 204.
The device driver 204 is responsible for accepting a command from the protocol stack 206 and instructing the input/output adaption functionality 212, i.e., the input/output adapter 106, to perform the command. The device driver 204 has direct access to all of the registers in the input/output adapter 106 and directly loads data from or stores data to the register space (not depicted) of the adapter 106.
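The direct load/store relationship between the device driver and the register space of the adapter can be sketched, under stated assumptions, as follows. The three-register layout, the class names and the immediate completion of a command are all hypothetical simplifications; a real adapter exposes its own registers and responds only after some latency.

```python
class SimulatedAdapter:
    """Stand-in for the register space of an input/output adapter (assumed layout)."""

    COMMAND, ADDRESS, STATUS = 0, 1, 2

    def __init__(self):
        self.registers = [0, 0, 0]

    def store(self, register, value):
        """Model a store command from the processor complex to a register."""
        self.registers[register] = value
        if register == self.COMMAND:
            # Assumption: the simulated adapter completes the command at once.
            self.registers[self.STATUS] = 1

    def load(self, register):
        """Model a load command that reads a register."""
        return self.registers[register]


class DeviceDriver:
    """Accepts a command and instructs the adapter through loads and stores."""

    def __init__(self, adapter):
        self.adapter = adapter

    def perform(self, command_code, buffer_address):
        self.adapter.store(SimulatedAdapter.ADDRESS, buffer_address)
        self.adapter.store(SimulatedAdapter.COMMAND, command_code)
        return self.adapter.load(SimulatedAdapter.STATUS)


adapter = SimulatedAdapter()
driver = DeviceDriver(adapter)
status = driver.perform(command_code=0x01, buffer_address=0x1000)
```

In this sketch one input/output command costs two stores and one load, each of which would stall the processor complex for the full latency of the input/output bus.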
The situation depicted in FIGS. 1 and 2 is typical for personal computers (PCs). The input/output adapter 106 is totally managed by the processor complex 102, including programming the input/output adapter 106, using loads and stores, and responding to service requests from the input/output adapter 106 by way of either an interrupt or polling technique. Such programming and responding has been indicated via the signal paths 108.
Previously, the disparity between the processor complex cycle time and the input/output bus speed was small. If the processor complex had to wait for an input/output adapter 106 to respond to a load or store command, the wait was not very long, resulting in the processor complex 102 being stalled or unusable for only a few cycles.
As technology has progressed, processor complex cycle times have decreased to a much greater extent than input/output adapter response times. Consequently, the number of processor complex cycles that were lost, due to being stalled while waiting for an input/output adapter to respond to a load or store command, grew as quickly as the processing speed of the processor complex.
As an example of the processor complex being stalled, consider a peripheral component interconnect (PCI) input/output transaction on a local PCI bus for which the latency is 300 nanoseconds (nsec), and a processor cycle time of three nsec. In this situation, the processor will be stalled for 100 cycles to perform the input/output transaction. If the processor cycle time is decreased to one nsec, then the processor complex will be stalled for 300 cycles. As another example, in the case of a PCI input/output transaction on a remote PCI bus connected to a host PCI bus via a bridge, for which the latency is two microseconds (μsec) and the processor complex cycle time is three nsec, the processor complex suffers 666 wasted cycles. If the processor cycle time is decreased to one nsec, then the processor complex suffers 2000 wasted cycles.
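The arithmetic in the preceding examples is simply the transaction latency divided by the processor cycle time, counted in whole cycles. A minimal sketch follows; the function name `stalled_cycles` is an assumption introduced only for this illustration.

```python
def stalled_cycles(latency_nsec: int, cycle_time_nsec: int) -> int:
    """Whole processor cycles lost while waiting for one input/output transaction."""
    return latency_nsec // cycle_time_nsec


# Local PCI bus, 300 nsec latency.
local_3nsec = stalled_cycles(300, 3)    # 100 cycles
local_1nsec = stalled_cycles(300, 1)    # 300 cycles

# Remote PCI bus behind a bridge, 2 microseconds = 2000 nsec latency.
remote_3nsec = stalled_cycles(2000, 3)  # 666 cycles
remote_1nsec = stalled_cycles(2000, 1)  # 2000 cycles
```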
To reduce the time that a processor complex was stalled due to an input/output command, the processor complex was programmed to perform other functions after issuing an input/output command. When the input/output adapter 106 finally responded, it regained the attention of the processor complex 102 by providing an interrupt signal. To service the interrupt, it was necessary for the processor complex to store its internal states concerning the process it was currently executing. Typically, three or four load/store commands were associated with an interrupt, and three or four interrupts were associated with each input/output command. Thus, though the technique of using interrupts solved the problem of the stalled processor complex, much useful work by the processor complex was consumed by the interrupt service routines that had to be executed.
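The interrupt burden described above can be estimated with a short calculation. The sketch below only multiplies the two ranges given in the text; the function name is hypothetical, and the cost of saving and restoring processor state is not modeled.

```python
def load_stores_per_command(interrupts_per_command: int,
                            load_stores_per_interrupt: int) -> int:
    """Interrupt-related load/store commands issued for one input/output command."""
    return interrupts_per_command * load_stores_per_interrupt


fewest = load_stores_per_command(3, 3)  # three interrupts, three load/stores each
most = load_stores_per_command(4, 4)    # four interrupts, four load/stores each
```

Under these figures, a single input/output command entails between 9 and 16 interrupt-related load/store commands.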
To solve the problem of the processor complex having to service many interrupts, the responsibilities for performing the device driver functionality and servicing the interrupts from an input/output adapter were transferred to an input/output processor external to the processor complex. This situation is depicted in FIG. 3, where a processor complex 302 is connected to an input/output bus 304. An input/output processor 310 as well as input/output adapters 306 are also connected to the input/output bus 304. The processor complex 302 typically includes a processor (not shown), a memory controller (not shown) and a bus controller (not shown). The bus controller generates and manages the input/output bus 304, including providing a mapping from a physical input/output bus slot to an interrupt bit.
FIG. 4 is a functionality diagram corresponding to FIG. 3. The functions performed by the processor complex 302, as denoted by the dashed box 402, now only include the operating system services 406, the protocol stack 408 and the application 410. The device driver functionality 416 has been moved outside the processor complex 302 to the input/output processor 310, as is indicated by the dashed box 404, which also includes the input/output operating system services functionality 414. The processor complex functionalities 402 communicate with the input/output processor functionalities 404 via a message protocol 412. The input/output processor functionalities 404 communicate with the input/output adaption functionality 418 via an exchange of load/store commands and interrupts, as denoted by item 420.
As before, an input/output request processing data flow path 422 has been depicted between the application functionality 410 and the input/output adaption functionality 418. An input/output request is initiated, either directly or indirectly, by the application 410. This input/output request is processed by the protocol stack 408, which converts the generic input/output request into a specific command protocol for the peripheral device, such as disk drive storage, or for a communications link, e.g., TCP/IP. The protocol stack 408 may use various services that are provided by the operating system 406.
The protocol stack 408 queries the operating system 406 for a connection to the device driver 416. This connection will permit command and response messages to flow between the protocol stack 408 and the device driver 416. Once this connection is established, the protocol stack 408 sends command messages to the device driver 416 via the operating system services 406.
The operating system services 406 transfer the command messages from the protocol stack 408 to the input/output operating system services 414 and return the response messages in the opposite direction. The input/output operating system services 414 in turn pass these messages to and from the device driver 416. For every operation, both the operating system services 406 and the input/output operating system services 414 are used to communicate the command and response.
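The exchange of command and response messages between the two sets of operating system services can be sketched with a pair of queues. This is an illustrative model only: the message fields, the function names and the immediate "done" response are assumptions, and no particular message protocol is implied.

```python
from collections import deque


class MessageChannel:
    """Stand-in for the message protocol between the two service layers."""

    def __init__(self):
        self.to_iop = deque()   # command messages toward the input/output processor
        self.to_host = deque()  # response messages toward the processor complex


def device_driver(command):
    """Hypothetical driver: pretends the adapter completed the command."""
    return {"id": command["id"], "status": "done"}


def host_os_send(channel, command):
    """Operating system services: queue a command from the protocol stack."""
    channel.to_iop.append(command)


def iop_os_service(channel):
    """I/O operating system services: hand commands to the driver, queue responses."""
    while channel.to_iop:
        command = channel.to_iop.popleft()
        channel.to_host.append(device_driver(command))


def host_os_receive(channel):
    """Operating system services: return the next response to the protocol stack."""
    return channel.to_host.popleft()


channel = MessageChannel()
host_os_send(channel, {"id": 7, "op": "read"})
iop_os_service(channel)
response = host_os_receive(channel)
```

The processor complex only enqueues and dequeues messages in this model; the loads and stores directed at the adapter occur entirely on the input/output processor side.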
The device driver 416 is responsible for accepting a command from the protocol stack 408 and instructing the input/output adapter 306 to perform the command. Typically, the device driver 416 has direct access to all of the registers (not shown) in an input/output adapter 306 and directly loads data from or stores data to, i.e., reads or writes, the register space of the input/output adapter 306. Alternatively, part of the protocol stack 408 may be implemented as part of the input/output operating system services 414.
FIGS. 3-4 are typical of a PC server. The input/output processor 310 has been added to offload the control of the input/output adapter 306 from the processor complex 302. Such offloading is represented by the signal paths 308 and 312. The signal path 308 represents the exchange of load/store commands and interrupt information according to a message protocol between the processor complex 302 and the input/output processor 310. The signal path 312 represents the issuing of load/store commands and the responses in the form of interrupts between the input/output processor 310 and the input/output adapter 306. As an optional aspect, the processor complex 302 can retain the device driver functionality, as in FIGS. 1-2, and communicate directly with the input/output adapter 306, as represented by the signal path 314.
When this architecture was first being used, the input/output processor 310 serviced the three or four interrupts from an input/output adapter 306 associated with each input/output command. In turn, the input/output processor 310 generated only one interrupt to the processor complex 302 per input/output command. Thus, much useful work of the processor complex 302 was no longer lost to the servicing of the other two or three interrupts.
Another trend in input/output processing has been for input/output adapters to generate one interrupt, rather than three or four interrupts, per input/output command. Consequently, an input/output processor need service only one interrupt request from an input/output adapter. The input/output processor 310 must then issue its own interrupt request to the processor complex 302. Thus, it is no longer necessarily true that the processor complex 302 is servicing fewer interrupts than the input/output processor 310. The benefit of using input/output processors has now become their ability to decouple the processor complex from the latencies associated with issuing the load/store commands to the input/output adapters and waiting for the typically single interrupt request response from the input/output adapter.
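The shift in interrupt load described in the two preceding paragraphs can be tabulated with a small model. The function below is a sketch under the assumption, taken from the text, that the input/output processor raises exactly one interrupt to the processor complex per input/output command; its name and return convention are hypothetical.

```python
def interrupts_serviced(adapter_interrupts_per_command: int,
                        has_io_processor: bool):
    """Return (processor complex, input/output processor) interrupts per command."""
    if has_io_processor:
        # The I/O processor absorbs the adapter interrupts and raises one of its own.
        return 1, adapter_interrupts_per_command
    # Without an I/O processor, the processor complex services every interrupt.
    return adapter_interrupts_per_command, 0


no_iop = interrupts_serviced(4, has_io_processor=False)         # (4, 0)
early_iop = interrupts_serviced(4, has_io_processor=True)       # (1, 4)
single_interrupt = interrupts_serviced(1, has_io_processor=True)  # (1, 1)
```

With single-interrupt adapters the two counts are equal, which is why the remaining benefit lies in latency decoupling rather than in a reduced interrupt count.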
As is typical in the field of computer technology, changes in input/output adapter technology occur quickly. Such a change might be an increase in the bandwidth of an Ethernet card from 10 megabits per second to 100 megabits per second. To respond to such a change, both the processor complex and the input/output processor must be adapted. Thus, two separate revisions must be designed and supported. Most likely, two separate sets of development tools, such as compilers, debuggers, etc., also must be developed. Such dual development is expensive.