Computer Input/Output (I/O) protocols govern communications between computer operating system programs and computer I/O devices, such as devices that provide disk storage, communications or network capabilities. Conventionally, these I/O protocols are based upon command and response messages that are exchanged over an I/O bus that interconnects the computer central processor and memory with I/O adapters or I/O processors (the latter being a form of I/O adapter distinguished by having more complex functions, usually in support of the operating system programs, than the general class of "I/O adapter").
In such conventional I/O protocols, operating system device driver programs create command messages which are then transmitted across an I/O bus to the I/O adapter. The adapter interprets the command and performs the requested operation. Usually, this operation includes the transfer of data between an I/O device and the computer memory across the I/O bus. Such data transfers are typically performed by using known direct memory access (DMA) mechanisms that are part of the I/O bus functions.
When the I/O adapter has completed the requested operation, it creates a response message which is then transmitted back to the computer memory where the operating system and device driver programs can then interpret that response and conclude the overall I/O operation.
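The command and response exchange described above can be illustrated with a minimal Python sketch. The names Command, Response and Adapter, the "READ" opcode and the tag field are assumptions introduced only for illustration. A dictionary stands in for computer memory, and a plain assignment stands in for the DMA transfer.

```python
from dataclasses import dataclass


@dataclass
class Command:
    tag: int      # hypothetical identifier that pairs a response with its command
    opcode: str   # illustrative operation name, e.g. "READ"
    block: int    # illustrative device block address


@dataclass
class Response:
    tag: int
    status: str


class Adapter:
    """Toy I/O adapter that interprets a command and moves data to host memory."""

    def __init__(self, device_blocks):
        self.device_blocks = device_blocks

    def execute(self, cmd, host_memory):
        if cmd.opcode == "READ" and cmd.block in self.device_blocks:
            # This assignment stands in for a DMA transfer across the I/O bus.
            host_memory[cmd.tag] = self.device_blocks[cmd.block]
            return Response(cmd.tag, "OK")
        return Response(cmd.tag, "ERROR")


# The device driver creates a command; the adapter returns a response message.
host_memory = {}
adapter = Adapter({7: b"file-location-record"})
resp = adapter.execute(Command(tag=1, opcode="READ", block=7), host_memory)
print(resp.status, host_memory.get(resp.tag))
```

In this sketch the driver learns the outcome only by interpreting the returned response, which mirrors the final step of the conventional protocol.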
The execution of I/O commands by an I/O adapter typically requires a time duration that is many thousands, or even millions, of central processor instruction cycles. Thus, while the I/O adapter is performing a command, the device driver and computer operating system normally perform other work and are not dedicated strictly to waiting for the I/O adapter to complete the command and forward the response message. Rather, the typical device driver and operating system rely upon an asynchronous event indication, such as a processor interrupt, to signal that the I/O adapter has completed the command and that the response message is available for the operating system and device driver to interpret.
The relative timing and frequency of such processor interrupts signalling that an I/O command has been completed have significant effects on the overall utilization of the central processor, utilization of the I/O device and its data throughput capabilities, and overall system performance. Such utilization is also affected by I/O command latency, or the duration of an I/O operation as seen by the programs that depend upon that I/O operation to complete their functions.
For example, in a large high performance processor system, an I/O memory read across a PCI (Peripheral Component Interconnect) standard bus may require many processor cycles, which seriously degrades the execution speed of a program that depends upon that I/O memory read.
More particularly, a high performance processor attempting to do a single memory read of one word (4 bytes) of data from a PCI device may experience a latency to complete that memory read of about 80 to 200 processor cycles. Servicing an interrupt typically requires multiple loads across the PCI bus, resulting in latencies approaching a thousand processor cycles.
The PCI local bus specification provides a mechanism that potentially alleviates some of these inefficiencies due to I/O latencies. This mechanism sets target latencies which are not to be exceeded by the master (e.g., host system), the bus arbiter or the target (e.g., I/O device), to help limit the time during which the system must wait for responses.
However, in practice, the PCI bus has a minimum latency determined by its clock rate, which is currently on the order of 33 to 66 MHz, so there are still guaranteed minimum latencies of several hundred nanoseconds. Furthermore, the maximum target latencies that the PCI standard would expect are typically on the order of several hundred to several thousand nanoseconds.
Potentially, for a slow I/O device, that maximum latency could realistically reach a millisecond or even several milliseconds. The consequence to a high performance processor running with, for example, a 7 nanosecond cycle time, is that, even at minimum expected latencies on a PCI bus, the processor is facing several hundred to several thousand cycles of time delay.
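As a rough illustration of the arithmetic behind such figures, the following Python sketch converts a bus latency in nanoseconds into elapsed processor cycles. The 7 nanosecond cycle time and the 33 MHz clock rate come from the text above. The sample latencies of 2,000 nanoseconds and 1 millisecond are assumptions chosen only for illustration, and the function names are hypothetical.

```python
def clock_period_ns(clock_hz):
    """Duration of one bus clock in nanoseconds."""
    return 1e9 / clock_hz


def latency_in_cycles(latency_ns, processor_cycle_ns):
    """Number of processor cycles that elapse during a given latency."""
    return latency_ns / processor_cycle_ns


PROCESSOR_CYCLE_NS = 7  # example cycle time taken from the text

pci_clock_ns = clock_period_ns(33e6)
# Assumed sample latencies, for illustration only.
cycles_2us = latency_in_cycles(2_000, PROCESSOR_CYCLE_NS)
cycles_1ms = latency_in_cycles(1_000_000, PROCESSOR_CYCLE_NS)

print(f"one 33 MHz bus clock: {pci_clock_ns:.1f} ns")
print(f"2,000 ns latency: about {cycles_2us:.0f} processor cycles")
print(f"1 ms latency: about {cycles_1ms:.0f} processor cycles")
```

Because the conversion is a simple ratio, any reduction in processor cycle time with an unchanged bus latency raises the cycle count proportionally.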
To optimize central processor utilization, conventional systems typically attempt to minimize the number of processor instruction cycles required to recognize the completion event and communicate this event to the I/O device driver. To optimize I/O device throughput, conventional systems also minimize the time between the completion of one I/O command and the start of the next I/O command. To optimize overall system performance, in relation to programs that require I/O, conventional systems also minimize the latency of an I/O operation, as measured from the time that the command is created until the time that the response has been interpreted and the results are available to the program that caused or required the I/O (such as, for example, an "OPEN FILE" function that requires a disk read operation to get information about the location of the requested file).
To accomplish these objectives, conventional I/O device protocols also employ command and response queues located in the computer main memory, in I/O adapter memory or registers, or in a combination of these. Command queues enable the device driver to create new commands while the I/O adapter executes a previously queued command. Response queues enable the I/O adapter to signal the completion of previous commands and proceed to new commands without waiting for the device driver or operating system to recognize and interpret the completion of these previous commands.
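The decoupling that the two queues provide can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not a description of any particular adapter: the function names driver_submit, adapter_step and driver_drain are hypothetical, commands are reduced to integer tags, and every command is assumed to complete with status "OK".

```python
from collections import deque

command_queue = deque()   # filled by the device driver, emptied by the adapter
response_queue = deque()  # filled by the adapter, emptied by the device driver


def driver_submit(tag):
    """Driver side: queue a new command without waiting for earlier ones."""
    command_queue.append(tag)


def adapter_step():
    """Adapter side: execute one queued command and post its response."""
    if command_queue:
        tag = command_queue.popleft()
        response_queue.append((tag, "OK"))  # assumed to always succeed


def driver_drain():
    """Driver side: interpret all responses posted so far."""
    completed = list(response_queue)
    response_queue.clear()
    return completed


# The driver queues three commands up front; the adapter completes two of
# them before the driver looks at any response.
for tag in (1, 2, 3):
    driver_submit(tag)
adapter_step()
adapter_step()
completed = driver_drain()
print(completed, "still queued:", list(command_queue))
```

Neither side blocks on the other here: the adapter moves on to the second command before the first response has been interpreted, and the driver can interpret several responses in one pass.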
Similarly, computer systems generally use a processor interrupt mechanism which the I/O adapter uses to signal completion of a command and notify the processor that a response message has been placed on the response queue. The interrupt mechanism provides a signal line from the I/O adapter to the processor that, when asserted, asynchronously interrupts the central processor and switches processor execution from its current program to an operating system or device driver program designed to interpret the interrupt event.
While this interrupt mechanism can help optimize the latency associated with the completion of an I/O command and interpretation of the response message, switching the processor execution from its current program to an interrupt program requires a processor "context switch" that is an extremely time-expensive process requiring many instruction cycles. This context switch saves the current program's critical information such as selected processor registers and state information and loads the interrupt program's critical information.
When the device driver or operating system interrupt program completes its immediate work and is ready for the processor to resume the interrupted program, there is a second context switch to restore the critical information of the interrupted program which allows the processor to resume the interrupted program.
Each context switch is expensive because it consumes valuable processor time. It is important to note that a context switch has an associated latency that is at least an order of magnitude greater than the latency associated with a PCI load. Because conventional systems interrupt the processor every time an I/O event has completed, context switches are relatively frequent and greatly contribute to processor inefficiency.
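The cumulative cost of interrupting on every completion can be estimated with a simple model. In the Python sketch below, the two context switches per interrupt follow from the description above, and the 7 nanosecond cycle time is the example used earlier in the text. The figures of 2,000 cycles per context switch and 10,000 completions per second are assumptions chosen only for illustration, and the function name is hypothetical.

```python
def interrupt_overhead_fraction(completions_per_sec, cycles_per_switch,
                                processor_cycle_ns):
    """Fraction of each second spent on context switches for I/O interrupts.

    Assumes one interrupt per completed command and two context switches
    per interrupt (one into the interrupt program, one back out of it).
    """
    switches_per_sec = 2 * completions_per_sec
    overhead_cycles = switches_per_sec * cycles_per_switch
    overhead_ns = overhead_cycles * processor_cycle_ns
    return overhead_ns / 1e9


# Assumed workload: 10,000 completions per second, 2,000 cycles per switch.
fraction = interrupt_overhead_fraction(10_000, 2_000, 7)
print(f"processor time spent on context switches: {fraction:.0%}")
```

Under this model the overhead scales linearly with the number of interrupts, so signalling fewer completion events per second reduces the cost in direct proportion.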
There are two fundamental trends in computer science and the industry today that exacerbate the problems in conventional I/O systems. The first trend is that processor speeds are increasing at a rate that greatly exceeds the pace at which I/O bus latencies are decreasing. Consequently, the latency mismatch between PCI buses and high performance processors is going to grow rather than stay the same or decrease. This will degrade the relative performance of processors using conventional I/O systems such as the PCI standard because of the resulting frequent and time-expensive interrupts.
The second trend in computer science which magnifies the disadvantages of conventional I/O systems is network computing, in which a large number of relatively slow and inexpensive computers are connected to a shared, high-speed I/O device such as a mass storage device or communication link via an I/O processor. In such systems, the bottleneck often occurs at the shared I/O processor, which must handle frequent interrupts from the networked computers.