1. Technical Field of the Invention
This invention relates generally to computer systems and, more particularly, to I/O (input/output) message transport mechanisms and methods.
2. Description of Related Art
Computer input/output (I/O) protocols govern communications between computer operating system programs and computer I/O devices, such as I/O devices that provide disk storage, communications or network capabilities. Conventionally, these I/O protocols are based upon command and response messages that are exchanged on an I/O bus that interconnects the computer central processor and memory to I/O device adapters or I/O processors. An I/O processor is a type of I/O adapter that has more complex functions, usually in support of the operating system programs. In this description, the term I/O adapter denotes the broader class, which includes I/O processors.
In conventional I/O protocols, operating system device driver programs create command messages that are transmitted across an I/O bus to the I/O adapter. The I/O adapter interprets the command and performs the requested operation, usually transferring data between an I/O device connected to the I/O adapter and the main memory across the I/O bus. Data is typically transferred by using known direct memory access mechanisms that are part of the I/O bus functions. When the I/O adapter completes the requested operation, it creates a response message that the central processor retrieves into the main memory, where the operating system and device driver programs executed by the central processor interpret that response and conclude the overall I/O operation.
Conventional PCI (Peripheral Component Interconnect) bus architectures include a host system with a central processor complex and a main memory connected to a plurality of I/O adapters via a PCI bus. In its general model, the conventional PCI bus architecture makes no assumptions about the content or type of information exchanged between a host system and an I/O adapter. That is to say, the PCI architecture does not define or distinguish the specific communications that occur between the host system and the I/O adapter that use the PCI bus as a transmission medium.
In the PCI specification model, an I/O adapter typically includes a set of memory locations that might collectively be called a register set or a command buffer and a response buffer. The typical PCI host system performs time-expensive memory storing operations to transmit a command via the PCI bus from the host system to a memory location such as the command buffer of an I/O adapter.
In response to the command, the I/O adapter performs the requested operation and then generates a response message to inform the host system of the result and any errors that have occurred. This response message is typically stored in the I/O adapter's response message buffer. The host system must then retrieve the response message and extract protocol information from it to determine the I/O adapter's response to the command. More particularly, the PCI host system retrieves the response message by reading it from an address in a memory of the I/O adapter.
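The command and response exchange described above can be sketched as a small simulation. This is only an illustrative model under stated assumptions: the class names SimulatedAdapter and Host, the three-word command layout, and the "READ"/"WRITE" opcodes are hypothetical and are not defined by the PCI architecture. The sketch counts one bus store per command word and one bus load per response word, which is where the host pays the latency discussed here.

```python
from dataclasses import dataclass, field


@dataclass
class SimulatedAdapter:
    """Hypothetical adapter with a command buffer and a response buffer."""
    command_buffer: list = field(default_factory=list)
    response_buffer: list = field(default_factory=list)
    storage: dict = field(default_factory=dict)  # stands in for the attached device

    def execute(self):
        """Interpret the buffered command and leave a response for the host."""
        opcode, key, value = self.command_buffer
        if opcode == "WRITE":
            self.storage[key] = value
            self.response_buffer = ["OK", None]
        elif opcode == "READ" and key in self.storage:
            self.response_buffer = ["OK", self.storage[key]]
        else:
            self.response_buffer = ["ERROR", None]


class Host:
    """Hypothetical host that counts its stores and loads across the bus."""

    def __init__(self, adapter):
        self.adapter = adapter
        self.bus_stores = 0
        self.bus_loads = 0

    def send_command(self, opcode, key, value=None):
        # Each word of the command is modeled as a separate store across the bus.
        self.adapter.command_buffer = []
        for word in (opcode, key, value):
            self.adapter.command_buffer.append(word)
            self.bus_stores += 1

    def read_response(self):
        # Each word of the response is modeled as a separate load across the bus.
        words = []
        for word in self.adapter.response_buffer:
            words.append(word)
            self.bus_loads += 1
        return tuple(words)
```

In this model, one write command followed by one read command costs the host six stores and four loads across the bus, in addition to whatever time the adapter spends executing the commands.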
One consequence of such a conventional PCI system is that the host system processor incurs latency on every store to and load from the PCI I/O adapter memory. The execution of I/O commands by an I/O adapter typically requires a time duration of many thousands, or even millions, of central processor instruction cycles. Thus, while the I/O adapter is performing a command, the device driver and computer operating system normally perform other work rather than wait for the I/O adapter to complete the command and forward the response message. The device driver and operating system rely upon an asynchronous event indication, such as a processor interrupt, to signal that the I/O adapter has completed the command and that the response message is available for the operating system and device driver to interpret. The relative timing and frequency of these signals, called interrupts, have significant effects on the overall utilization of the central processor, the utilization of the I/O adapter and its data throughput capabilities, and overall system performance. For example, in a large high performance processor system, an I/O memory read across a conventional PCI bus takes many processor cycles, which seriously degrades the execution speed of any program that depends upon the data being read. More particularly, a high performance processor may be stalled for several hundred or even several thousand processor cycles while one word, i.e., four bytes of data, is read from memory.
The PCI local bus specification potentially alleviates some of these inefficiencies resulting from I/O latencies by setting maximum target latencies which are not to be exceeded by the host system, the bus arbiter, or the I/O adapter. In practice, however, the PCI bus operates at a clock rate on the order of 33 to 66 MHz, so there are still minimum latencies of several microseconds. Furthermore, the maximum target latencies that the PCI standard expects are typically on the order of many to several hundred microseconds. Realistically, for a slow I/O adapter the maximum latency could even be one or more milliseconds. The penalty to a high performance processor running with, for example, a seven nanosecond cycle time, is that, even at minimum expected latencies on a PCI bus, the processor is facing several hundred to several thousand cycles of time delay.
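The arithmetic behind these figures can be checked with a short calculation. The sketch below is only illustrative: the function names are hypothetical, and the 3 microsecond and 1 millisecond latencies are assumed sample values chosen to represent "several microseconds" and "one or more milliseconds". The seven nanosecond cycle time is the example given in the text.

```python
def clock_period_ns(clock_mhz):
    """Length of one bus clock in nanoseconds for a clock rate given in MHz."""
    return 1000.0 / clock_mhz


def stall_cycles(latency_ns, cycle_time_ns=7):
    """Whole processor cycles that elapse while the processor waits on the bus."""
    return latency_ns // cycle_time_ns


# Assumed sample latencies, expressed in nanoseconds.
minimum_case = stall_cycles(3_000)      # a bus latency of 3 microseconds
slow_adapter = stall_cycles(1_000_000)  # a bus latency of 1 millisecond
```

Under these assumptions, a 3 microsecond latency costs 428 cycles of a processor with a seven nanosecond cycle time, and a 1 millisecond latency costs 142,857 cycles. A single bus clock lasts roughly 30 nanoseconds at 33 MHz and roughly 15 nanoseconds at 66 MHz.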
To optimize central processor utilization, conventional systems typically attempt to minimize the number of processor instruction cycles required to recognize the completion event and communicate this event to the I/O device driver. To optimize I/O adapter throughput, conventional systems also attempt to minimize the time between the completion of one I/O command and the start of the next I/O command. To optimize overall system performance in relation to programs that require I/O, conventional systems also attempt to minimize the latency of an I/O operation, as measured from the time that the command is created until the time that the response has been interpreted and the results are available to the program that caused or required the I/O. An example of such a program function is an "OPEN FILE" function that requires a disk read operation to get information about the location of the requested file.
Conventional I/O adapter protocols also employ both command and response queues located in the computer main memory, I/O adapter memory or registers, or a combination of both. Command queues enable the device driver to create new commands while the I/O adapter executes one such command. Response queues enable the I/O adapter to signal the completion of previous commands and proceed to new commands without waiting for the device driver or operating system to recognize and interpret the completion of these previous commands.
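The decoupling provided by command and response queues can be illustrated with a minimal sketch. The class QueuedAdapter and its method names are hypothetical, and the two queues are modeled as simple in-memory deques rather than as structures in main memory or adapter registers.

```python
from collections import deque


class QueuedAdapter:
    """Hypothetical adapter model with a command queue and a response queue."""

    def __init__(self):
        self.command_queue = deque()
        self.response_queue = deque()

    def submit(self, command):
        """Driver side: queue a new command without waiting for earlier ones."""
        self.command_queue.append(command)

    def run_one(self):
        """Adapter side: execute the oldest command and queue its response."""
        if not self.command_queue:
            return False
        command = self.command_queue.popleft()
        self.response_queue.append(("done", command))
        return True

    def drain_responses(self):
        """Driver side: collect every response queued so far, oldest first."""
        responses = list(self.response_queue)
        self.response_queue.clear()
        return responses
```

In this sketch the driver may call submit several times before the adapter has executed anything, and the adapter may call run_one repeatedly before the driver has looked at any response, which is the independence that the two queues are intended to provide.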
Similarly, computer systems generally include a processor interrupt mechanism which the I/O adapter uses to signal completion of a command and notify the processor that a response message has been placed on the response queue. The interrupt mechanism provides a signal line from the I/O adapter to the processor that, when asserted, asynchronously interrupts the central processor and switches processor execution from its current program to an operating system or device driver program designed to interpret the interrupt event. While this interrupt mechanism can minimize the latency associated with the completion of an I/O command and interpretation of the response message, switching the processor execution from its current program to an interrupt program requires a processor context switch that requires many instruction cycles. This context switch saves the current program's critical information such as selected processor registers and state information and loads the interrupt program's critical information. When the interrupt program completes its immediate work and is ready for the processor to resume the original program, there is a second context switch to restore the critical information of the original program to resume execution. Each context switch consumes valuable processor time. Conventional systems interrupt the processor every time an I/O event has completed, so context switches cause processor inefficiency.
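The cost of interrupting the processor on every completion can be estimated with a simple model. The function name and the cycle counts below are assumptions chosen for illustration and are not measurements of any particular processor. The model charges each interrupt two context switches, one into the interrupt program and one back to the original program, plus the work of the interrupt program itself.

```python
def interrupt_overhead_cycles(completions, context_switch_cycles, handler_cycles):
    """Processor cycles consumed when every I/O completion raises one interrupt."""
    # One switch into the interrupt program and one switch back out of it.
    per_interrupt = 2 * context_switch_cycles + handler_cycles
    return completions * per_interrupt


# Assumed figures: 1,000 completions, 500 cycles per switch, 200 handler cycles.
total = interrupt_overhead_cycles(1_000, 500, 200)
```

With these assumed figures the total is 1,200,000 cycles, of which 1,000,000 are spent on context switching alone.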
Furthermore, in PCI buses on personal computers and desktop platforms, an I/O command is performed via a memory mapped I/O storing process which moves the command from the host system to a buffer in the I/O adapter. This I/O storing process includes the host system storing a command in a control register of the I/O adapter and then loading from the I/O hardware to force completion of all stores to the I/O adapter and to verify that no errors occurred. The I/O adapter moves the data between the I/O device and the PCI adapter memory, and utilizes a direct memory access to transfer the data between PCI adapter memory and the host system memory. The form of notification that a command has been completed varies by PCI adapter. A common form is for the I/O adapter to raise a system interrupt line to the host system. In response to the interrupt, the host system software performs a series of loading operations from the PCI adapter to determine the nature of the interrupt.
Loading operations in which data is retrieved from an I/O adapter or PCI bus hardware require many central processor cycles because the central processor waits for the loading operation to complete. Storing operations which store commands from the central processor to I/O adapters and PCI bus hardware are not initially expensive in terms of central processor cycles, but the store may not complete, so it must be verified either via a loading operation from the same location or via a series of loading operations that check the hardware between the central system processor and the I/O adapter. Storing operations that require verification are commonly referred to as "verified" storing operations. Storing operations that do not require verification and that may be reissued without adverse system effects are referred to as "non-verified" storing operations. Thus, to optimize the overall system performance and minimize processor utilization, it is desirable to avoid expensive loads from I/O adapters and also expensive "verified" stores to I/O adapters.
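The difference in cost between "verified" and "non-verified" storing operations can be sketched with a simulated bus. Everything here is an assumption made for illustration: the class SimulatedBus, the cycle costs of 10 for a store and 500 for a load, and the drop_next_store flag used to inject a lost store. The verification shown is the first variant described above, a load from the same location.

```python
class SimulatedBus:
    """Hypothetical bus model that charges processor cycles for each access."""

    STORE_CYCLES = 10   # assumed: the processor does not wait for a store
    LOAD_CYCLES = 500   # assumed: the processor stalls until a load returns

    def __init__(self):
        self.registers = {}
        self.cycles = 0
        self.drop_next_store = False  # test hook that simulates a lost store

    def store(self, address, value):
        self.cycles += self.STORE_CYCLES
        if self.drop_next_store:
            self.drop_next_store = False
            return  # the store never reaches the adapter
        self.registers[address] = value

    def load(self, address):
        self.cycles += self.LOAD_CYCLES
        return self.registers.get(address)


def non_verified_store(bus, address, value):
    """Issue the store and continue without confirming that it completed."""
    bus.store(address, value)


def verified_store(bus, address, value):
    """Issue the store, then load from the same location to confirm it."""
    bus.store(address, value)
    return bus.load(address) == value
```

With these assumed costs a non-verified store consumes 10 cycles, while a verified store consumes 510 cycles because of the load that follows it. The load is also the only way the host in this model learns that a store was lost.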
The normal interrupt mechanism for a PCI adapter is to first record interrupt status in an internal facility or register located within the adapter's memory space. Then the PCI device signals an interrupt to the system. The system, upon receiving the interrupt, first determines which adapter signalled that interrupt. To make this determination, the system examines every I/O adapter that could potentially have signalled such an interrupt, unless it knows that the interrupt came from a particular slot, in which case it need only examine that one I/O adapter. The system then reads the memory address in the device that is specific to that vendor and adapter type. By reading the memory, the system extracts the interrupt status that describes the particular reason for which the device raised the interrupt.
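The search for the interrupting adapter can be sketched as follows. The function name, the dictionary that maps slot numbers to interrupt status register values, and the convention that a status of zero means no interrupt is pending are all assumptions made for illustration. Each status read stands in for one load across the bus.

```python
def find_interrupt_sources(adapters, known_slot=None):
    """Return the pending interrupt statuses and the number of loads needed.

    adapters maps a slot number to that adapter's interrupt status register
    value, with zero taken to mean that no interrupt is pending.
    """
    if known_slot is not None:
        slots = [known_slot]      # the interrupt is known to come from one slot
    else:
        slots = sorted(adapters)  # otherwise every candidate adapter is examined
    loads = 0
    pending = {}
    for slot in slots:
        status = adapters[slot]   # stands in for a load from adapter memory
        loads += 1
        if status != 0:
            pending[slot] = status
    return pending, loads
```

For four adapters with only slot 1 pending, the full scan in this model performs four loads to find the one status, whereas a scan restricted to a known slot performs a single load.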