1. Technical Field of the Invention
The invention relates to messages passing into and out of a computer processor complex, referred to as I/O (input/output) messages. More particularly, this invention relates to mechanisms and techniques which interrupt the processor complex so that it can retrieve an I/O message.
2. Description of Related Art
Computer input/output (I/O) protocols govern communications between computer operating system programs and I/O adapters, such as I/O adapters that provide disk storage, communications or network capabilities. Conventionally, these I/O protocols are based upon command and response messages that are exchanged via an I/O bus interconnecting the computer central processor and memory to I/O adapters or I/O processors. An I/O processor is a type of I/O adapter that is distinguished as having more complex functions, usually in support of operating system programs. An I/O adapter is considered here to be the broader class, which includes the I/O processor.
In such conventional I/O protocols, operating system device driver programs create command messages which are then transmitted across an I/O bus to the I/O adapter. The adapter interprets the command and performs the requested operation. Usually, this operation includes the transfer of data between an I/O device connected to the I/O adapter and the computer memory across the I/O bus. Such data are typically transferred using known direct memory access mechanisms that are part of the I/O bus functions. When the I/O adapter has completed the requested operation, it creates a response message that is transmitted back to the computer memory, where the operating system and device driver programs interpret that response and conclude the overall I/O operation.
Conventional PCI (Peripheral Component Interconnect) bus architectures include a host system with a central processor complex and a main memory connected to a plurality of I/O adapters via a PCI bus. In the general device model of conventional PCI buses, the conventional PCI bus architecture does not make any assumptions about the content or type of information exchanged between a host system and an I/O adapter. That is to say the PCI architecture does not define or distinguish the specific communications that occur between the host system and the I/O adapter that use the PCI bus as a transmission medium.
In the PCI specification model, an I/O adapter typically includes a set of memory locations that might collectively be called a register set or a command buffer and a response buffer. In this PCI specification model, these I/O adapter memory locations are seen by the host's central processor as additional memory locations in its own memory space; that is, host system software "maps" these PCI I/O adapter memory locations into the totality of the host system memory regions that are accessible using processor memory load and store operations. Thus, the typical host central processor performs memory store operations to PCI I/O adapter memory locations to transmit a command on the PCI bus to a command buffer and performs memory load operations from I/O adapter memory to retrieve a response or status information on the PCI bus from the I/O adapter. Unlike processor store or load operations directed to actual host system memory, the processor store or load operations to PCI I/O adapter memory locations usually require more time and are considered very time-expensive with respect to the host central processor.
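By way of illustration only, the memory-mapped exchange described above may be sketched as follows. The class names, the command string, and the relative cost figures are hypothetical and are chosen solely to show that a store carries the command and a more expensive load retrieves the response.

```python
class AdapterRegisters:
    """Hypothetical register set exposed by an I/O adapter."""

    def __init__(self):
        self.command = None   # command buffer, written by the host
        self.response = None  # response buffer, read by the host


class HostProcessor:
    # Illustrative relative costs only; these are not measured values.
    STORE_COST = 1
    LOAD_COST = 100

    def __init__(self, adapter):
        self.adapter = adapter  # stands in for the mapped address range
        self.cost = 0

    def store_command(self, command):
        """Memory store to the adapter's command buffer."""
        self.adapter.command = command
        self.cost += self.STORE_COST

    def load_response(self):
        """Memory load from the adapter's response buffer."""
        self.cost += self.LOAD_COST
        return self.adapter.response


adapter = AdapterRegisters()
host = HostProcessor(adapter)
host.store_command("READ_BLOCK")
adapter.response = "DONE"  # a real adapter would set this itself
status = host.load_response()
```

In this sketch nearly all of the accumulated cost comes from the single load, which mirrors the observation that loads from adapter memory are the time-expensive operations.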
In response to the command, the I/O adapter will perform the requested operation and then generate a response message to inform the host system of the result and any errors that have occurred. This response message is typically stored in the I/O adapter's response message buffer. The host system must then retrieve the response message and extract protocol information from the retrieved response message to determine the I/O adapter's response to the command. More particularly, the PCI host system reads the response message from an address in a memory of the I/O adapter to retrieve the response message. One consequence of such a conventional PCI system is that the host system processor experiences latency because it must store the command to the I/O adapter memory and then load response data from the I/O adapter memory.
Specifically, each particular vendor of an I/O adapter and each particular type of device built by a particular vendor includes specific definitions of the command buffers and response buffers. These definitions specify the semantics of the buffers, how many of them there are, their size, and at what memory addresses they are located. Because each definition is very specific not only to a particular device vendor but also to a particular device built by that vendor, there is no generalized model. One disadvantageous consequence is that the host system software must take into account the unique characteristics of each vendor and I/O adapter in order to communicate I/O commands and responses between the host system and the I/O adapter using the PCI memory read and write operations.
Furthermore, the execution of I/O commands by an I/O adapter typically requires a time duration that is many thousands, or even millions, of central processor instruction cycles. Thus, while the I/O adapter is performing a command, the device driver and computer operating system normally perform other work and are not dedicated strictly to waiting for the I/O adapter to complete the command and forward the response message. Rather, the typical device driver and operating system rely upon an asynchronous event indication, such as a processor interrupt, to signal that the I/O adapter has completed the command and that the response message is available for the operating system and device driver to interpret.
The relative timing and frequency of the signals to interrupt the processor have significant effects on the overall utilization of the central processor, utilization of the I/O adapter and its data throughput capabilities, and overall system performance. Such utilization is also affected by I/O command latency, or the duration of an I/O operation as seen by the programs that depend upon that I/O operation to complete their functions. In a large high performance processor system, an I/O memory read across a conventional PCI bus may consume a very large number of processor cycles, which seriously degrades the execution speed of a program depending upon that I/O memory read. More particularly, a high performance processor attempting to do a single memory read of one word (4 bytes) of data from a PCI device may experience a latency to complete that memory read of several hundred or even several thousand processor cycles.
The PCI local bus specification utilizes a mechanism that potentially alleviates some of these inefficiencies resulting from I/O latencies. This mechanism sets target latencies which limit the time in which the master, e.g., host system, the bus arbiter and the target, e.g., I/O adapter, must wait for responses. In practice, the PCI bus has a minimum latency based on its clock rate, which is currently on the order of 33 to 66 MHz, so there are still guaranteed minimum latencies of several microseconds. Furthermore, the maximum target latencies that the PCI standard would expect are typically on the order of many to several hundred microseconds. Potentially, for a slow I/O adapter, that maximum latency could realistically be upwards of a millisecond or even several milliseconds. The consequence to a high performance processor running with, for example, a seven nanosecond cycle time, is that, even at minimum expected latencies on a PCI bus, the processor is facing several hundred to several thousand cycles of time delay.
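The relationship between bus latency and lost processor cycles stated above can be checked with simple arithmetic. The following sketch uses the seven nanosecond cycle time from the example; the three sample latencies are assumed values chosen to span "several microseconds" and are not taken from any specification.

```python
def stalled_cycles(latency_ns: float, cycle_ns: float) -> int:
    """Processor cycles that elapse while one bus latency is outstanding."""
    return round(latency_ns / cycle_ns)


CYCLE_NS = 7.0  # processor cycle time used as the example in the text

for latency_us in (1, 3, 10):  # assumed sample latencies, in microseconds
    cycles = stalled_cycles(latency_us * 1000, CYCLE_NS)
    print(f"{latency_us} us -> about {cycles} cycles")
```

For these sample values the result runs from roughly 140 cycles at one microsecond to roughly 1,430 cycles at ten microseconds, consistent with the range of several hundred to several thousand cycles.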
To optimize central processor utilization, conventional systems typically attempt to minimize the number of processor instruction cycles required to recognize the completion event and communicate this event to the I/O adapter device driver. To optimize I/O adapter throughput, conventional systems also attempt to minimize the time between the completion of one I/O command and the start of the next I/O command. To optimize overall system performance, in relation to programs that require I/O, conventional systems minimize the latency of an I/O operation, measured from the time the command is created until the time the response has been interpreted and the results are available to the program that caused or required the I/O, such as, for example, an "OPEN FILE" function that requires a disk read operation to get information about the location of the requested file.
To accomplish these objectives, conventional I/O protocols also employ both command and response queues located in the computer main memory, I/O adapter memory or registers, or a combination of both. Command queues enable the device driver to create new commands while the I/O adapter executes one such command. Response queues enable the I/O adapter to signal the completion of previous commands and proceed to new commands without waiting for the device driver or operating system to recognize and interpret the completion of these previous commands.
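The decoupling provided by command and response queues may be sketched as follows. This is a minimal model under stated assumptions: the function names are hypothetical, both queues are held in ordinary memory, and every command is assumed to complete successfully.

```python
from collections import deque

command_queue = deque()   # filled by the device driver, drained by the adapter
response_queue = deque()  # filled by the adapter, drained by the device driver


def driver_submit(command):
    """Driver queues a command without waiting for earlier ones to finish."""
    command_queue.append(command)


def adapter_execute_next():
    """Adapter takes one command, 'executes' it, and queues a response."""
    if not command_queue:
        return False
    command = command_queue.popleft()
    response_queue.append((command, "complete"))  # assumed always successful
    return True


def driver_collect_responses():
    """Driver interprets all responses accumulated so far."""
    responses = list(response_queue)
    response_queue.clear()
    return responses


for cmd in ("read 0", "read 1", "write 2"):
    driver_submit(cmd)
while adapter_execute_next():
    pass  # adapter moves from command to command without waiting on the driver
collected = driver_collect_responses()
```

The adapter loop never waits for the driver, and the driver gathers three responses in one pass, which is the behavior the queues are intended to permit.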
Similarly, computer systems generally include a processor interrupt mechanism which the I/O adapter uses to signal completion of a command and notify the processor that a response message has been placed on the response queue. The interrupt mechanism provides a signal line from the I/O adapter to the processor that, when asserted, asynchronously interrupts the central processor and switches processor execution from its current program to an operating system or device driver program designed to interpret the interrupt event. While this interrupt mechanism can help optimize the latency associated with the completion of an I/O command and interpretation of the response message, switching the processor execution from its current program to an interrupt program requires a processor context switch that requires many instruction cycles.
A context switch saves the current program's critical information such as selected processor registers and state information and loads the interrupt program's critical information. When the interrupt program completes its immediate work and is ready for the processor to resume the interrupted program, there is a second context switch to restore the critical information of the interrupted program which allows the processor to resume the interrupted program. Each context switch consumes valuable processor time. Because conventional systems interrupt the processor every time an I/O event has completed, context switches are relatively frequent and result in processor inefficiency.
Furthermore, in PCI buses on personal computers and desktop platforms, an I/O command is performed via a processor memory store which moves the command from the host system to a buffer in the I/O adapter. This I/O storing process includes the host system storing a command in a control register of the I/O adapter and loading from the host bridge hardware to complete all stores to the I/O adapter and to verify that no errors occurred. The form of notification that an I/O command has been completed varies with the PCI adapter. A common form, however, is for the I/O adapter to raise a system interrupt line to the host system. In response to the interrupt, the host central processor performs a series of memory load operations from the PCI adapter to determine the nature of the interrupt.
Within the PCI specification model, an I/O adapter normally provides a singular PCI interconnection that encompasses or represents all of the internal elements of that I/O adapter. The singular PCI interconnection is, therefore, considered a "single function" PCI I/O adapter. The PCI specification, moreover, distinguishes a class of PCI I/O adapters as "multifunction adapters" which have a singular physical connection to a PCI bus but have two to eight independent PCI I/O adapters connected through that one common physical connection.
Each PCI bus physical connection provides a PCI "interrupt A" signal to the host system. For a PCI multifunction I/O adapter, all internally connected I/O adapter functions must share this same "interrupt A" signal to the host system. The host system then must interrogate all of the I/O adapter functions within the multifunction I/O adapter to determine which of these functions is signalling the interrupt. To enable host systems to reduce this expensive interrupt processing, the PCI specification model includes three additional PCI interrupt signals--"interrupt B", "interrupt C", and "interrupt D"--that may be implemented. These additional interrupt signals allow individual I/O adapter functions within the multifunction I/O adapter to uniquely signal an interrupt provided that no two I/O adapter functions share the same interrupt signal (A, B, C, or D). Of course, any multifunction I/O adapter that provides more than four internal I/O adapter functions must share interrupt signals between at least two I/O adapter functions resulting in a corresponding increase in host system expense to process those shared interrupt signals.
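The sharing constraint described above can be illustrated with a short sketch. The round-robin assignment of functions to interrupt signals is an assumption made for illustration only; an actual multifunction adapter may distribute its functions differently.

```python
PINS = ("A", "B", "C", "D")


def assign_pins(num_functions):
    """Map function numbers to interrupt signals (assumed round-robin policy)."""
    if not 1 <= num_functions <= 8:
        raise ValueError("a multifunction adapter has one to eight functions")
    pins = {pin: [] for pin in PINS}
    for function in range(num_functions):
        pins[PINS[function % len(PINS)]].append(function)
    return pins


def functions_to_interrogate(pins, asserted_pin):
    """Functions the host must examine when the given signal is asserted."""
    return pins[asserted_pin]


four = assign_pins(4)  # every function has its own signal
six = assign_pins(6)   # signals A and B are each shared by two functions
```

With four functions, an asserted signal identifies its source directly. With six, an assertion of "interrupt A" leaves the host to interrogate functions 0 and 4.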
While providing four different signals provides an architectural solution to reducing host system interrupt processing expense for a multifunction I/O adapter, it is not a practical solution for many host systems. Most host system PCI buses seek to increase the physical connections and possible I/O devices to the PCI bus to ensure higher utilization of the PCI bus while minimizing the cost of these connections. Additional interrupt signals increase the number of input pins required of the PCI host bridge hardware, but it is a practical objective of most PCI host bridge hardware implementations to minimize the number of input/output pins. It is also impractical to provide many interrupt signals from every connection on the PCI bus. Thus, in practice many host systems either limit the number of PCI bus connections that can provide more than an "interrupt A" signal, or connect all or some subset of the interrupt signals (A, B, C, and D) to a single "interrupt A" input to the host system. Still, multifunction I/O adapters require increased host processor expense to interrogate individual I/O adapter functions to determine the source(s) of a PCI interrupt from the physical connection.
The PCI specification model also distinguishes PCI-to-PCI bridge devices. PCI-to-PCI bridge devices create a secondary PCI bus from a primary PCI bus. Typically the primary PCI bus is the PCI bus most closely connected to the host system. The PCI-to-PCI bridge devices are connected to this primary PCI bus to provide connections for other PCI I/O adapters. To the I/O adapters connected to it, the bridge device appears as if it were the host system PCI host bridge; more specifically, the bridge device receives the I/O adapter interrupt signals from the PCI bus connections on the secondary PCI bus. Upon receiving one or more interrupt signals from the connections on its secondary bus, the PCI-to-PCI bridge signals "interrupt A" on its connection to its primary PCI bus, effectively forwarding the collection of presently active secondary bus PCI interrupts (A, B, C, and D) to the host system via its own "interrupt A" signal. As with multifunction I/O adapters, the host system receives only "interrupt A" from the PCI-to-PCI bridge and must interrogate the bridge to determine which of the interrupt signals from the secondary bus is the source of the "interrupt A" signal. Thus, the host system experiences an increased expense to process an interrupt from an I/O adapter connected through a PCI-to-PCI bridge.
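The forwarding behavior of the bridge may be modeled, purely as an illustrative sketch, as a logical OR of the secondary bus interrupt signals. The class and method names below are hypothetical and do not correspond to any actual bridge register interface.

```python
class PciToPciBridge:
    """Toy model of interrupt forwarding through a PCI-to-PCI bridge."""

    def __init__(self):
        # State of each interrupt signal arriving from the secondary bus.
        self.secondary = {"A": False, "B": False, "C": False, "D": False}

    def raise_secondary(self, pin):
        self.secondary[pin] = True

    def primary_interrupt_a(self):
        """The host sees one signal: active if any secondary signal is."""
        return any(self.secondary.values())

    def interrogate(self):
        """Extra step the host must take to find the active secondary signals."""
        return sorted(pin for pin, active in self.secondary.items() if active)


bridge = PciToPciBridge()
bridge.raise_secondary("C")
seen_by_host = bridge.primary_interrupt_a()  # True, but the source is unknown
sources = bridge.interrogate()               # ["C"], learned only after interrogation
```

The single boolean seen by the host carries no information about which secondary signal is active, which is why the additional interrogation step is needed.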
Memory load operations to retrieve data from an I/O adapter or PCI bus hardware require many central processor cycles to retrieve the data because the central processor waits for the loading operation to complete. Memory store operations which store commands from the central processor to I/O adapters and PCI bus hardware are not initially expensive in terms of central processor cycles, but the store command may not complete immediately and must either be verified via a load operation from the same PCI memory location or a series of processor load operations to verify the hardware between the central system processor and the I/O adapter. Memory store operations that require verification are commonly referred to as "verified" store operations. Memory store operations that do not require verification and that may be re-issued without adverse system effects are referred to as "non-verified" store operations. Thus, to optimize the overall system performance and minimize processor utilization, it is desirable to avoid expensive loads from I/O adapters and also expensive "verified" stores to I/O adapters.
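The distinction between "verified" and "non-verified" stores may be sketched as follows. The model assumes, for illustration, that stores are held in intermediate hardware until a subsequent load forces them to complete; the class and function names are hypothetical.

```python
class PostedStorePath:
    """Toy model of the hardware between the processor and the adapter."""

    def __init__(self):
        self.pending = []        # stores accepted but not yet delivered
        self.adapter_memory = {}

    def store(self, address, value):
        """Cheap for the processor: returns before the store completes."""
        self.pending.append((address, value))

    def load(self, address):
        """Expensive: earlier stores are completed before the load returns."""
        for pending_address, pending_value in self.pending:
            self.adapter_memory[pending_address] = pending_value
        self.pending.clear()
        return self.adapter_memory.get(address)


def verified_store(path, address, value):
    """Store, then load from the same location to confirm completion."""
    path.store(address, value)
    return path.load(address) == value


path = PostedStorePath()
path.store(0x10, 5)                        # non-verified: may still be pending
still_pending = 0x10 not in path.adapter_memory
confirmed = verified_store(path, 0x14, 7)  # costs one additional load
```

In this sketch the non-verified store has not yet reached adapter memory when it returns, while the verified store pays for an additional load in exchange for confirmation.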
The normal interrupt mechanism for a PCI device is to send a signal on a particular line to the central processor after having first created an interrupt status vector in an internal facility or register located within the device's memory space. The central processor, upon receiving the interrupt, must determine which I/O adapter signalled it, and in general it examines all of the I/O adapters that could have signalled such an interrupt. Sometimes, however, the central processor knows from the PCI bus connection topology that the interrupt came from a particular I/O adapter and so it looks only at that I/O adapter. In response to such an interrupt, the central processor loads from the memory address in the device, specific to that vendor and device type, to extract the interrupt status vector that describes the particular reason for which the device raised the interrupt.
If there are multiple I/O adapters sharing a common interrupt line, the conventional host system must read each I/O adapter's interrupt status register in each I/O adapter's memory to determine which I/O adapter presented the interrupt. The result is a rather lengthy process in which the system loads from each I/O adapter to see which one interrupted the system. Because each load from a PCI device is expensive, if there are a large number of I/O adapters connected to a single interrupt line, the result is a large number of memory loads to determine which I/O adapter interrupted and the cause of the interrupt. This interrogation reduces host central processor utilization and overall host system performance.
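The cost of interrogating adapters on a shared interrupt line may be sketched as follows. The adapter names and status values are hypothetical, and a status of zero is assumed to mean that the adapter has no interrupt pending.

```python
def find_interrupt_sources(status_registers):
    """Poll every adapter sharing the line; return the sources and load count.

    status_registers maps an adapter name to its interrupt status vector.
    Each lookup stands in for one expensive load from adapter memory.
    """
    loads = 0
    sources = {}
    for name, status in status_registers.items():
        loads += 1                # one load per adapter, pending or not
        if status != 0:           # assumed: zero means nothing pending
            sources[name] = status
    return sources, loads


shared_line = {"disk0": 0, "net0": 0, "disk1": 0x4, "tape0": 0}
sources, loads = find_interrupt_sources(shared_line)
```

Here four loads are issued to locate a single interrupting adapter, and the number of loads grows in direct proportion to the number of adapters on the line.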