This invention relates to methods and apparatus for servicing interrupts generated in a PCI bus structure with a message passing, queue-oriented bus system.
In conventional computer systems, various components, such as CPUs, memory and peripheral devices, are interconnected by a common signal transfer path called a “bus”. Busses are implemented in a variety of well-known standard architectures, one of which is called the PCI (Peripheral Component Interconnect) architecture. In its basic configuration, a PCI bus has a bus width of 32 or 64 bits, operating clock speeds of 33 or 66 MHz, and a maximum data transfer speed of 132 MBps for 32-bit, 33 MHz operation and 528 MBps for 64-bit, 66 MHz operation. In accordance with PCI protocol, address and data are multiplexed so that address lines and data lines do not have to be separated. This multiplexing reduces both the number of signals required for operation and the number of connection pins required to connect PCI compatible devices to the bus. In the larger bus capability, there are 64 bus lines and, thus, 64 bits available for both address and data. PCI devices use a paged memory access scheme where each PCI address consists of a page number field and a page offset field and each PCI device can directly access a 4 GB address space.
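As a rough illustration of the paged addressing described above, the following sketch splits a 32-bit PCI address into a page number field and a page offset field. The 8 KB page size and the function names are assumptions chosen for the example; the text does not specify a page size.

```python
PAGE_SIZE = 8 * 1024                       # assumed page size (not given in the text)
OFFSET_BITS = PAGE_SIZE.bit_length() - 1   # 13 offset bits for an 8 KB page
ADDRESS_BITS = 32                          # a 32-bit address reaches a 4 GB space


def split_pci_address(addr):
    """Return (page_number, page_offset) for a 32-bit PCI address."""
    if not 0 <= addr < (1 << ADDRESS_BITS):
        raise ValueError("address outside the 4 GB PCI address space")
    return addr >> OFFSET_BITS, addr & (PAGE_SIZE - 1)


def join_pci_address(page_number, page_offset):
    """Rebuild the address from its two fields."""
    return (page_number << OFFSET_BITS) | page_offset


page_number, page_offset = split_pci_address(0x12345678)
print(hex(page_number), hex(page_offset))
```

With a different page size only the width of the offset field changes; the two fields together always reconstruct the original address.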
PCI bus technology uses memory mapped techniques for performing I/O operations and DMA operations. In accordance with this technique, within the physical I/O address space of the platform, a range of addresses called a PCI memory address space is allocated for PCI devices. Within this address space there is a region reserved by the operating system for programmable I/O (PIO) operations that are performed by the host to read or change the contents of the device registers in the associated PCI devices. The host performs the read and write operations in the kernel virtual address space that is mapped into the host physical address space. Within the region, separate addresses are assigned to each register in each PCI device. Load and store operations can then be performed to these addresses to change or read the register contents.
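The programmed I/O scheme described above can be pictured with a small model in which every device register receives its own address inside the PIO region, and loads and stores to that address read or change the register. This is only a sketch: the class names, the 4-byte register size and the base address are hypothetical and are not taken from the PCI specification.

```python
class PciDevice:
    """A device with a few named 32-bit registers (hypothetical layout)."""

    def __init__(self, register_names):
        self.registers = {name: 0 for name in register_names}


class PioRegion:
    """Assigns one address in the PIO region to each device register."""

    REGISTER_SIZE = 4  # bytes per register (assumed)

    def __init__(self, base):
        self.next_addr = base
        self.address_map = {}  # address -> (device, register name)

    def attach(self, device):
        """Give each register of the device the next free address."""
        for name in device.registers:
            self.address_map[self.next_addr] = (device, name)
            self.next_addr += self.REGISTER_SIZE

    def store(self, addr, value):
        """A store to a mapped address changes the register contents."""
        device, name = self.address_map[addr]
        device.registers[name] = value & 0xFFFFFFFF

    def load(self, addr):
        """A load from a mapped address reads the register contents."""
        device, name = self.address_map[addr]
        return device.registers[name]


region = PioRegion(base=0x80000000)
nic = PciDevice(["control", "status"])
region.attach(nic)
region.store(0x80000000, 1)        # write the "control" register
print(region.load(0x80000000))
```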
A separate region is also allocated by the operating system for DMA access to host memory by the PCI devices. The allocated addresses are dynamically mapped to a section of the host physical memory. During this mapping, an address translation is performed to translate the addresses generated by the PCI devices into addresses in the host physical memory that may have a different address size than the PCI addresses. This address mapping is accomplished via a number of conventional mechanisms including translation lookaside buffers and memory management units.
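The DMA address translation described above can be sketched as a page-granular lookup table. The table below is a simplified stand-in for a translation lookaside buffer or memory management unit, not a model of any particular hardware; the class name, the 8 KB page size and the example page numbers are assumptions.

```python
PAGE_SIZE = 8 * 1024  # assumed page size


class DmaTranslationTable:
    """Maps PCI page numbers to host physical page numbers."""

    def __init__(self):
        self.entries = {}  # PCI page number -> host physical page number

    def map_page(self, pci_page, host_page):
        """Install a mapping when a DMA buffer is set up."""
        self.entries[pci_page] = host_page

    def unmap_page(self, pci_page):
        """Remove a mapping when the DMA buffer is released."""
        self.entries.pop(pci_page, None)

    def translate(self, pci_addr):
        """Translate a device-generated PCI address to a host physical address."""
        pci_page, offset = divmod(pci_addr, PAGE_SIZE)
        if pci_page not in self.entries:
            raise LookupError(f"no DMA mapping for PCI address {pci_addr:#x}")
        return self.entries[pci_page] * PAGE_SIZE + offset


table = DmaTranslationTable()
# The host page number here needs more than 32 address bits, showing that the
# host physical address may be wider than the PCI address.
table.map_page(pci_page=2, host_page=0x100000)
print(hex(table.translate(2 * PAGE_SIZE + 0x10)))
```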
The PCI device then uses the mapped addresses to perform DMA operations by directly reading and writing with the mapped addresses in the PCI address space. The host may also access these memory locations by means of the kernel virtual address space that is mapped by another memory management unit into the host physical memory. Details of the structure of the PCI bus architecture and of its operation are described in “PCI Local Bus Specification, Revision 2.2” (Copyright 1998) which publication is incorporated by reference herein in its entirety.
In addition to the PCI bus architecture, there are also other well-known bus architectures. For example, other architectures include Fibre Channel and, more recently, the InfiniBand℠ architecture. These architectures are not memory-mapped architectures. Instead, the host and its memory are connected to host channel adapters. The input/output (I/O) devices are connected to target channel adapters. The host and target channel adapters communicate by messages comprising one or more data packets transmitted over serial point-to-point links established via a hardware switch fabric to which the host and target channel adapters are connected. The messages are enqueued for delivery between the channel adapters.
Data packet transmission is controlled by instructions generated by the host and I/O devices and placed in queues called work queues, which are organized in pairs. Each work queue pair includes a send queue and a receive queue. The send queue receives instructions from one process, and these instructions cause data to be sent to another process. The receive queue receives instructions that specify to a process where to place data received from another process. Hardware in the respective channel adapter processes instructions in the work queues and, under control of the instructions, causes the data packets to be transferred between the CPU memory and the I/O devices. A form of direct memory access (DMA) called remote direct memory access (RDMA) can also be performed by instructions placed in the work queues. This architecture has the advantage that it decouples the CPU memory from the I/O system and permits the system to be easily scaled.
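A minimal sketch of the send queue and receive queue arrangement described above follows. It models only the queueing behavior: the function names are hypothetical, and the function standing in for the channel adapter hardware ignores packets, links and the switch fabric entirely.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WorkQueuePair:
    """One send queue and one receive queue."""
    send_queue: deque = field(default_factory=deque)
    receive_queue: deque = field(default_factory=deque)


def post_send(queue_pair, data):
    """Queue an instruction that causes data to be sent to another process."""
    queue_pair.send_queue.append(data)


def post_receive(queue_pair, buffer):
    """Queue an instruction naming the buffer where incoming data is placed."""
    queue_pair.receive_queue.append(buffer)


def channel_adapter_transfer(sender, receiver):
    """Stand-in for adapter hardware: deliver one message, if possible.

    Returns True when a message was moved from the sender's send queue
    into the buffer named by the receiver's oldest receive instruction.
    """
    if not sender.send_queue or not receiver.receive_queue:
        return False
    data = sender.send_queue.popleft()
    buffer = receiver.receive_queue.popleft()
    buffer.extend(data)
    return True


host = WorkQueuePair()
target = WorkQueuePair()
inbox = bytearray()
post_receive(target, inbox)          # the receiver says where data should go
post_send(host, b"block 0")          # the sender says what to send
channel_adapter_transfer(host, target)
print(inbox)
```

Neither process touches the other's memory directly in this model; each only posts instructions to its own queues, which is the decoupling the text describes.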
As attractive as the newer bus architectures are, many existing PCI peripherals will require accommodation in such architectures for a considerable period of time. Therefore, there exists a need for a mechanism to interconnect a PCI bus to the message-passing, queue-oriented architectures so that PCI peripherals can be used with the newer architecture. Such a mechanism is called a bridge and must meet certain criteria, such as preserving PCI ordering rules and address translation. In addition, PCI services must be implemented. For example, there must be a DMA mapping mechanism that allows the PCI devices to perform DMA operations. Since a DMA operation, once started, is autonomous, provision must be made for lengthy continuous data transfers. Interrupt support must also be provided so that PCI devices can generate interrupts which are serviced by device drivers. However, in a message passing system, a request to store data does not guarantee that the data has been stored until an acknowledgement message indicating that the data has been stored has been received back from the bus system. Therefore, it is possible for an interrupt to be generated after a request has been made to store data, but before an acknowledgement has been received. To make matters even more complicated, PCI devices must be able to generate interrupts during DMA operations. Since DMA operations can involve several data transfer requests in a message passing system and these data transfer requests are completed asynchronously, care must be taken that the interrupt is sent only after all data requests outstanding at the time of the interrupt have been completed.
In addition, other criteria, such as the aforementioned load/store operations, must be accommodated. However, a message-passing, queue-oriented architecture is not directly adaptable to meet the PCI requirements because its decoupled nature prevents the PCI devices from being directly coupled to the memory.
Therefore, there is a need to accommodate PCI peripherals in a computer system that uses a message passing bus architecture and to service interrupts in a manner similar to the conventional PCI architecture.
In accordance with the principles of the invention, in order to service PCI interrupts, a separate interrupt work queue is assigned to each interrupt line for each PCI device. This interrupt work queue sends interrupt vector packets from the PCI bridge to the host in order to allow the host to service the interrupts. To prevent an interrupt from being transmitted before DMA data writes generated by the same device that generated the interrupt have been completed, interrupt requests are held on the interrupt work queue until all DMA writes outstanding at the time of the interrupt have been acknowledged.
Synchronization between DMA writes and interrupts is accomplished by creating a special data structure called an interrupt scoreboard for each interrupt work queue entry associated with a DMA write. When an interrupt is received, an interrupt data packet is entered into the interrupt work queue, but is held. The interrupt scoreboard then stores a “snapshot” of the state of the pending data requests at the time the interrupt is generated. The interrupt scoreboard is then used to track the pending DMA writes. When acknowledgement messages have been received for all pending DMA writes, then the interrupt data packet is transmitted so that the interrupt can be serviced.
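The scoreboard mechanism can be illustrated with the following sketch, which models a single interrupt work queue. It is not the claimed implementation: the class and method names, the use of integer write identifiers, and the choice to release held interrupts strictly in arrival order are all assumptions made for the example.

```python
class InterruptWorkQueue:
    """Holds interrupt packets until earlier DMA writes are acknowledged."""

    def __init__(self, send_to_host):
        self.send_to_host = send_to_host  # callback standing in for the bus system
        self.pending_writes = set()       # DMA writes awaiting acknowledgement
        self.held = []                    # (packet, scoreboard) pairs, oldest first

    def dma_write_issued(self, write_id):
        self.pending_writes.add(write_id)

    def interrupt_raised(self, packet):
        # The scoreboard is a snapshot of the writes pending right now;
        # writes issued after the interrupt do not delay it.
        scoreboard = set(self.pending_writes)
        self.held.append((packet, scoreboard))
        self._release_ready()

    def dma_write_acknowledged(self, write_id):
        self.pending_writes.discard(write_id)
        for _, scoreboard in self.held:
            scoreboard.discard(write_id)
        self._release_ready()

    def _release_ready(self):
        # Assumed policy: release in arrival order, stopping at the first
        # interrupt that is still waiting on an acknowledgement.
        while self.held and not self.held[0][1]:
            packet, _ = self.held.pop(0)
            self.send_to_host(packet)


delivered = []
queue = InterruptWorkQueue(send_to_host=delivered.append)
queue.dma_write_issued(1)
queue.dma_write_issued(2)
queue.interrupt_raised("interrupt A")   # held: writes 1 and 2 are outstanding
queue.dma_write_issued(3)               # issued later, not in the snapshot
queue.dma_write_acknowledged(1)
queue.dma_write_acknowledged(2)         # snapshot now empty, packet is sent
print(delivered)
```

In this model the interrupt is transmitted as soon as writes 1 and 2 are acknowledged, even though write 3 is still outstanding, because write 3 was not pending when the interrupt was generated.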