This invention relates to methods and apparatus for interconnecting a PCI bus structure to a bus structure involving message passing and work queues.
In conventional computer systems, various components, such as CPUs, memory and peripheral devices, are interconnected by a common signal transfer path called a “bus”. Busses are implemented in a variety of well-known standard architectures, one of which is called the PCI (Peripheral Component Interconnect) architecture. In its basic configuration, a PCI bus has a bus width of 32 or 64 bits, operating clock speeds of 33 or 66 MHz, and a maximum data transfer speed of 132 MBps for 32-bit, 33 MHz operation and 528 MBps for 64-bit, 66 MHz operation. In accordance with PCI protocol, address and data are multiplexed so that address lines and data lines do not have to be separated. This multiplexing reduces both the number of signals required for operation and the number of connection pins required to connect PCI compatible devices to the bus. In the larger bus capability, there are 64 bus lines and, thus, 64 bits available for both address and data. PCI devices use a paged memory access scheme where each PCI address consists of a page number field and a page offset field and each PCI device can directly access a 4GB address space.
PCI bus technology uses memory mapped techniques for performing I/O operations and DMA operations. In accordance with this technique, within the physical I/O address space of the platform, a range of addresses called a PCI memory address space is allocated for PCI devices. Within this address space there is a region reserved by the operating system for programmable I/O (PIO) operations that are performed by the host to read or change the contents of the device registers in the associated PCI devices. The host performs the read and write operations in the kernel virtual address space that is mapped into the host physical address space. Within the region, separate addresses are assigned to each register in each PCI device. Load and store operations can then be performed to these addresses to change or read the register contents.
A separate region is also allocated by the operating system for DMA access to host memory by the PCI devices. The allocated addresses are dynamically mapped to a section of the host physical memory. During this mapping, an address translation is performed to translate the addresses generated by the PCI devices into addresses in the host physical memory that may have a different address size than the PCI addresses. This address mapping is accomplished via a number of conventional mechanisms including translation lookaside buffers and memory management units.
The PCI device then uses the mapped addresses to perform DMA operations by directly reading and writing with the mapped addresses in the PCI address space. The host may also access these memory locations by means of the kernel virtual address space that is mapped by another memory management unit into the host physical memory. Details of the structure of the PCI bus architecture and of its operation are described in “PCI Local Bus Specification, Revision 2.2” (Copyright 1998) which publication is incorporated by reference herein in its entirety.
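The translation step described above can be pictured with a minimal sketch. It assumes a simple table keyed by PCI page number and an 8 KB page size; the text specifies neither, and the names `DmaMap`, `map_page` and `translate` are hypothetical. A real platform would perform this lookup in hardware.

```python
# Assumed page size; the text does not specify one.
PAGE_SIZE = 8 * 1024
PAGE_SHIFT = PAGE_SIZE.bit_length() - 1  # 13 bits of page offset


def split(pci_addr):
    """Split a PCI address into (page number, page offset)."""
    return pci_addr >> PAGE_SHIFT, pci_addr & (PAGE_SIZE - 1)


class DmaMap:
    """Hypothetical table mapping PCI page numbers to host physical pages."""

    def __init__(self):
        self._table = {}

    def map_page(self, pci_page, host_page):
        # Dynamically bind one PCI page to one host physical page.
        self._table[pci_page] = host_page

    def translate(self, pci_addr):
        page, offset = split(pci_addr)
        if page not in self._table:
            raise LookupError(f"PCI page {page:#x} is not mapped")
        # The host address may be wider than the 32-bit PCI address.
        return (self._table[page] << PAGE_SHIFT) | offset
```

In this sketch the page offset passes through unchanged and only the page number is replaced, which is how a 32-bit PCI address can land in a larger host physical address space.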
In addition to the PCI bus architecture, there are also other well-known bus architectures. For example, other architectures include Fibre Channel and, more recently, InfiniBand℠ architecture. These architectures are not memory-mapped architectures. Instead, the host and its memory are connected to host channel adapters. The input/output (I/O) devices are connected to target channel adapters. The host and target channel adapters communicate by messages comprising one or more data packets transmitted over serial point-to-point links established via a hardware switch fabric to which the host and target channel adapters are connected. The messages are enqueued for delivery between the channel adapters.
Data packet transmission is controlled by instructions generated by the host and I/O devices and placed in queues called work queues. Each work queue pair includes a send queue and a receive queue. The send queue can receive instructions from one process and the instructions cause data to be sent to another process. The receive queue can receive instructions which specify to a process where to place data received from another process. Hardware in the respective channel adapter processes instructions in the work queues and, under control of the instructions, causes the data packets to be transferred between the CPU memory and the I/O devices. A form of direct memory access (DMA) called remote direct memory access (RDMA) can also be performed by instructions placed in the work queues. This architecture has the advantage that it decouples the CPU memory from the I/O system and permits the system to be easily scaled.
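As a rough illustration of the send and receive queues just described, the following sketch models a work queue pair in software. The names `QueuePair` and `process`, and the `(address, length)` instruction format, are assumptions made for the example; in the architecture described, this processing is done by channel adapter hardware.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class QueuePair:
    """A work queue pair: one send queue and one receive queue."""
    send: deque = field(default_factory=deque)
    recv: deque = field(default_factory=deque)


def process(src_qp, src_mem, dst_qp, dst_mem):
    """Hypothetical adapter step: match one send with one receive.

    The send instruction says which bytes to transmit; the receive
    instruction says where the receiving process wants them placed.
    """
    src_addr, length = src_qp.send.popleft()
    dst_addr, capacity = dst_qp.recv.popleft()
    if length > capacity:
        raise ValueError("receive buffer too small for incoming data")
    dst_mem[dst_addr:dst_addr + length] = src_mem[src_addr:src_addr + length]
    return length
```

A sender would append `(address, length)` to its send queue, the receiver would append `(address, capacity)` to its receive queue, and `process` would then move the bytes without either side touching the other's memory directly.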
As attractive as the newer bus architectures are, there are many existing PCI peripherals that will require accommodation in such architectures for a considerable period of time. Therefore, there exists a need for a mechanism to interconnect a PCI bus to the message-passing, queue-oriented architectures described above so that PCI peripherals can be used with the newer architecture. Such a mechanism is called a bridge and must meet certain criteria, such as the preservation of PCI ordering rules and support for address translation. In addition, PCI services must be implemented. For example, there must be a DMA mapping mechanism that allows the PCI devices to perform DMA operations. In addition, the aforementioned load/store operations must be accommodated. Other requirements, such as interrupt support, must also be met. It is also desirable to maximize the information transfer rate through such a bridge. However, the packetized data and instruction queues of the message-passing, queue-oriented architecture are not directly adaptable to meet the PCI memory mapped addressing requirements.
Therefore, there is a need to accommodate PCI peripherals in a computer system that uses a message-passing bus architecture and to perform the address mapping and translation that would conventionally be performed by an I/O memory management unit.
In accordance with one aspect of the invention, PCI load/store operations and DMA operations are implemented via work queue pairs. PCI address space is divided into segments and each segment, in turn, is divided into regions. A separate work queue is assigned to each segment. A first portion of a PCI address is matched against the address ranges represented by the segments and used to select a memory segment and its corresponding work queue. An entry in the work queue holds a second portion of the PCI address which specifies a region within the selected segment that is assigned to a specific PCI device. The work queue entry may also hold other information such as the size of the data to be transferred and a pointer which identifies where the data packets containing the actual data to be transferred are located.
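The segment selection and work queue entry described above might be sketched as follows. The segment layout, queue identifiers and field names (`Segment`, `WorkQueueEntry`, `build_entry`) are illustrative assumptions, not values taken from the invention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    base: int      # first PCI address covered by the segment
    size: int      # segment length in bytes
    queue_id: int  # work queue assigned to this segment


@dataclass(frozen=True)
class WorkQueueEntry:
    region_addr: int  # second portion of the PCI address (within segment)
    length: int       # size of the data to be transferred
    data_ptr: int     # where the data to be transferred is located


# Assumed layout: two 256 MB segments, each with its own work queue.
SEGMENTS = [
    Segment(0x0000_0000, 0x1000_0000, queue_id=1),
    Segment(0x1000_0000, 0x1000_0000, queue_id=2),
]


def select_segment(pci_addr):
    """Match the address against the range each segment represents."""
    for seg in SEGMENTS:
        if seg.base <= pci_addr < seg.base + seg.size:
            return seg
    raise LookupError(f"no segment covers PCI address {pci_addr:#x}")


def build_entry(pci_addr, length, data_ptr):
    """Return (queue id, entry) for a transfer at pci_addr."""
    seg = select_segment(pci_addr)
    return seg.queue_id, WorkQueueEntry(pci_addr - seg.base, length, data_ptr)
```

With power-of-two, aligned segments as assumed here, the range comparison is equivalent to matching the upper address bits, and the remainder placed in the entry is simply the lower bits.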
In one embodiment, PIO load/store operations are implemented by selecting a work queue assigned to PIO operations and creating a work queue entry with the PCI address of a register on a PCI device and a pointer to the PIO data. The work queue entry is sent to the bridge where the PCI address is extracted and used to program the appropriate device register with the data using the data pointer.
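A minimal sketch of the store path in this embodiment, under stated assumptions: the bridge is modeled as a software object, device registers as a dictionary keyed by PCI address, and the PIO data as a little-endian value in a byte buffer. The names `PioEntry` and `Bridge` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PioEntry:
    pci_addr: int    # PCI address of the target device register
    data_ptr: int    # offset of the PIO data in the payload buffer
    length: int = 4  # assumed register width in bytes


class Bridge:
    """Hypothetical bridge that programs device registers from entries."""

    def __init__(self, payload):
        self.payload = payload  # buffer holding the PIO data
        self.registers = {}     # PCI address -> register contents

    def handle(self, entry):
        # Follow the data pointer, then store to the addressed register.
        raw = self.payload[entry.data_ptr:entry.data_ptr + entry.length]
        value = int.from_bytes(raw, "little")
        self.registers[entry.pci_addr] = value
        return value
```

The point of the sketch is the indirection: the work queue entry carries only the register address and a pointer, and the bridge resolves the pointer before performing the store on the PCI side.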
In another embodiment, PIO load/store operations are implemented by selecting a work queue assigned to PIO operations and creating a work queue entry with the PCI address of a register on a PCI device and a pointer to the PIO data. An RDMA operation is used to perform the PIO load/store processes. The page and region data is used in connection with a translation protection table in the host channel adapter to access physical memory and perform the PIO operation.
In yet another embodiment, DMA transfers are implemented by selecting a work queue by comparing a portion of the PCI address generated by the PCI device to an address range table and selecting a work queue that services the address range. A work queue entry is created with the remainder of the PCI address and a pointer to the DMA data. An RDMA operation is used to perform the DMA transfer. The page and region data is used in connection with a translation protection table in the host channel adapter to access physical memory and perform the DMA transfer.