The present invention relates in general to methods for improving the performance of hardware units that bridge from one standard communication protocol to devices that use a second communication protocol.
Peripheral Component Interconnect (PCI) is a peripheral bus commonly used in personal computers (PCs), Apple Macintoshes, and workstations. It was designed primarily by Intel and first appeared on PCs in late 1993. PCI provides a high-speed data path between the central processing unit (CPU) and peripheral devices (e.g., video, disk, networks). There are typically three or four PCI slots on the motherboard. In a Pentium PC, there is generally a mix of PCI and Industry Standard Architecture (ISA) slots or PCI and Extended ISA (EISA) slots. Early on, the PCI bus was known as a "local bus."
PCI runs at 33 MHz and supports 32- and 64-bit data paths and bus mastering. PCI Version 2.1 adds support for 66 MHz, which doubles the throughput. There are generally no more than three or four PCI slots on the motherboard because the bus is limited to 10 electrical loads, a limit set by inductance and capacitance. The PCI chipset uses three loads, leaving seven for peripherals. Controllers built onto the motherboard use one load each, whereas controllers that plug into an expansion slot use 1.5 loads. A "PCI bridge" may be used to connect two PCI buses together to provide more expansion slots.
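As a worked reading of these figures, assume that loads simply add and that each expansion slot is charged 1.5 loads. With $n_{\text{onboard}}$ (a symbol introduced here) denoting the number of controllers built onto the motherboard, the seven loads left after the chipset give the slot limit:

$$
n_{\text{slots}} = \left\lfloor \frac{10 - 3 - n_{\text{onboard}}}{1.5} \right\rfloor,
\qquad
n_{\text{onboard}} = 0 \Rightarrow \lfloor 4.67 \rfloor = 4,
\qquad
n_{\text{onboard}} = 2 \Rightarrow \lfloor 3.33 \rfloor = 3
$$

This is consistent with the three or four slots typically found on the motherboard.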
PCI Extended (PCI-X) is an enhanced PCI bus from IBM, HP, and Compaq that is backward compatible with existing PCI cards. It uses a 64-bit bus with a clock speed as high as 133 MHz, raising peak throughput from the 132 MBytes/sec of the original PCI bus to as much as 1 GByte/sec.
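The quoted throughputs follow from clock rate times bus width, assuming one data transfer per clock; these are peak figures that ignore protocol overhead:

$$
\begin{aligned}
\text{PCI, 32-bit, 33 MHz:}&\quad 33\times10^{6}\,\text{s}^{-1} \times 4\,\text{bytes} = 132\,\text{MBytes/sec}\\
\text{PCI 2.1, 32-bit, 66 MHz:}&\quad 66\times10^{6}\,\text{s}^{-1} \times 4\,\text{bytes} = 264\,\text{MBytes/sec}\\
\text{PCI-X, 64-bit, 133 MHz:}&\quad 133\times10^{6}\,\text{s}^{-1} \times 8\,\text{bytes} = 1064\,\text{MBytes/sec} \approx 1\,\text{GByte/sec}
\end{aligned}
$$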
InfiniBand (IB) is an input/output architecture that is expected to replace the PCI-X bus in high-end servers. Supporting both copper wire and optical fiber, and originally known as "System I/O," IB is a combination of Intel's Next Generation I/O (NGIO) and Future I/O from IBM, HP, and Compaq. Unlike PCI's shared-bus technology, IB is a point-to-point switched architecture, providing a data path of 500 MBytes/sec to 6 GBytes/sec between each pair of nodes at distances orders of magnitude greater than PCI or PCI-X allow.
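The 500 MBytes/sec to 6 GBytes/sec range is consistent with the original single-data-rate IB links, assuming 2.5 Gbit/sec signaling per lane, 8b/10b encoding, link widths of 1x and 12x, and a figure that counts both directions of the link:

$$
\begin{aligned}
\text{per lane, per direction:}&\quad 2.5\,\text{Gbit/s} \times \tfrac{8}{10} = 2\,\text{Gbit/s} = 250\,\text{MBytes/sec}\\
\text{1x, both directions:}&\quad 2 \times 250 = 500\,\text{MBytes/sec}\\
\text{12x, both directions:}&\quad 2 \times 12 \times 250 = 6000\,\text{MBytes/sec} = 6\,\text{GBytes/sec}
\end{aligned}
$$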
One application being explored is to "lengthen" and expand the PCI bus by adding a PCI to IB bridge to the Host system and connecting it to an expansion drawer via an IB link (as described in U.S. Pat. No. 6,003,105). In this model, the PCI to IB bridge monitors areas of the PC's PCI bus, translates the PCI commands to equivalent IB commands, and forwards the transactions to a remote expansion unit (drawer). An IB to PCI bridge in the expansion drawer receives the transactions over the IB link, converts them back to their equivalent PCI commands, and reissues them on the PCI bus in the expansion drawer. Similar results may be achieved by adding a standard IB Host channel adapter (HCA) to the Host system (instead of the PCI to IB bridge) and writing a device driver that uses IB APIs to send IB commands to the IB to PCI bridge in the expansion drawer; the bridge translates these into equivalent PCI commands and issues them to the PCI device. Another solution adds or modifies Host software to monitor calls to the operating system's PCI API and generate the equivalent IB commands, which are again sent to the IB to PCI bridge in the expansion drawer.
This model of translating PCI commands into IB commands and forwarding them to the expansion drawer has drawbacks: because IB and PCI semantics do not match exactly, performance may suffer. An IB command may also incur much more latency than its equivalent PCI command when the Host and the I/O devices are located far apart. For example, storage adapters often use a model similar to the following:
1. The device driver allocates a command block from its internal pool of blocks, initializes the block, and then performs a single 32-bit PCI write of the block's physical address to an adapter register. Typically, the driver always writes the address to the same device register.
2. The device hardware usually puts the data from the write operation into a queue and then interrupts the firmware on the adapter. The firmware pulls the address from the queue and then programs a direct memory access (DMA) engine to copy the command block referenced by the address from the Host's memory to the adapter's memory. The block is usually of a fixed size.
3. The adapter analyzes the command and determines whether more data is required (as when writing to a disk). If so, the adapter programs its DMA engine to move the rest of the data from the Host at the address(es) provided in the command block, now in the adapter's memory.
4. The adapter executes the command.
5. If there is result data for the Host (e.g., from a disk read), the adapter uses DMA to send the data back to the Host at the address(es) provided in the command block.
6. The adapter then interrupts the Host.
7. Finally, the Host's device driver reads the interrupt status register on the adapter, recognizes that the adapter has issued the interrupt, reads another hardware register, which returns the first element in the status queue, and completes the original Host I/O request.
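The doorbell pattern in steps 1, 2, and 7 can be modeled in a few lines of Python. This is a toy sketch, not adapter firmware: `StorageAdapter`, `BLOCK_SIZE`, and the use of the block address as the completion token in the status queue are all hypothetical, and steps 3 through 6 are omitted.

```python
from collections import deque
from dataclasses import dataclass, field

BLOCK_SIZE = 64  # hypothetical fixed command-block size, in bytes


@dataclass
class StorageAdapter:
    """Toy model of the adapter side of steps 1, 2, and 7 (steps 3-6 omitted)."""
    host_memory: dict                                    # Host physical address -> bytes
    command_queue: deque = field(default_factory=deque)  # filled by doorbell writes
    adapter_memory: list = field(default_factory=list)   # command blocks copied by DMA
    status_queue: deque = field(default_factory=deque)   # completions for the driver

    def write_command_register(self, addr: int) -> None:
        # Steps 1-2: the driver's single 32-bit write lands in a hardware queue.
        self.command_queue.append(addr & 0xFFFFFFFF)

    def run_firmware(self) -> None:
        # Step 2: firmware pops each address and DMAs a fixed-size block.
        while self.command_queue:
            addr = self.command_queue.popleft()
            self.adapter_memory.append(self.host_memory[addr][:BLOCK_SIZE])
            # Steps 3-6 are omitted; the block address doubles as the
            # completion token (an assumption made for this sketch).
            self.status_queue.append(addr)

    def read_status_queue(self) -> int:
        # Step 7: reading the status register returns the first queued element.
        return self.status_queue.popleft()


# Driver side: initialize a block in Host memory, then ring the doorbell once.
host_memory = {0x1000: b"READ lba=42".ljust(BLOCK_SIZE, b"\0")}
adapter = StorageAdapter(host_memory)
adapter.write_command_register(0x1000)
adapter.run_firmware()
completed = adapter.read_status_queue()
```

On a local PCI bus each of these calls is a cheap bus transaction; the point of the model is that every one of them becomes a remote operation once the adapter sits across an IB link.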
If the preceding PCI commands are translated directly into IB commands (either by software in the Host or by a PCI to IB bridge), the resulting sequence of IB commands may look like the following:
1. (Driver writes command address) Host to expansion drawer: remote direct memory access (RDMA) write of 32 bits to a fixed address (the device's command register). RDMA is an IB-specific command.
2. (PCI adapter starts fetching the command block) Expansion drawer to Host: RDMA read of a fixed-size block from the Host address provided in step 1 of this sequence.
3. (PCI adapter starts fetching data, if more data is required) Expansion drawer to Host: RDMA read of a variable-size block from the Host address(es) provided in the command block.
4. (PCI adapter starts sending data, if there is result data for the Host) Expansion drawer to Host: RDMA write of a variable-size block to the Host address(es) provided in the command block.
5. (PCI adapter raises interrupt) Expansion drawer to Host: SEND a small packet that tells the Host system that a PCI interrupt has been raised. SEND is an IB-specific command; it is used because the IB specification has no direct equivalent of a PCI interrupt.
6. (Driver reads the interrupt status register) Host to expansion drawer: RDMA read of 32 bits from a fixed address (the device's interrupt status register).
7. (Driver reads the status queue) Host to expansion drawer: RDMA read of 32 bits from a fixed address (the device's status queue).
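Tabulating the seven operations makes the cost of the direct translation explicit. The following is a sketch under stated assumptions: the `IBOp` record and the `serialized_latency` helper are hypothetical, and the latency bound assumes that each operation waits for one full round trip on a reliable transport and that no two operations overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IBOp:
    step: int
    initiator: str  # "host" or "drawer"
    op: str         # IB operation
    purpose: str


# The seven operations of the direct translation listed above.
# Steps 3 and 4 occur only when the command moves data in that direction.
DIRECT_TRANSLATION = [
    IBOp(1, "host", "RDMA write", "driver writes command address"),
    IBOp(2, "drawer", "RDMA read", "adapter fetches fixed-size command block"),
    IBOp(3, "drawer", "RDMA read", "adapter fetches variable-size data"),
    IBOp(4, "drawer", "RDMA write", "adapter returns result data"),
    IBOp(5, "drawer", "SEND", "adapter signals a PCI interrupt"),
    IBOp(6, "host", "RDMA read", "driver reads interrupt status register"),
    IBOp(7, "host", "RDMA read", "driver reads status queue"),
]


def serialized_latency(ops, round_trip: float) -> float:
    """Latency if every operation costs one full round trip and none overlap.
    This is a simplifying assumption, not a measurement."""
    return len(ops) * round_trip


host_initiated = [o.step for o in DIRECT_TRANSLATION if o.initiator == "host"]
# A command that moves no bulk data still needs five of the seven operations.
no_data = [o for o in DIRECT_TRANSLATION if o.step not in (3, 4)]
# With an arbitrary round trip of 1.0 time unit, seven serialized steps cost 7.0.
total = serialized_latency(DIRECT_TRANSLATION, round_trip=1.0)
```

Because the cost scales with the number of operations times the round-trip time, removing operations from the sequence helps more as the Host and drawer move farther apart.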
The problem with this translation is that the seven round trips it requires may be slow when run over one of the reliable IB transport services. There is, therefore, a need for a method that improves communication performance between PCI protocol units and IB protocol units, both when bridge units are available and when they are not.
Device drivers that use the Peripheral Component Interconnect (PCI) protocol and are designed to communicate with PCI I/O devices over the PCI local bus are used with PCI to InfiniBand (IB) and IB to PCI bridge units. An expansion drawer incorporates an IB to PCI bridge unit to communicate with PCI I/O adapter units in a local bus configuration. The expansion drawer communicates with the PCI to IB bridge unit in the Host system over an IB link. In one embodiment of the present invention, hardware is added to the bridge units to monitor the PCI commands issued on the local bus of the Host system. The hardware learns the PCI command sequences for PCI I/O device transactions. These command sequences are optimized for the PCI transactions and stored for subsequent use. When the device driver issues a request for a PCI transaction, the stored data is searched to determine whether optimized sequences have been generated and stored. If they have, the optimized sequences are substituted and sent to the PCI to IB bridge. If they have not, the standard PCI commands are sent to the PCI to IB bridge unit. In another embodiment, a state machine designed to issue optimized PCI command sequences in response to a PCI transaction request is added to the bridge units. The state machine may be associated with a PCI I/O device by means of a PCI I/O device identification value supplied by the manufacturer.
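The learn, store, search, and fall-back cycle can be sketched as a lookup table keyed by device and transaction. The text does not say which optimization is applied, so `batch_writes` (merging runs of consecutive register writes into one message) is a hypothetical stand-in, as are `SequenceLearner`, the command encoding, and the two-sighting threshold.

```python
from collections import defaultdict


def batch_writes(commands):
    """Hypothetical optimizer: merge each run of consecutive register writes
    into one batched write so the run can travel in a single IB message."""
    out = []
    for op, offset in commands:
        if op == "write" and out and out[-1][0] == "write_batch":
            out[-1] = ("write_batch", out[-1][1] + (offset,))
        elif op == "write":
            out.append(("write_batch", (offset,)))
        else:
            out.append((op, offset))
    return out


class SequenceLearner:
    """Learn the command sequence seen for a (vendor ID, device ID, transaction)
    key, store an optimized version, and substitute it on later requests."""

    def __init__(self, optimize, threshold=2):
        self.optimize = optimize
        self.threshold = threshold  # identical sightings needed before trusting a pattern
        self.history = defaultdict(list)
        self.optimized = {}

    def observe(self, key, commands):
        seen = self.history[key]
        seen.append(list(commands))
        recent = seen[-self.threshold:]
        if len(recent) == self.threshold and all(s == recent[-1] for s in recent):
            self.optimized[key] = self.optimize(recent[-1])

    def commands_for(self, key, standard):
        # Substitute the stored optimized sequence, else fall back to the standard one.
        return self.optimized.get(key, standard)


learner = SequenceLearner(batch_writes)
key = (0x1234, 0x5678, "start_io")  # hypothetical vendor ID, device ID, transaction kind
seq = [("write", 0x10), ("write", 0x14), ("read", 0x20)]
before = learner.commands_for(key, seq)  # nothing learned yet: standard sequence
learner.observe(key, seq)
learner.observe(key, seq)
after = learner.commands_for(key, seq)   # two writes now travel as one batch
```

Requiring repeated identical sightings before substituting is one way to avoid optimizing a sequence the driver issued only once; the threshold is a design choice of this sketch, not of the text.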
In another embodiment, where there is no PCI to IB bridge unit in the Host system, software is added to the OS to monitor the PCI commands issued by the device drivers through the OS's PCI APIs. The software learns the PCI command sequences for PCI I/O device transactions. These command sequences are optimized for the PCI transactions and stored for subsequent use. When the device driver issues a request for a PCI transaction, the stored data is searched to determine whether optimized sequences have been generated for the PCI transaction. If so, the software substitutes the optimized sequences and sends them, via the Host channel adapter (HCA), to the IB to PCI bridge. If not, the standard (unoptimized) command sequence is sent to the IB to PCI bridge. In another embodiment, a state machine designed to issue optimized PCI commands in response to the PCI transaction request is added to the OS. The state machine may be associated with a particular PCI I/O device by means of a PCI I/O device identification (ID) value supplied by the manufacturer of the I/O device.
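A state machine of this kind can be sketched for an adapter that follows the seven-step storage model. Everything here is hypothetical: the class and registry names, the vendor and device IDs, the one-outstanding-command limit, and the two optimizations it applies. On a doorbell write it sends the address and command block in one SEND (replacing steps 1 and 2), and it assumes that the drawer's interrupt message also carries the interrupt status and status-queue head, so the driver's two reads (steps 6 and 7) are answered locally. The operations it emits are IB operations toward the drawer.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    AWAIT_INTERRUPT = auto()
    SERVE_STATUS = auto()


class StorageAdapterMachine:
    """Host-side state machine for one adapter type; assumes one
    outstanding command at a time."""

    def __init__(self, host_memory, block_size=64):
        self.host_memory = host_memory
        self.block_size = block_size
        self.state = State.IDLE
        self.prefetched = {}

    def driver_write(self, register, value):
        """Return the IB operations to issue for a driver register write."""
        if register == "command":
            block = self.host_memory[value][: self.block_size]
            self.state = State.AWAIT_INTERRUPT
            # One SEND carries the address and the block, replacing steps 1 and 2.
            return [("SEND", value, block)]
        return [("RDMA write", register, value)]

    def on_interrupt_message(self, payload):
        """Assumes the interrupt SEND also carries the values the driver
        is about to read (interrupt status and status-queue head)."""
        self.prefetched = dict(payload)
        self.state = State.SERVE_STATUS

    def driver_read(self, register):
        """Return (value, ib_ops); prefetched values need no IB round trip."""
        if self.state is State.SERVE_STATUS and register in self.prefetched:
            value = self.prefetched.pop(register)
            if not self.prefetched:
                self.state = State.IDLE
            return value, []  # replaces steps 6 and 7
        return None, [("RDMA read", register)]  # value arrives when the read completes


# Machines are looked up by the (vendor ID, device ID) pair from PCI config space.
MACHINES = {(0x1234, 0x5678): StorageAdapterMachine}  # hypothetical IDs


def machine_for(vendor_id, device_id, host_memory):
    cls = MACHINES.get((vendor_id, device_id))
    return cls(host_memory) if cls else None


memory = {0x1000: bytes(64)}
sm = machine_for(0x1234, 0x5678, memory)
ops = sm.driver_write("command", 0x1000)
sm.on_interrupt_message({"interrupt_status": 0x1, "status_queue": 0x1000})
isr, isr_ops = sm.driver_read("interrupt_status")
head, head_ops = sm.driver_read("status_queue")
```

Under these assumptions, the seven round trips of the direct translation shrink to the SEND for the command, any bulk-data RDMA operations, and the interrupt SEND. Devices with no registered machine (`machine_for` returns `None`) keep the standard path.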
In another embodiment, where there is no PCI to IB bridge in the Host system, the OS's PCI APIs are extended with new APIs that help optimize system performance. For example, a new API may be added expressly to clear selected bits in a byte. Software is also added to the OS to monitor the PCI commands issued by the device drivers through the OS's APIs and to optimize the PCI command sequences. The software learns the PCI command sequences for PCI I/O device transactions. These command sequences are optimized for the PCI transactions and stored for subsequent use. When the device driver issues a request for a PCI transaction, the stored data is searched to determine whether optimized sequences have been generated for the PCI transaction. If so, the software substitutes the optimized sequences and sends them, via the Host channel adapter (HCA), to the IB to PCI bridge. If not, the standard (unoptimized) command sequence is sent to the IB to PCI bridge. In another embodiment, a state machine designed to issue optimized PCI commands in response to the PCI transaction request is added to the OS. The state machine may be associated with a particular PCI I/O device by means of a PCI I/O device identification (ID) value supplied by the manufacturer of the I/O device.
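The benefit of a clear-selected-bits API is easiest to see by counting round trips. Through a plain read/write API, clearing bits in a remote register takes a read followed by a write; a single request that the bridge executes next to the device takes one. The `RemoteRegisters` class and the `clear_bits8` name are hypothetical, and each call is assumed to cost exactly one IB round trip.

```python
class RemoteRegisters:
    """Hypothetical model of 8-bit device registers behind the IB to PCI
    bridge; every call from the Host is counted as one IB round trip."""

    def __init__(self, initial=None):
        self.regs = dict(initial or {})
        self.round_trips = 0

    def read8(self, offset):
        self.round_trips += 1  # RDMA read
        return self.regs.get(offset, 0)

    def write8(self, offset, value):
        self.round_trips += 1  # RDMA write
        self.regs[offset] = value & 0xFF

    def clear_bits8(self, offset, mask):
        # Hypothetical new API: one request; the bridge performs the
        # read-modify-write next to the device.
        self.round_trips += 1
        self.regs[offset] = self.regs.get(offset, 0) & ~mask & 0xFF


def clear_bits_standard(dev, offset, mask):
    """The same effect through a plain read/write API: two round trips."""
    dev.write8(offset, dev.read8(offset) & ~mask)


standard = RemoteRegisters({0x04: 0b1111_0110})
clear_bits_standard(standard, 0x04, 0b0000_0110)
extended = RemoteRegisters({0x04: 0b1111_0110})
extended.clear_bits8(0x04, 0b0000_0110)
```

Both paths leave the register at `0b1111_0000`; only the number of IB round trips differs (two versus one).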
In another embodiment of the present invention, where there is no PCI to IB bridge in the Host system, a layer of software is added between the OS and the device drivers to monitor the PCI commands issued by the device drivers. The software layer intercepts the PCI API calls that a device driver issues and either forwards them to the OS (if the calls reference a PCI device attached to a local PCI bus) or converts them to IB commands and sends them to the IB to PCI bridge via the IB HCA. The software learns the PCI command sequences for PCI I/O device transactions. These command sequences are optimized for the PCI transactions and stored for subsequent use. When the device driver issues a request for a PCI transaction, the stored data is searched to determine whether optimized sequences have been generated for the PCI transaction. If so, the software substitutes the optimized sequences and sends them, via the Host channel adapter (HCA), to the IB to PCI bridge. If not, the standard (unoptimized) command sequence is sent to the IB to PCI bridge. In another embodiment, a state machine designed to issue optimized PCI commands in response to the PCI transaction request is added to the OS. The state machine may be associated with a particular PCI I/O device by means of a PCI I/O device identification (ID) value supplied by the manufacturer of the I/O device.
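The routing decision made by the interception layer can be sketched as a simple dispatch on whether the target device sits on a local PCI bus. The `InterceptLayer` class, the `pci_read32`/`pci_write32` call names, the `PCI_TO_IB` mapping, and the device identifiers are all hypothetical; a real layer would also return read data and apply the stored optimized sequences.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from intercepted PCI API calls to IB operations.
PCI_TO_IB = {"pci_read32": "RDMA read", "pci_write32": "RDMA write"}


@dataclass
class InterceptLayer:
    """Shim between device drivers and the OS PCI API (hypothetical names)."""
    local_devices: set
    forwarded_to_os: list = field(default_factory=list)
    sent_via_hca: list = field(default_factory=list)

    def call(self, api, device, *args):
        if device in self.local_devices:
            # Device on a local PCI bus: pass the call through unchanged.
            self.forwarded_to_os.append((api, device, args))
            return "os"
        # Remote device: convert to an IB operation for the IB to PCI bridge.
        self.sent_via_hca.append((PCI_TO_IB[api], device, args))
        return "ib"


layer = InterceptLayer(local_devices={"host:00:03.0"})
route_local = layer.call("pci_write32", "host:00:03.0", 0x10, 1)
route_remote = layer.call("pci_read32", "drawer:01:00.0", 0x20)
```

Because the driver only ever calls the layer, the same unmodified driver can manage a device on the Host's own bus or one in the expansion drawer.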
In another embodiment, where there is no PCI to IB bridge in the Host system, a library of software is compiled and/or linked with the device driver source code to produce a new device driver object module. The library intercepts the PCI API calls that the device driver issues and either forwards them to the OS (if the calls reference a PCI device attached to a local PCI bus) or converts them to IB commands and sends them to the IB to PCI bridge via the IB HCA. The software learns the PCI command sequences for PCI I/O device transactions. These command sequences are optimized for the PCI transactions and stored for subsequent use. When the device driver issues a request for a PCI transaction, the stored data is searched to determine whether optimized sequences have been generated for the PCI transaction. If so, the software substitutes the optimized sequences and sends them, via the Host channel adapter (HCA), to the IB to PCI bridge. If not, the standard (unoptimized) command sequence is sent to the IB to PCI bridge. In another embodiment, a state machine designed to issue optimized PCI commands in response to the PCI transaction request is added to the OS. The state machine may be associated with a particular PCI I/O device by means of a PCI I/O device identification (ID) value supplied by the manufacturer of the I/O device.
In another embodiment, where there is no PCI to IB bridge in the Host system, a library of software is compiled and/or linked with the device driver source code to produce a new device driver object module. The library intercepts the PCI API calls that the device driver issues, provides new APIs that help optimize system performance (for example, a new API whose express purpose is to clear selected bits in a byte), and either forwards the intercepted calls to the OS (if they reference a PCI device attached to a local PCI bus) or converts them to IB commands and sends them to the IB to PCI bridge via the IB HCA. The software learns the PCI command sequences for PCI I/O device transactions. These command sequences are optimized for the PCI transactions and stored for subsequent use. When the device driver issues a request for a PCI transaction, the stored data is searched to determine whether optimized sequences have been generated for the PCI transaction. If so, the software substitutes the optimized sequences and sends them, via the Host channel adapter (HCA), to the IB to PCI bridge. If not, the standard (unoptimized) command sequence is sent to the IB to PCI bridge. In another embodiment, a state machine designed to issue optimized PCI commands in response to the PCI transaction request is added to the OS. The state machine may be associated with a particular PCI I/O device by means of a PCI I/O device identification (ID) value supplied by the manufacturer of the I/O device.
The foregoing has outlined rather broadly the features and technical advantages of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter which form the subject of the claims of the invention.