1. Field of the Invention
2. Description of the Related Art
In complex computer systems, particularly those in large transaction processing environments as shown in FIG. 1, the available servers 100 are often clustered together to improve overall system performance. These clustered servers 100 are then connected by a storage area network (SAN) to storage units 106, so that all have high-performance access to storage. Further, the servers 100 are also connected to an Ethernet network to allow the various user computers 110 to interact with the servers 100. Thus, the servers 100 use a first fabric 102 for clustering, a second fabric 104 for the SAN and a third fabric 108 to communicate with the users. Typically, the cluster fabric 102 is one such as InfiniBand, the SAN fabric 104 is one such as Fibre Channel and the user fabric 108 is one such as Ethernet. Therefore, in this configuration each of the servers 100 must have three different adapters to communicate with the three fabrics. Further, the three adapters take up physical space in each server, thus limiting the density of available servers in a high processor count environment, and they increase the cost and complexity of the servers themselves. Additionally, three separate networks and fabrics must be maintained.
This arrangement is further illustrated in FIG. 2, which shows the software components. An operating system 200 is present in the server 100. Connected to the operating system 200 is a clustering driver 202 which connects with an InfiniBand host channel adapter (HCA) 204 in the illustrated embodiment. The InfiniBand HCA 204 is then connected to the InfiniBand fabric 102 for clustering. A block storage driver 206 is connected to the operating system 200 and interacts with a Fibre Channel host bus adapter (HBA) 208. The Fibre Channel HBA 208 is connected to the Fibre Channel fabric 104 to provide the SAN capability. Finally, a networking driver 210 is also connected to the operating system 200 to provide the third parallel link and is connected to a series of network interface cards (NICs) 212 which are connected to the Ethernet fabric 108.
Legacy operating systems such as Linux 2.4 or Microsoft NT4 were architected assuming that each “I/O Service” is provided by an independent adapter. An “I/O Service” is defined as the portion of adapter functionality that connects a server onto one of the network fabrics. Referring to FIG. 2, the NIC 212 provides the Networking I/O Service, the HCA 204 provides the Clustering I/O Service, and the HBA 208 provides the Block Storage I/O Service. It would be desirable to allow a single Ethernet Channel Adapter (ECA) to provide all three of these I/O Services. Since most traditional high performance networking, storage and cluster adapters are PCI based and enumerated as independent adapters by the Plug and Play (PnP) component of the operating system, the software stacks for each fabric have evolved independently. In order for an ECA to be deployed on such legacy operating systems, its I/O Services must be exported using independent PCI functions. While this type of design fits nicely into the PnP environment, it exposes issues related to resources shared between the PCI functions. For example, networking and storage may want to utilize a specific Ethernet port concurrently.
Modern operating systems such as Microsoft Windows Server 2003 provide a mechanism called a consolidated driver model, which could be used to export all ECA I/O Services using only a single PCI function. However, the software associated with the consolidated driver model has inherent inefficiencies due to the layers involved in virtualizing each I/O Service using host software. In some deployment environments, it may be desirable to support the consolidated driver model, but in environments that are sensitive to latency and CPU utilization, it is desirable to deploy an ECA using multiple PCI functions.
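The two deployment models described above can be contrasted with a minimal sketch. It is an illustration only: the names `IOService`, `DeploymentModel`, `PciFunction` and `enumerate_functions` are hypothetical and do not correspond to any operating system or adapter interface, and the assignment of function numbers 0 through 2 is an assumption made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class IOService(Enum):
    """The three I/O Services named in the text."""
    NETWORKING = "networking"
    CLUSTERING = "clustering"
    BLOCK_STORAGE = "block storage"


class DeploymentModel(Enum):
    """Hypothetical labels for the two deployment models."""
    MULTI_FUNCTION = "one PCI function per I/O Service"
    CONSOLIDATED = "single PCI function; host software virtualizes each service"


@dataclass(frozen=True)
class PciFunction:
    number: int                       # assumed PCI function number on the adapter
    services: tuple[IOService, ...]   # I/O Services exported through this function


def enumerate_functions(model: DeploymentModel) -> list[PciFunction]:
    """Return the PCI functions a hypothetical ECA would expose under `model`."""
    all_services = tuple(IOService)
    if model is DeploymentModel.MULTI_FUNCTION:
        # Legacy-style deployment: each I/O Service appears as its own function,
        # so the operating system enumerates it as an independent adapter.
        return [PciFunction(n, (svc,)) for n, svc in enumerate(all_services)]
    # Consolidated driver model: one function carries every I/O Service.
    return [PciFunction(0, all_services)]
```

Under the multiple-function model the sketch yields three functions, each of which could contend for the same Ethernet port; under the consolidated model it yields one function carrying all three I/O Services.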
Microsoft has made some progress in integrating networking and clustering using the Winsock Direct (WSD) model. One issue with WSD is that it does not export the various RDMA (Remote Direct Memory Access) APIs (Application Programming Interfaces), such as DAPL (Direct Access Programming Library) or MPI (Message Passing Interface), that have been widely accepted by the clustering community. One approach to exporting DAPL and MPI when they are not natively supported on an operating system is to use an independent PCI function for clustering. Another issue with WSD is that it is not deployed on all Microsoft operating systems, so hardware vendors cannot rely on it to export their adapter I/O Services in all Microsoft operating system environments.
Future operating system architectures will likely start to take into account the unique characteristics of ECAs, e.g., multiple network ports and multiple I/O Services implemented in one adapter. Network ports, accelerated connections, and memory registration resources are all examples of resources that the operating system has an interest in managing in a way that is intuitive and that takes the best advantage of the functionality provided by an ECA. This makes it highly probable that even more deployment models will emerge, which it would be desirable to support.
To address these various deployment models and yet provide the broadest use of a single ECA at its full capabilities, it would be desirable to have an ECA that is able to adapt to each deployment model.