1. Field of the Invention
2. Description of the Related Art
In complex computer systems, particularly those in large transaction processing environments as shown in FIG. 1, the available servers 100 are often clustered together to improve overall system performance. These clustered servers 100 are also connected by a storage area network (SAN) to storage units 106, so that every server has high-performance access to storage. The servers 100 are further connected to an Ethernet network so that the various user computers 110 can interact with them. Thus, the servers 100 use a first fabric 102 for clustering, a second fabric 104 for the SAN and a third fabric 108 to communicate with the users. Typically, the cluster fabric 102 is InfiniBand, the SAN fabric 104 is Fibre Channel and the user fabric 108 is Ethernet. Therefore, in this configuration each of the servers 100 must have three different adapters to communicate with the three fabrics. The three adapters occupy physical space, consume power and produce heat in each server, which limits server density in a high processor count environment and increases the cost and complexity of the servers themselves. Additionally, three separate networks and fabrics must be maintained.
FIG. 2 shows the corresponding software components. An operating system 200 is present in the server 100. Connected to the operating system 200 is a clustering driver 202, which connects with an InfiniBand host channel adapter (HCA) 204 in the illustrated embodiment. The InfiniBand HCA 204 is in turn connected to the InfiniBand fabric 102 for clustering. A block storage driver 206 is connected to the operating system 200 and interacts with a Fibre Channel host bus adapter (HBA) 208. The Fibre Channel HBA 208 is connected to the Fibre Channel fabric 104 to provide the SAN capability. Finally, a networking driver 210, also connected to the operating system 200, provides the third parallel path and connects to a set of network interface cards (NICs) 212, which are connected to the Ethernet fabric 108.
Legacy operating systems such as Linux 2.4 or Microsoft NT4 were architected on the assumption that each "I/O Service" is provided by an independent adapter. An "I/O Service" is defined as the portion of adapter functionality that connects a server to one of the network fabrics. Referring to FIG. 2, the NIC 212 provides the Networking I/O Service, the HCA 204 provides the Clustering I/O Service, and the HBA 208 provides the Block Storage I/O Service. It would be desirable to allow a single Ethernet Channel Adapter (ECA) to provide all three of these I/O Services. Because most traditional high-performance networking, storage and clustering adapters are PCI based and are enumerated as independent adapters by the Plug and Play (PnP) component of the operating system, the software stacks for each fabric have evolved independently. For an ECA to be deployed on such legacy operating systems, its I/O Services must be exported using independent PCI functions. While this type of design fits nicely into the PnP environment, it raises issues with resources shared between the PCI functions.
Consider an ECA having networking and storage I/O Services. A first issue is initialization of a specific Ethernet port that is shared between the I/O Services. The independent drivers exporting networking and storage from separate PCI functions may concurrently want to use a common ECA port. In this case, a single Ethernet PHY may need to be initialized by writing to its MDIO registers in order to bring the Ethernet link to an active state. Access to the PHY must be coordinated, or the drivers may never be able to bring the link to an active state. Without coordination, one driver may reset the PHY and begin initializing various PHY registers while the other driver is already partway through initializing the same registers. The content of the PHY registers at the end of this concurrent initialization is indeterminate, which could lead to software errors as well as difficulties bringing up the Ethernet link.
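One way to picture the coordination requirement is a small Python model in which every PHY access is serialized and only the first driver performs the reset-and-program sequence. This is an illustrative sketch, not the mechanism of the invention: the class `SharedPhy`, its `bring_up` method and the register values in `INIT_SEQUENCE` are all hypothetical, and a Python lock stands in for whatever arbitration the hardware or firmware would actually provide.

```python
import threading


class SharedPhy:
    """Hypothetical model of one Ethernet PHY reached through MDIO registers."""

    # Assumed register/value pairs written after reset (illustrative only).
    INIT_SEQUENCE = {0x00: 0x1140, 0x04: 0x01E1, 0x09: 0x0300}

    def __init__(self):
        self.regs = {}
        self.reset_count = 0
        self.link_up = False
        self.owner = None
        self._lock = threading.Lock()  # serializes every access to the PHY
        self._initialized = False

    def _reset(self):
        self.regs.clear()
        self.reset_count += 1

    def bring_up(self, driver_name):
        """Called by each PCI-function driver that wants the shared port.

        The first caller resets and programs the PHY and returns True; later
        callers see it is already initialized and return False untouched.
        """
        with self._lock:
            if self._initialized:
                return False
            self._reset()
            for reg, value in self.INIT_SEQUENCE.items():
                self.regs[reg] = value
            self.link_up = True
            self.owner = driver_name
            self._initialized = True
            return True


# Two drivers (e.g. networking and storage) race to bring up the same port.
phy = SharedPhy()
results = {}


def driver(name):
    results[name] = phy.bring_up(name)


threads = [threading.Thread(target=driver, args=(n,)) for n in ("networking", "storage")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Whichever thread wins the lock performs exactly one reset, so the register contents end up equal to the intended sequence rather than an interleaving of two initializations.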
A second issue is a link state change on an Ethernet port that is shared between the I/O Services. Without coordination, the link state change event may be fielded by one of the drivers, which clears the event as part of its normal link state change processing. It is then highly likely that the second driver will never see the link state change event and therefore will not behave properly.
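The lost-event problem can be reproduced with a toy Python model of a latched, clear-on-read status bit. In this sketch (all names hypothetical, and the dispatcher is only one illustrative remedy, not the one claimed here), independent polling lets only the first driver observe the event, whereas a single reader that forwards the event to every registered driver lets both see it.

```python
class LinkStatusRegister:
    """Hypothetical latched status bit that the hardware clears when read."""

    def __init__(self):
        self._event_pending = False
        self.link_up = False

    def hw_link_change(self, up):
        # Hardware latches a link state change event.
        self.link_up = up
        self._event_pending = True

    def read_and_clear(self):
        # Reading returns the latched event and clears it (clear-on-read).
        pending = self._event_pending
        self._event_pending = False
        return pending


def uncoordinated(reg, drivers):
    """Each driver polls the register itself; only the first sees the event."""
    return {name: reg.read_and_clear() for name in drivers}


class LinkEventDispatcher:
    """One reader consumes the event and forwards it to every driver."""

    def __init__(self, reg):
        self.reg = reg
        self.subscribers = {}

    def register(self, name):
        self.subscribers[name] = []

    def poll(self):
        if self.reg.read_and_clear():
            for queue in self.subscribers.values():
                queue.append(self.reg.link_up)


reg = LinkStatusRegister()
reg.hw_link_change(up=False)
seen = uncoordinated(reg, ["networking", "storage"])
# seen == {"networking": True, "storage": False}

reg.hw_link_change(up=True)
dispatcher = LinkEventDispatcher(reg)
for name in ("networking", "storage"):
    dispatcher.register(name)
dispatcher.poll()
# dispatcher.subscribers == {"networking": [True], "storage": [True]}
```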
External ports, accelerated connections and memory registration resources are all examples of resources that must be managed intuitively and in a way that takes best advantage of the functionality provided by an ECA. Because the services exported by the ECA can be loaded in any order by a PnP-enabled operating system, and because users can dynamically (or permanently) disable one or more of the services, an ECA should have a flexible mechanism to robustly and transparently transfer resource management responsibility between the various drivers that provide ECA services.
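The requirement that management responsibility survive arbitrary load order and driver removal can be illustrated with a minimal Python registry. It assumes a simple policy chosen only for illustration (the first loaded driver manages the shared resources, and on its removal responsibility passes to the longest-loaded remaining driver); the class `EcaResourceOwnership` and its methods are hypothetical and do not represent the mechanism described by this document.

```python
class EcaResourceOwnership:
    """Hypothetical registry tracking which loaded driver manages shared resources."""

    def __init__(self):
        self.loaded = []     # drivers in load order
        self.manager = None  # driver currently responsible for shared resources
        self.handoffs = []   # (previous, new) history of responsibility transfers

    def load(self, driver):
        if driver in self.loaded:
            raise ValueError(f"{driver} already loaded")
        self.loaded.append(driver)
        if self.manager is None:
            # No current manager: the newly loaded driver takes over.
            self._transfer(driver)

    def unload(self, driver):
        self.loaded.remove(driver)
        if driver == self.manager:
            # Pass responsibility to the longest-loaded remaining driver, if any.
            self._transfer(self.loaded[0] if self.loaded else None)

    def _transfer(self, new_manager):
        self.handoffs.append((self.manager, new_manager))
        self.manager = new_manager


eca = EcaResourceOwnership()
eca.load("storage")     # storage happens to load first and becomes manager
eca.load("networking")
eca.unload("storage")   # user disables storage; networking takes over
```

The remaining drivers never need to know which one loaded first; they only consult `manager` when they must touch a shared port or table.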