1. Field of the Invention
This invention relates to virtualized computer systems, and, in particular, to a system and method for routing data over multiple paths between a virtual computer system and a data storage system.
2. Description of the Related Art
The advantages of virtual machine technology have become widely recognized. Among these advantages is the ability to run multiple virtual machines on a single host platform. This makes better use of the capacity of the hardware, while still ensuring that each user enjoys the features of a “complete,” isolated computer. In addition, the advantages of storage area networks and other redundant, multipath data storage systems have become widely recognized. These advantages include higher availability and better use of storage resources. This invention involves an improved system and method for combining virtual machine technology with multipath storage technologies to enhance the advantages of each of these technologies.
General Virtualized Computer System
As is well known in the field of computer science, a virtual machine (VM) is a software abstraction—a “virtualization”—of an actual physical computer system. FIG. 1 illustrates, in part, the general configuration of a virtual computer system 700A, including a virtual machine 200, which is installed as a “guest” on a “host” hardware platform 100.
As FIG. 1 shows, the hardware platform 100 includes one or more processors (CPUs) 110, system memory 130, and one or more local and/or remote storage devices, which will typically include a local disk 140. The system memory will typically be some form of high-speed RAM, whereas the disk (one or more) will typically be a non-volatile, mass storage device. The hardware 100 will also include other conventional mechanisms such as a memory management unit MMU 150 and various registers 160.
Each VM 200 will typically include at least one virtual CPU 210, at least one virtual disk 240, a virtual system memory 230, a guest operating system 220 (which may simply be a copy of a conventional operating system), and various virtual devices 270, in which case the guest operating system (“guest OS”) will include corresponding drivers 224. All of the components of the VM may be implemented in software using known techniques to emulate the corresponding components of an actual computer.
If the VM is properly designed, then it will not be apparent to the user that any applications 260 running within the VM are running indirectly, that is, via the guest OS and virtual processor. Applications 260 running within the VM will act just as they would if run on a “real” computer, except for a decrease in running speed that will be noticeable only in exceptionally time-critical applications. Executable files will be accessed by the guest OS from a virtual disk or virtual memory, which may simply be portions of an actual physical disk or memory allocated to that VM. Once an application is installed within the VM, the guest OS retrieves files from the virtual disk just as if they had been pre-stored as the result of a conventional installation of the application. The design and operation of virtual machines are well known in the field of computer science.
Some interface is usually required between a VM and the underlying host platform (in particular, the CPU), which is responsible for actually executing VM-issued instructions and transferring data to and from the actual memory and storage devices. A common term for this interface is a “virtual machine monitor” (VMM), shown as component 300. A VMM is usually a thin piece of software that runs directly on top of a host, or directly on the hardware, and virtualizes the resources of the physical host machine. Among other components, the VMM therefore usually includes device emulators 330, which may constitute the virtual devices 270 that the VM 200 accesses. The interface exported to the VM is then the same as the hardware interface of the machine, so that the guest OS cannot determine the presence of the VMM.
The VMM also usually tracks and either forwards (to some form of operating system) or itself schedules and handles all requests by its VM for machine resources, as well as various faults and interrupts. A mechanism known in the art as an exception or interrupt handler 355 is therefore included in the VMM. As is well known, such an interrupt/exception handler normally includes an interrupt descriptor table (IDT), or some similar table, which is typically a data structure that uses information in the interrupt signal to point to an entry address for a set of instructions that are to be executed when the interrupt/exception occurs.
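As a minimal sketch of the table lookup described above, the following models an IDT as a mapping from a vector number to a handler entry point. The class and method names are hypothetical and chosen only for illustration; a real IDT is a hardware-defined structure rather than a Python dictionary.

```python
class InterruptDescriptorTable:
    """Maps an interrupt/exception vector number to a handler entry point."""

    def __init__(self):
        self._entries = {}

    def set_entry(self, vector, handler):
        # Install the routine to be executed when this vector is raised.
        self._entries[vector] = handler

    def dispatch(self, vector):
        # Use information in the interrupt signal (the vector) to find the handler.
        handler = self._entries.get(vector)
        if handler is None:
            raise LookupError(f"no handler installed for vector {vector}")
        return handler()


idt = InterruptDescriptorTable()
# Vector 14 is the page-fault vector on x86; the handler body is illustrative.
idt.set_entry(14, lambda: "page fault handled")
result = idt.dispatch(14)
```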
Although the VM (and thus the user of applications running in the VM) cannot usually detect the presence of the VMM, the VMM and the VM may be viewed as together forming a single virtual computer. They are shown in FIG. 1 as separate components for the sake of clarity.
Moreover, the various virtualized hardware components such as the virtual CPU(s) 210, the virtual memory 230, the virtual disk 240, and the virtual device(s) 270 are shown as being part of the VM 200 for the sake of conceptual simplicity—in actual implementations these “components” are usually constructs or emulations exported to the VM by the VMM. For example, the virtual disk 240 is shown as being within the VM 200. This virtual component, which could alternatively be included among the virtual devices 270, may in fact be implemented as one of the device emulators 330 in the VMM.
The device emulators 330 emulate the system resources for use within the VM. These device emulators will then typically also handle any necessary conversions between the resources as exported to the VM and the actual physical resources. One advantage of such an arrangement is that the VMM may be set up to expose “generic” devices, which facilitate VM migration and hardware platform-independence. For example, the VMM may be set up with a device emulator 330 that emulates a standard Small Computer System Interface (SCSI) disk, so that the virtual disk 240 appears to the VM 200 to be a standard SCSI disk connected to a standard SCSI adapter, whereas the underlying, actual, physical disk 140 may be something else. In this case, a standard SCSI driver is installed into the guest OS 220 as one of the drivers 224. The device emulator 330 then interfaces with the driver 224 and handles disk operations for the VM 200. The device emulator 330 then converts the disk operations from the VM 200 to corresponding disk operations for the physical disk 140.
Virtual and Physical Memory
As in most modern computers, the address space of the memory 130 is partitioned into pages (for example, in the Intel x86 architecture) or other analogous units. Applications then address the memory 130 using virtual addresses (VAs), which include virtual page numbers (VPNs). The VAs are then mapped to physical addresses (PAs) that are used to address the physical memory 130. (VAs and PAs have a common offset from a base address, so that only the VPN needs to be converted into a corresponding physical page number (PPN).) The concepts of VPNs and PPNs, as well as the way in which the different page numbering schemes are implemented and used, are described in many standard texts, such as “Computer Organization and Design: The Hardware/Software Interface,” by David A. Patterson and John L. Hennessy, Morgan Kaufmann Publishers, Inc., San Francisco, Calif., 1994, pp. 579-603 (chapter 7.4 “Virtual Memory”). Similar mappings are used in other architectures where relocatability is possible.
An extra level of addressing indirection is typically implemented in virtualized systems in that a VPN issued by an application 260 in the VM 200 is remapped twice in order to determine which page of the hardware memory is intended. The first mapping is provided by a mapping module within the guest OS 220, which translates the guest VPN (GVPN) into a corresponding guest PPN (GPPN) in the conventional manner. The guest OS therefore “believes” that it is directly addressing the actual hardware memory, but in fact it is not.
Of course, a valid address to the actual hardware memory must ultimately be generated. A memory management module 350, typically located in the VMM 300, therefore performs the second mapping by taking the GPPN issued by the guest OS 220 and mapping it to a hardware (or “machine”) page number PPN that can be used to address the hardware memory 130. This GPPN-to-PPN mapping may instead be done in the main system-level software layer (such as in a mapping module in a kernel 600A, which is described below), depending on the implementation. From the perspective of the guest OS, the GVPN and GPPN are virtual and physical page numbers, just as they would be if the guest OS were the only OS in the system. From the perspective of the system software, however, the GPPN is a page number that is then mapped into the physical memory space of the hardware memory as a PPN.
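The two-stage remapping can be illustrated with a short sketch. It assumes 4 KiB pages and represents both mappings as plain dictionaries; the function names and the sample page numbers are hypothetical, and real systems use hardware page tables rather than dictionaries.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages


def split(address):
    """Split an address into (page number, offset within the page)."""
    return address // PAGE_SIZE, address % PAGE_SIZE


def translate(guest_va, guest_page_table, vmm_page_map):
    """Translate a guest virtual address into a machine address."""
    gvpn, offset = split(guest_va)
    gppn = guest_page_table[gvpn]  # first mapping, done by the guest OS
    ppn = vmm_page_map[gppn]       # second mapping, done by the VMM or kernel
    # The offset is common to all three address spaces; only the page number changes.
    return ppn * PAGE_SIZE + offset


guest_page_table = {0x10: 0x2}  # GVPN -> GPPN (illustrative values)
vmm_page_map = {0x2: 0x7F}      # GPPN -> PPN (illustrative values)
machine_addr = translate(0x10 * PAGE_SIZE + 0x123, guest_page_table, vmm_page_map)
```

In this example the guest virtual address 0x10123 resolves to machine address 0x7F123: the offset 0x123 is preserved while the page number is remapped twice.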
System Software Configurations in Virtualized Systems
In some systems, such as the Workstation product of VMware, Inc., of Palo Alto, Calif., the VMM is co-resident at system level with a host operating system. Both the VMM and the host OS can independently modify the state of the host processor, but the VMM calls into the host OS via a driver and a dedicated user-level application to have the host OS perform certain I/O operations on behalf of the VM. The virtual computer in this configuration is thus fully hosted in that it runs on an existing host hardware platform and together with an existing host OS.
In other implementations, a dedicated kernel takes the place of and performs the conventional functions of the host OS, and virtual computers run on the kernel. FIG. 1 illustrates a kernel 600A that serves as the system software for several VM/VMM pairs 200/300, . . . , 200N/300N. Compared with a system in which VMMs run directly on the hardware platform, use of a kernel offers greater modularity and facilitates provision of services that extend across multiple VMs (for example, for resource management). Compared with the hosted deployment, a kernel may offer greater performance because it can be co-developed with the VMM and be optimized for the characteristics of a workload consisting of VMMs. The ESX Server product of VMware, Inc., has such a configuration. The invention described below takes advantage of the ability to optimize a kernel as a platform for virtual computers.
A kernel-based virtualization system of the type illustrated in FIG. 1 is described in U.S. patent application Ser. No. 09/877,378 (“Computer Configuration for Resource Management in Systems Including a Virtual Machine”), which is incorporated here by reference. The main components of this system and aspects of their interaction are, however, outlined below.
At boot-up time, an existing operating system 420 may be at system level and the kernel 600A may not yet even be operational within the system. In such case, one of the functions of the OS 420 may be to make it possible to load the kernel 600A, after which the kernel runs on the native hardware 100 and manages system resources. In effect, the kernel, once loaded, displaces the OS 420. Thus, the kernel 600A may be viewed either as displacing the OS 420 from the system level and taking this place itself, or as residing at a “sub-system level.” When interposed between the OS 420 and the hardware 100, the kernel 600A essentially turns the OS 420 into an “application,” which has access to system resources only when allowed by the kernel 600A. The kernel then schedules the OS 420 as if it were any other component that needs to use system resources.
The OS 420 may also be included to allow applications unrelated to virtualization to run; for example, a system administrator may need such applications to monitor the hardware 100 or to perform other administrative routines. The OS 420 may thus be viewed as a “console” OS (COS). In such implementations, the kernel 600A preferably also includes a remote procedure call (RPC) mechanism to enable communication between, for example, the VMM 300 and any applications 430 installed to run on the COS 420.
Actions
In kernel-based systems such as the one illustrated in FIG. 1, there must be some way for the kernel 600A to communicate with the VMM 300. In general, the VMM 300 can call into the kernel 600A but the kernel cannot call directly into the VMM. The conventional technique for overcoming this is for the kernel to post “actions” (requests for the VMM to do something) on an action queue stored in memory 130. The VMM code checks this queue periodically, and always both after it returns from a kernel call and before it resumes a VM. One typical action is the “raise interrupt” action: if the VMM sees this action, it will raise an interrupt to the VM 200 in the conventional manner.
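The action mechanism may be sketched as follows. This is an illustrative model only: the class names, the string used to name the action, and the counter standing in for interrupt delivery are all assumptions, and the periodic check is omitted for brevity.

```python
from collections import deque


class ActionQueue:
    """Queue on which the kernel posts requests for the VMM."""

    def __init__(self):
        self._pending = deque()

    def post(self, action):
        self._pending.append(action)

    def drain(self):
        actions = list(self._pending)
        self._pending.clear()
        return actions


class VMM:
    def __init__(self, queue):
        self.queue = queue
        self.raised_interrupts = 0  # stands in for interrupts raised to the VM

    def check_actions(self):
        for action in self.queue.drain():
            if action == "raise_interrupt":
                self.raised_interrupts += 1

    def kernel_call(self, fn):
        result = fn()
        self.check_actions()  # always checked after returning from a kernel call
        return result

    def resume_vm(self):
        self.check_actions()  # always checked before resuming the VM


queue = ActionQueue()
vmm = VMM(queue)
queue.post("raise_interrupt")  # the kernel cannot call the VMM, so it posts
vmm.resume_vm()                # the VMM picks the action up on its next check
```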
As is known, for example, from U.S. Pat. No. 6,397,242 (Devine, et al., 28 May 2002), some virtualization systems allow VM instructions to run directly (in “direct execution”) on the hardware CPU(s) when possible. When necessary, however, VM execution is switched to the technique known as “binary translation,” during which the VM is running in the VMM. In any systems where the VM is running in direct execution when it becomes necessary for the VMM to check actions, the kernel must interrupt the VMM so that it will stop executing VM instructions and check its action queue. This may be done using known programming techniques.
Worlds
The kernel 600A handles not only the various VMM/VMs, but also any other applications running on the kernel, as well as the COS 420 and even the hardware CPU(s) 110, as entities that can be separately scheduled. In this disclosure, each schedulable entity is referred to as a “world,” which contains a thread of control, an address space, machine memory, and handles to the various device objects that it is accessing. Worlds are stored in a portion of the memory space controlled by the kernel. More specifically, the worlds are controlled by a world manager, represented in FIG. 1 within the kernel 600A as module 612. Each world also has its own task structure, and usually also a data structure for storing the hardware state currently associated with the respective world.
There will usually be different types of worlds: 1) system worlds, which comprise idle worlds (one per CPU) and a helper world that performs tasks that need to be done asynchronously; 2) a console world, which is a special world that runs in the kernel and is associated with the COS 420; and 3) virtual machine worlds.
Worlds preferably run at the most-privileged level (for example, in a system with the Intel x86 architecture, this will be level CPL0), that is, with full rights to invoke any privileged CPU operations. A VMM, which, along with its VM, constitutes a separate world, therefore may use these privileged instructions to allow it to run its associated VM so that it performs just like a corresponding “real” computer, even with respect to privileged operations.
Switching Worlds
When the world that is running on a particular CPU (which may be the only one) is preempted by or yields to another world, a world switch must occur. A world switch involves saving the context of the current world and restoring the context of the new world so that the new world can begin executing where it left off the last time that it was running.
In the first part of the world switch procedure, the kernel saves the current world's state in a data structure stored in the kernel's data area. Assuming the common case of an underlying Intel x86 architecture, the state that is saved will typically include: 1) the flags (EFLAGS) register; 2) general purpose registers; 3) segment registers; 4) the instruction pointer (EIP) register; 5) the local descriptor table register; 6) the task register; 7) debug registers; 8) control registers; 9) the interrupt descriptor table register; 10) the global descriptor table register; and 11) the floating point state. Similar state information will need to be saved in systems with other hardware architectures.
After the state of the current world is saved, the state of the new world can be restored. During the process of restoring the new world's state, no exceptions are allowed to take place because, if they did, the state of the new world would be inconsistent upon restoration of the state. The same state that was saved is therefore restored. The last step in the world switch procedure is restoring the new world's code segment and instruction pointer (EIP) registers.
When worlds are initially created, the saved state area for the world is initialized to contain the proper information so that, when the system switches to that world, enough of its state is restored to enable the world to start running. The EIP is therefore set to the address of a special world start function. Thus, when a running world switches to a new world that has never run before, the act of restoring the EIP register will cause the world to begin executing in the world start function.
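The save-then-restore order of a world switch, and the role of the initialized saved state area, can be sketched as below. The sketch is a simplification under stated assumptions: the register set is reduced to a dictionary plus an EIP value, the world start function is represented by a symbolic label, and the suppression of exceptions during restoration is not modeled. All names are hypothetical.

```python
from dataclasses import dataclass, field

WORLD_START = "world_start"  # symbolic address of the world start function


def initial_state():
    # A newly created world is set up so that restoring it begins execution
    # in the world start function.
    return {"regs": {}, "eip": WORLD_START}


@dataclass
class World:
    name: str
    saved_state: dict = field(default_factory=initial_state)


class CPU:
    def __init__(self):
        self.regs = {}
        self.eip = None
        self.current = None


def world_switch(cpu, new_world):
    # First, save the state of the world that is currently running.
    if cpu.current is not None:
        cpu.current.saved_state = {"regs": dict(cpu.regs), "eip": cpu.eip}
    # Then restore the new world's state; the instruction pointer comes last.
    cpu.regs = dict(new_world.saved_state["regs"])
    cpu.eip = new_world.saved_state["eip"]
    cpu.current = new_world
```

A world that has never run resumes at `WORLD_START`, while a world that has run before resumes at whatever instruction pointer was saved when it was switched out.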
Switching from and to the COS world requires additional steps, which are described in U.S. patent application Ser. No. 09/877,378, mentioned above. Understanding the details of this process is not necessary for understanding the present invention, however, so further discussion is omitted.
Memory Management in Kernel-Based System
The kernel 600A includes a memory management module 616 that manages all machine memory that is not allocated exclusively to the COS 420. When the kernel 600A is loaded, the information about the maximum amount of memory available on the machine is available to the kernel, as well as information about how much of it is being used by the COS. Part of the machine memory is used for the kernel 600A itself and the rest is used for the virtual machine worlds.
Virtual machine worlds use machine memory for two purposes. First, memory is used to back portions of each world's memory region, that is, to store code, data, stacks, etc., in the VMM page table. For example, the code and data for the VMM 300 are backed by machine memory allocated by the kernel 600A. Second, memory is used for the guest memory of the virtual machine. The memory management module may include any algorithms for dynamically allocating memory among the different VMs 200.
Interrupt and Exception Handling in Kernel-Based Systems
Interrupt and exception handling is related to the concept of “worlds” described above. As mentioned above, one aspect of switching worlds is changing various descriptor tables. One of the descriptor tables that is loaded when a new world is to be run is the new world's IDT. The kernel 600A therefore preferably also includes an interrupt/exception handler 655 that is able to intercept and handle (using a corresponding IDT in the conventional manner) interrupts and exceptions for all devices on the machine. When the VMM world is running, whichever IDT was previously loaded is replaced by the VMM's IDT, such that the VMM will handle all interrupts and exceptions.
The VMM will handle some interrupts and exceptions completely on its own. For other interrupts/exceptions, it will be either necessary or at least more efficient for the VMM to call the kernel to have the kernel either handle the interrupts/exceptions itself, or to forward them to some other sub-system such as the COS. One example of an interrupt that the VMM can handle completely on its own, with no call to the kernel, is a check-action IPI (inter-processor interrupt). One example of when the VMM preferably calls the kernel, which then forwards an interrupt to the COS, would be where the interrupt involves devices such as a mouse, which is typically controlled by the COS. The VMM may forward still other interrupts to the VM.
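The routing choices just described may be summarized in a small sketch. Only the check-action IPI and mouse examples come from the description above; the source labels, the set of VM-bound sources, and the default of handing unlisted interrupts to the kernel are assumptions made for illustration.

```python
VMM_ONLY = {"check_action_ipi"}  # handled by the VMM with no kernel call
COS_DEVICES = {"mouse"}          # devices typically controlled by the COS


def route(source, vm_bound=frozenset()):
    """Return the chain of components that handle an interrupt source."""
    if source in VMM_ONLY:
        return ["vmm"]
    if source in vm_bound:
        return ["vmm", "vm"]  # forwarded by the VMM to its VM
    if source in COS_DEVICES:
        return ["vmm", "kernel", "cos"]  # VMM calls the kernel, which forwards
    return ["vmm", "kernel"]  # assumed default: the kernel handles it itself
```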
Device Access in Kernel-Based System
In the preferred embodiment of the invention, the kernel 600A is responsible for providing access to all devices on the physical machine. In addition to other modules that the designer may choose to load onto the system for access by the kernel, the kernel will therefore typically load conventional drivers as needed to control access to devices. Accordingly, FIG. 1 shows a module 610A containing loadable kernel modules and drivers. The kernel 600A may interface with the loadable modules and drivers in a conventional manner, using an application program interface (API) or similar interface.
Redundant, Multipath Data Storage Systems
This invention is particularly advantageous in relation to server computer systems, although it is not limited to such systems. Servers, by their very nature, generally require access to large amounts of data. For example, web servers, database servers and email servers all typically require access to large data stores. Various types of data storage devices and systems may be used to satisfy this requirement. For example, a server may be connected to a RAID (redundant array of inexpensive disks) storage unit (or disk array), a JBOD (just a bunch of disks) storage unit or a tape storage unit, to name a few. Storage systems may also comprise a combination of multiple types of storage devices, such as a RAID storage unit combined with a tape storage unit. Large data storage systems generally also include one or more storage processors, which coordinate writing data to the storage units and reading data from the storage units.
There are also various different interface mechanisms for connecting storage systems to servers, including everything from a simple SCSI interface to a complex Fibre Channel network. Also, it is often advantageous to connect multiple data storage systems to a single storage network, and/or to give multiple servers access to the same one or more data storage systems on a network. All of these various combinations of servers, data storage units and systems, and interface technologies are well known to a person of skill in the art and they are thoroughly described in existing literature. These various combinations, and others, can be used in a wide variety of different embodiments of this invention.
It is also advantageous to provide redundancy in connection with a server's access to its data store. A RAID storage unit has built in redundancy, as is well known in the art. Also, multiple storage processors may be provided that can each provide access to a server's data store, so that, if one processor fails, the other can still provide access. Also, multiple interfaces can be provided between a server and its data store, so that a second interface may be used if a first interface fails, such as multiple SCSI adapters for a direct-attached SCSI storage system, or a multipath Fibre Channel network. This invention may be used with any such multipath data storage system.
Each of these aspects of data storage systems provides different advantages to the overall server system. For example, providing storage units that use different media or technologies can lead to cost savings and efficiency in accessing data. RAID units and JBOD units can be used for data for which quick access is required, while tape units can be used for other data. Also, RAID units can be used instead of JBOD units for data that is relatively more important. Also, giving multiple servers shared access to one or more data storage systems can lead to better use of the storage resources. Otherwise, if each server had its own separate data storage system, surplus storage capacity that is not being used by one server could not readily be used by another server. Finally, providing redundant storage systems and multiple methods for accessing the storage systems can lead to a highly available data store. If one means for accessing a data store fails, communication can be switched over to another means for accessing the data.
One example of a redundant, multipath data storage system is illustrated in FIG. 2. FIG. 2 shows a first server 10A and a second server 10B connected to a storage area network (SAN) 22. The servers 10A and 10B may be any type of server, such as a conventional server based on the Intel IA-32 architecture and running a Linux OS, and may fulfill any of numerous different functions, such as implementing a web server, an email server or a database server. The SAN 22 may be any of a wide variety of SANs, which are well known.
In this example, the SAN 22 comprises a plurality of data storage units 18, specifically a first data storage unit 18A and a second data storage unit 18B. The storage units may be any type of storage unit or any combination of different types of storage units, including, for example, RAID storage units, JBOD storage units and tape storage units. The first storage unit 18A is controlled by a first storage processor 16A and a second storage processor 16B, while the second storage unit 18B is controlled by a third storage processor 16C and a fourth storage processor 16D, which may be any storage processors, including conventional storage processors. The first storage processor 16A is connected to the first storage unit 18A in a conventional manner by a first interface 27A, the second storage processor 16B is connected to the first storage unit 18A in a conventional manner by a second interface 27B, the third storage processor 16C is connected to the second storage unit 18B in a conventional manner by a third interface 29A, and the fourth storage processor 16D is connected to the second storage unit 18B in a conventional manner by a fourth interface 29B. The storage units 18, combined with the storage processors 16A, 16B, 16C and 16D, constitute a storage system 20.
The storage system 20 is connected to the servers 10A and 10B by a multipath data storage network. In the example of FIG. 2, the multipath data storage network is shown as a Fibre Channel network, which could be any Fibre Channel network. The multipath data storage network could, however, be any other type of multipath data storage network, such as a multipath SCSI or iSCSI (Internet SCSI) network. The first server 10A includes a first host bus adapter (HBA) 12A and a second HBA 12B for connecting to the Fibre Channel network, while the second server 10B includes a third HBA 12C and a fourth HBA 12D for connecting to the Fibre Channel network. The Fibre Channel network includes a first Fibre Channel switch 14A and a second Fibre Channel switch 14B. The Fibre Channel switches 14A and 14B may be any such switches, according to the specifications of the Fibre Channel Industry Association (FCIA). The first HBA 12A is connected to the first Fibre Channel switch 14A by a first interface 11, while the second HBA 12B is connected to the second Fibre Channel switch 14B by a second interface 13. The third HBA 12C is connected to the first Fibre Channel switch 14A by a third interface 15, while the fourth HBA 12D is connected to the second Fibre Channel switch 14B by a fourth interface 17. The interfaces 11, 13, 15 and 17 may be, for example, a Fibre Channel cable connected to a gigabit interface converter (GBIC), as is commonly used. The first Fibre Channel switch 14A is connected to the first storage processor 16A by a first interface 19 and to the third storage processor 16C by a second interface 21. The second Fibre Channel switch 14B is connected to the second storage processor 16B by a third interface 23 and to the fourth storage processor 16D by a fourth interface 25. The interfaces 11, 13, 15, 17, 19, 21, 23 and 25 are standard Fibre Channel interfaces as specified by the FCIA.
The SAN 22 may be considered to include the storage system 20, the Fibre Channel switches 14A and 14B, and the interfaces 19, 21, 23 and 25, as illustrated in FIG. 2, or the SAN 22 may be considered to additionally include the interfaces 11, 13, 15 and 17, and possibly the servers 10A and 10B. For this description, the SAN 22 will be described in the terms illustrated in FIG. 2. Various different SANs, such as the SAN 22, are well known in the art and are described in numerous existing documents. A person of skill in the art will understand the operation of SANs, and will be able to design and implement different SAN configurations, depending on particular storage system requirements.
Much of the redundancy of the server and data storage network of FIG. 2 is readily apparent. The first server 10A may access the storage units 18 through the HBAs 12A and 12B, the interfaces or data paths 11 and 13, the Fibre Channel switches 14A and 14B, the data paths 19, 21, 23, and 25, the storage processors 16A, 16B, 16C and 16D and the data paths 27A, 27B, 29A and 29B. Similarly, the second server 10B may access the storage units 18 through the HBAs 12C and 12D, the interfaces or data paths 15 and 17, the Fibre Channel switches 14A and 14B, the data paths 19, 21, 23, and 25, the storage processors 16A, 16B, 16C and 16D and the data paths 27A, 27B, 29A and 29B. For example, the first server may access the storage unit 18A through the HBA 12A, the data path 11, the Fibre Channel switch 14A, the data path 19, the storage processor 16A and the data path 27A, while the second server may access the storage unit 18B through the HBA 12C, the data path 15, the Fibre Channel switch 14A, the data path 21, the storage processor 16C and the data path 29A.
The paths by which the servers 10A and 10B access the storage units 18 may vary too. In particular, different paths may be selected for use, depending on operating conditions within the network. For example, suppose that the first server 10A is accessing the storage unit 18A through a first path comprising the HBA 12A, the data path 11, the Fibre Channel switch 14A, the data path 19, the storage processor 16A and the data path 27A. Suppose further that the HBA 12A has a failure that prevents the HBA's use by the server 10A. This failure may be detected, and data may be rerouted between the server 10A and the storage unit 18A along a second data path, such as through the HBA 12B, the data path 13, the Fibre Channel switch 14B, the data path 23, the storage processor 16B and the data path 27B. Switching the data path by which data is routed between a server and the storage units in response to a failure in the network is referred to as a “failover.” Suppose further that the failing HBA is replaced with a new HBA, and data is again routed through the first data path. Rerouting data to the first path after correcting for a failure is referred to as a “failback.”
As is well known in the art, the Fibre Channel specifications provide mechanisms for the automatic detection of network failures, the automatic reporting of network failures to other network components, and the automatic detection and reporting of the correction of network failures. Network failures and the correction of network failures are reported to servers and storage units, enabling the servers and storage units to perform automatic failovers and automatic failbacks. Thus, in the example described above, when the HBA 12A fails, the server 10A may automatically detect the failure and automatically fail over to the second data path described above. After the failure is corrected, the server 10A may automatically detect that the HBA is functioning properly again and fail back to the first data path. A person of skill in the art will know how to design and implement a SAN that performs all of these functions.
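The failover and failback behavior in the example above can be sketched as a selector that always prefers the first usable path. The component labels are taken from FIG. 2, but the classes, the method names, and the way failures are reported are hypothetical; real failure detection relies on the Fibre Channel mechanisms mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    hba: str
    switch: str
    storage_processor: str


FIRST = Path("12A", "14A", "16A")   # first data path to storage unit 18A
SECOND = Path("12B", "14B", "16B")  # second data path to storage unit 18A


class FailoverSelector:
    def __init__(self, paths):
        self.paths = list(paths)  # listed in order of preference
        self.failed = set()       # labels of components known to have failed

    def report_failure(self, component):
        self.failed.add(component)

    def report_repair(self, component):
        self.failed.discard(component)

    def _usable(self, path):
        parts = {path.hba, path.switch, path.storage_processor}
        return not (parts & self.failed)

    def active_path(self):
        for path in self.paths:
            if self._usable(path):
                return path
        raise RuntimeError("no usable path to the storage unit")


selector = FailoverSelector([FIRST, SECOND])
selector.report_failure("12A")         # the HBA 12A fails
after_failure = selector.active_path()  # failover to the second path
selector.report_repair("12A")          # the HBA is replaced
after_repair = selector.active_path()   # failback to the first path
```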
In addition to selecting between alternate paths for data routing, multiple paths may also be used at the same time for routing data between a server and the storage units. For example, the server 10A may route some data through the first data path described above and other data through the second data path described above to obtain better overall data throughput. Distributing data between multiple data paths in this manner is referred to herein as “load distribution.”
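As one possible illustration of load distribution, the sketch below assigns requests to paths in round-robin order. Round-robin is an assumed policy chosen for simplicity; the description above does not prescribe how data is divided among paths, and the function and label names are hypothetical.

```python
from itertools import cycle


def distribute(requests, paths):
    """Assign each request to a path, rotating through the paths in order."""
    assignment = {path: [] for path in paths}
    for request, path in zip(requests, cycle(paths)):
        assignment[path].append(request)
    return assignment


plan = distribute(["r0", "r1", "r2", "r3", "r4"], ["first", "second"])
```

Here `plan` routes r0, r2 and r4 over the first path and r1 and r3 over the second.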
A software routine that selects paths by which data will be routed in a multipath data network, either for purposes of failovers and failbacks or for load distribution, or both, will be referred to as a storage path manager (SPM) herein. In the network of FIG. 2, the servers 10A and 10B, as well as the storage processors 16A, 16B, 16C and 16D, may include SPMs. Such SPMs are known in the art, and various versions are available for purchase. This invention relates to a new implementation of an SPM at the server side, or host side, of data storage networks. The StorageWorks Secure Path software product from Hewlett-Packard Company and the PowerPath software product from EMC Corporation are examples of existing SPMs that operate from the server side of data storage networks.
Storage Path Managers
FIG. 3 illustrates the same server and data network as FIG. 2, but with a more detailed illustration of the servers 10A and 10B, instead of the SAN 22. FIG. 3 shows the servers 10A and 10B connected to the SAN 22 by the data paths 11, 13, 15 and 17. The first server 10A includes system hardware 30A, a set of drivers 34A, an operating system (OS) 32A and a set of applications 36A, while the second server 10B includes system hardware 30B, a set of drivers 34B, an OS 32B and a set of applications 36B, all of which may be conventional for the servers 10A and 10B. The system hardware 30A includes the HBA 12A and the HBA 12B, while the system hardware 30B includes the HBA 12C and the HBA 12D. The first server 10A includes a first SPM 38A that is implemented in a driver, which includes the functionality of a basic driver for the HBAs 12A and 12B. The second server 10B includes a second SPM 38B that is implemented within the OS 32B. The second server 10B also includes a separate HBA driver 37 for use with the HBAs 12C and 12D, which may be a basic HBA driver from QLogic Corporation or Emulex Corporation, for example.
The SPM 38A may be a SANblade Manager driver from QLogic Corporation, for example. The SANblade Manager driver provides the automatic failover capability described above, along with a load balancing function and a logical unit number (LUN) masking function. As is well known in the art, data storage units, such as the storage units 18, are divided into one or more LUNs each. A load balancing function is a form of a load distribution function, in which an attempt is made at distributing the load of data traffic evenly over multiple data paths. In this case, the load balancing function staggers LUNs within the storage units 18 between the HBAs 12A and 12B to distribute the loads. The LUN masking function enables specific LUNs to be masked so that the OS 32A has no knowledge of the masked LUNs and cannot access them.
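The LUN masking and LUN staggering functions described above may be sketched as follows. This sketch does not reproduce the SANblade Manager driver, whose internal method is not described here; the function names visible_luns and stagger_luns, and the simple alternating assignment of LUNs to HBAs, are assumptions made only for illustration.

```python
def visible_luns(all_luns, masked_luns):
    """Return the LUNs that remain visible to the OS, in their original order.

    Masked LUNs are simply omitted, so the OS has no knowledge of them.
    """
    masked = set(masked_luns)
    return [lun for lun in all_luns if lun not in masked]


def stagger_luns(luns, hbas):
    """Assign LUNs to HBAs in alternation (an assumed staggering rule).

    Returns a dict mapping each LUN to the HBA that carries its traffic.
    """
    if not hbas:
        raise ValueError("at least one HBA is required")
    return {lun: hbas[i % len(hbas)] for i, lun in enumerate(luns)}
```

For example, masking LUN 2 out of LUNs 0 through 3 leaves LUNs 0, 1 and 3 visible, and staggering those three LUNs between the HBAs 12A and 12B assigns LUNs 0 and 3 to the HBA 12A and LUN 1 to the HBA 12B.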
The SPM 38B may be the Auto Path software product from Hewlett-Packard Company, for example. The Auto Path product also provides automatic failover and load balancing functions. The SPM 38B interacts with the HBA driver 37 in a conventional manner to control the operation of the HBAs 12C and 12D. The servers 10A and 10B may also be implemented with various other SPMs, such as the SANPoint Foundation Suite software product from Veritas Software Corporation.
Existing SPMs generally detect available data paths, provide load distribution functions, detect SAN failures and/or receive information regarding SAN failures, perform failovers, detect network corrections and/or receive notifications of corrections, and perform failbacks. The load distribution functions may include a round-robin function and/or a load balancing function. With the round-robin function, data is generally routed on an alternating or rotating basis among available data paths. For example, the SPM 38A might distribute consecutive data transfers alternately between the first and second data paths described above. With the load balancing function, data is generally distributed between available data paths so that each data transfer goes through the data path that has the lightest load at the time the data transfer is initiated.
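The round-robin and load balancing functions described above may be contrasted with the following minimal sketch. The class names RoundRobinSelector and LeastLoadedSelector are hypothetical, and the use of a count of outstanding data transfers as the measure of load on a path is an assumption; an actual SPM may measure load differently.

```python
import itertools


class RoundRobinSelector:
    """Routes consecutive data transfers over the paths on a rotating basis."""

    def __init__(self, paths):
        paths = list(paths)
        if not paths:
            raise ValueError("at least one data path is required")
        self._cycle = itertools.cycle(paths)

    def select(self):
        """Return the next path in the rotation."""
        return next(self._cycle)


class LeastLoadedSelector:
    """Routes each transfer over the path with the fewest outstanding transfers."""

    def __init__(self, paths):
        paths = list(paths)
        if not paths:
            raise ValueError("at least one data path is required")
        # Assumed load measure: number of transfers started but not yet completed.
        self._outstanding = {path: 0 for path in paths}

    def select(self):
        """Pick the lightest-loaded path at the time the transfer is initiated."""
        path = min(self._outstanding, key=self._outstanding.get)
        self._outstanding[path] += 1
        return path

    def complete(self, path):
        """Record that a transfer on the given path has finished."""
        if self._outstanding[path] == 0:
            raise ValueError("no outstanding transfer on this path")
        self._outstanding[path] -= 1
```

With two paths, the round-robin selector alternates between them regardless of load, whereas the least-loaded selector returns whichever path has fewer transfers in flight at the moment of selection, with ties going to the path listed first.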