One or more embodiments of the present invention relate to virtualized computer systems, and, in particular, to a system and method for providing a guest operating system (O/S) in a virtualized computer system with direct access to a hardware device.
General Computer System with a PCI Bus
FIG. 1A shows a general computer system that comprises system hardware 30. System hardware 30 may be a conventional computer system, such as a personal computer based on the widespread “x86” processor architecture from Intel Corporation of Santa Clara, Calif., and system hardware 30 may include conventional components, such as one or more processors, system memory, and a local disk. System memory is typically some form of high-speed RAM (Random Access Memory), whereas the disk (one or more) is typically a non-volatile, mass storage device. System hardware 30 may also include other conventional components such as a memory management unit (MMU), various registers, and various input/output (I/O) devices.
As further shown in FIG. 1A, system hardware 30 includes Central Processing Unit 32 (CPU 32), host/PCI bridge 36, system memory 40, Small Computer System Interface (SCSI) Host Bus Adapter (HBA) card 44 (SCSI HBA 44), Network Interface Card 46 (NIC 46), and graphics adapter 48, each of which may be conventional devices. As further shown in FIG. 1A: (a) CPU 32 is connected to host/PCI bridge 36 by CPU local bus 34 in a conventional manner; (b) system memory 40 is connected to host/PCI bridge 36 by memory bus 38 in a conventional manner; and (c) SCSI HBA 44, NIC 46 and graphics adapter 48 are connected to host/PCI bridge 36 by Peripheral Component Interconnect bus 42 (PCI bus 42) in a conventional manner. As further shown in FIG. 1A, graphics adapter 48 is connected to conventional video monitor 62 in a conventional manner; and NIC 46 is connected to one or more conventional data networks 60 in a conventional manner. Networks 60 may be based on Ethernet technology, for example, and the networks may use the Internet Protocol and the Transmission Control Protocol (TCP/IP), for example. Also, SCSI HBA 44 supports SCSI bus 50 in a conventional manner, and various devices may be connected to SCSI bus 50 in a conventional manner. For example, FIG. 1A shows SCSI disk 52 and tape storage device 54 connected to SCSI bus 50. Other devices may also be connected to SCSI bus 50. SCSI HBA 44 may be an Adaptec Ultra320 or Ultra160 SCSI PCI HBA from Adaptec, Inc., or an LSI Logic Fusion-MPT SCSI HBA from LSI Logic Corporation, for example.
Computer systems generally have system level software and application software executing on the system hardware. As shown in FIG. 1A, system software 21 (system S/W 21) is executing on system hardware 30. As further shown in FIG. 1A, system software 21 includes operating system (OS) 20 and system BIOS (Basic Input/Output System) 22, although other system level software configurations are also possible. OS 20 may be a conventional OS for system hardware 30, such as a Windows OS from Microsoft Corp. or a Linux OS, for example.
A Windows OS from Microsoft Corp. may be a Windows Vista OS, Windows XP OS or a Windows 2000 OS, for example, while a Linux OS may be a distribution from Novell, Inc. (SUSE Linux), Mandrakesoft S. A. or Red Hat, Inc. OS 20 may include a set of drivers 24, some of which may be packaged with OS 20, and some of which may be separately loaded onto system hardware 30. Drivers 24 may provide a variety of functions, including supporting interfaces with SCSI HBA 44, NIC 46 and graphics adapter 48. Drivers 24 may also be conventional for system hardware 30 and OS 20. System BIOS 22 may also be conventional for system hardware 30. Finally, FIG. 1A shows a set of one or more applications 10 (APPS 10) executing on system hardware 30. APPS 10 may also be conventional for system hardware 30 and OS 20.
The computer system shown in FIG. 1A may be initialized in a conventional manner. Thus, when the computer system is powered up, or restarted, system BIOS 22 and/or OS 20, or, more generally, system software 21, may detect and configure various aspects of system hardware 30 in a conventional manner. For example, system software 21 may detect and configure devices interacting with PCI bus 42 (i.e., PCI devices) in a conventional manner, including, in particular, SCSI HBA 44. A person of skill in the art will understand how such devices are detected and configured. Briefly, a PCI device typically implements at least 16 “doublewords” of standard configuration registers, where there are 32 bits in a “doubleword.” System software 21 attempts to access the configuration registers of PCI devices at each possible location on PCI bus 42, including each PCI slot in system hardware 30. Attempting to access the configuration registers enables system software 21 to determine whether there is a PCI device at each possible location on PCI bus 42, as well as the function or functions that are implemented in each PCI device. System software 21 can then obtain additional information from the configuration registers of each PCI device, and configure such devices appropriately.
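By way of illustration only, the probing of each possible location on a PCI bus may be sketched as follows. This is a minimal sketch and not the method of any particular BIOS or OS. The `config_read32` callback, the `make_fake_bus` helper, and the ID values are hypothetical stand-ins for real configuration-space access. The sketch relies only on the PCI convention that a read from an empty location returns all ones, so that the Vendor ID reads as 0xFFFF.

```python
# Hypothetical sketch: probe every (device, function) location on one PCI bus.
# config_read32(dev, fn, offset) is an assumed callback standing in for real
# configuration-space access; it is not an API of any actual system software.

def scan_bus(config_read32, num_devices=32):
    found = []
    for dev in range(num_devices):
        for fn in range(8):
            dword0 = config_read32(dev, fn, 0x00)
            vendor_id = dword0 & 0xFFFF
            if vendor_id == 0xFFFF:          # nothing responds at this location
                if fn == 0:
                    break                    # no function 0: the slot is empty
                continue
            device_id = (dword0 >> 16) & 0xFFFF
            found.append((dev, fn, vendor_id, device_id))
            if fn == 0:
                # Header Type is the byte at offset 0x0E; bit 7 marks a
                # multi-function device.
                header_type = (config_read32(dev, fn, 0x0C) >> 16) & 0xFF
                if not (header_type & 0x80):
                    break                    # single-function device
    return found


def make_fake_bus(devices):
    """devices maps (dev, fn) -> {offset: doubleword}; values are made up."""
    def config_read32(dev, fn, offset):
        regs = devices.get((dev, fn))
        if regs is None:
            return 0xFFFFFFFF                # empty location reads as all ones
        return regs.get(offset, 0)
    return config_read32


# Illustrative bus: one single-function device and one two-function device.
fake_bus = make_fake_bus({
    (3, 0): {0x00: 0x00011234},
    (5, 0): {0x00: 0x00025678, 0x0C: 0x00800000},
    (5, 1): {0x00: 0x00035678},
})
detected = scan_bus(fake_bus)
```

In this sketch, `detected` lists one entry per function found, which corresponds to system software learning both where PCI devices are located and which functions each device implements.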
If a PCI device implements an extended ROM (Read Only Memory), which may also be referred to as a device ROM or option ROM, then system software 21 typically copies a code image from the ROM on the PCI device into system memory 40 (for example, RAM) within system hardware 30. An initialization module within the code image is typically executed as part of the initialization process, and this may further initialize the PCI device and/or other devices connected to the PCI device. Referring again to FIG. 1A, during the initialization process, system software 21 attempts to access the configuration registers of PCI devices at each possible location on PCI bus 42, and detects graphics adapter 48, NIC 46 and SCSI HBA 44. System software 21 determines the functions implemented in each of these devices, along with other relevant information, and initializes each of the devices appropriately. SCSI HBA 44 typically includes an extended ROM which contains an initialization module that, when executed, initializes SCSI bus 50 and devices connected to SCSI bus 50, including SCSI DISK 52 and tape storage device 54. The initialization of PCI bus 42; devices connected to PCI bus 42, including graphics adapter 48, NIC 46, and SCSI HBA 44; SCSI bus 50; and devices connected to SCSI bus 50, including SCSI disk 52 and tape storage device 54, may all be performed in a conventional manner.
FIG. 1B shows a set of PCI configuration registers 45 for SCSI HBA 44. As described above, during initialization, system software 21 accesses PCI configuration registers 45 to detect the presence of SCSI HBA 44 and to initialize SCSI HBA 44. PCI configuration registers 45 may also be accessed by system software 21 or by other software running on system hardware 30, at other times, for other purposes. FIG. 1B shows, more specifically, Vendor ID (Identifier) register 45A, Device ID register 45B, Command register 45C, Status register 45D, Revision ID register 45E, Class Code register 45F, Cache Line Size register 45G, Latency Timer register 45H, Header Type register 45I, Built-In Self-Test (BIST) register 45J, Base Address 0 register 45K, Base Address 1 register 45L, Base Address 2 register 45M, Base Address 3 register 45N, Base Address 4 register 45O, Base Address 5 register 45P, CardBus Card Information Structure (CIS) Pointer register 45Q, Subsystem Vendor ID register 45R, Subsystem ID register 45S, Expansion ROM Base Address register 45T, first reserved register 45U, second reserved register 45V, Interrupt Line register 45W, Interrupt Pin register 45X, Min_Gnt register 45Y, and Max_Lat register 45Z. Depending on the particular SCSI HBA used, however, one or more of these registers may not be implemented. The format, function and use of these configuration registers, including specific information regarding how to access these configuration registers, are well understood in the art and need not be described further.
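For illustration, the 26 registers listed above occupy the first 64 bytes (16 doublewords) of configuration space in the standard PCI header layout. The following minimal sketch decodes such a header from raw bytes. The field names and the `parse_header` function are hypothetical, chosen here only to mirror the register names of FIG. 1B, and the sample values are made up.

```python
import struct

# Little-endian layout of the standard 64-byte PCI configuration header.
# Field names are illustrative and mirror registers 45A through 45Z.
HEADER_FORMAT = "<HHHHB3sBBBB6IIHHIIIBBBB"
FIELD_NAMES = [
    "vendor_id", "device_id", "command", "status", "revision_id", "class_code",
    "cache_line_size", "latency_timer", "header_type", "bist",
    "bar0", "bar1", "bar2", "bar3", "bar4", "bar5",
    "cardbus_cis_pointer", "subsystem_vendor_id", "subsystem_id",
    "expansion_rom_base", "reserved0", "reserved1",
    "interrupt_line", "interrupt_pin", "min_gnt", "max_lat",
]


def parse_header(raw):
    """Decode the first 64 bytes of configuration space into a dict."""
    if len(raw) < struct.calcsize(HEADER_FORMAT):
        raise ValueError("need at least 64 bytes of configuration space")
    values = struct.unpack_from(HEADER_FORMAT, raw)
    regs = dict(zip(FIELD_NAMES, values))
    # Class Code is a 3-byte field stored least-significant byte first.
    regs["class_code"] = int.from_bytes(regs["class_code"], "little")
    return regs


# Made-up header contents for a SCSI-class device (base class 0x01).
raw = bytearray(64)
struct.pack_into("<HH", raw, 0x00, 0x1234, 0x0001)   # Vendor ID, Device ID
raw[0x09:0x0C] = (0x010000).to_bytes(3, "little")    # Class Code
struct.pack_into("<I", raw, 0x10, 0x0000E001)        # Base Address 0
raw[0x3D] = 1                                        # Interrupt Pin
header = parse_header(bytes(raw))
```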
FIG. 1B also shows PCI extended configuration space 45AA, which may include a set of Device-capability Registers. The formats of these Device-capability registers are standard, although devices from different vendors may advertise different capabilities. For example, the content of Device-capability Registers 45AA may differ between multiple SCSI HBA devices from different vendors. Finally, the format and content of these registers may even vary for different models of the same type of device from a single vendor.
Referring again to the initialization process, when system software 21 is initializing devices on PCI bus 42, system software 21 reads one or more of PCI configuration registers 45 of SCSI HBA 44, such as Vendor ID register 45A and Device ID register 45B, and determines that SCSI HBA 44 is present and what type of device it is. System software 21 then reads additional configuration registers, and configures SCSI HBA 44 appropriately, by writing certain values to some of configuration registers 45. In particular, system software 21 reads one or more of Base Address Registers (BARs) 45K, 45L, 45M, 45N, 45O, and 45P to determine how many regions and how many blocks of memory and/or I/O address space SCSI HBA 44 requires, and system software 21 writes to one or more of the Base Address registers to specify address range(s) to satisfy these requirements.
As an example, suppose that Base Address 0 register (BAR 0) 45K indicates that SCSI HBA 44 requires a first number of blocks of I/O address space and that Base Address 1 (BAR 1) register 45L indicates that SCSI HBA 44 requires a second number of blocks of memory address space. This situation is illustrated in FIG. 1C, showing configuration address space 70, I/O address space 72, and memory address space 74. System software 21 may write to Base Address 0 register (BAR 0) 45K and specify I/O region 72A within I/O address space 72, I/O region 72A having a first number of blocks; and system software 21 may write to Base Address 1 register (BAR 1) 45L and specify memory region 74A within memory address space 74, memory region 74A having a second number of blocks. PCI configuration registers 45 of SCSI HBA 44 may be accessed within configuration address space 70. As shown in FIG. 1C, Base Address 0 register 45K contains a pointer to I/O region 72A within I/O address space 72, and Base Address 1 register 45L contains a pointer to memory region 74A within memory address space 74.
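The reading and writing of Base Address registers described above may be sketched as follows, under stated assumptions. The `FakeBar` class is a hypothetical software model of a 32-bit Base Address register, and the sizes and base addresses are made up. The probing step follows the customary PCI sizing handshake, in which software writes all ones to the register and decodes the value read back. 64-bit memory BARs and other details are omitted.

```python
class FakeBar:
    """Hypothetical model of one 32-bit Base Address register."""

    def __init__(self, size, is_io):
        self.size = size                      # region size in bytes, power of two
        self.flags = 0x1 if is_io else 0x0    # bit 0 set means I/O space
        self.addr = 0

    def write(self, value):
        # Only address bits at or above the region size are writable.
        self.addr = value & ~(self.size - 1) & 0xFFFFFFFF

    def read(self):
        return self.addr | self.flags


def probe_bar(bar):
    """Return (is_io, size_in_bytes) using the write-all-ones handshake."""
    original = bar.read()
    bar.write(0xFFFFFFFF)
    readback = bar.read()
    bar.write(original)                       # restore the previous contents
    is_io = bool(readback & 0x1)
    mask = 0xFFFFFFFC if is_io else 0xFFFFFFF0   # strip the type bits
    size = (~(readback & mask) + 1) & 0xFFFFFFFF
    return is_io, size


def assign_bar(bar, base, size):
    """Program a base address that satisfies the probed requirement."""
    if base % size:
        raise ValueError("base address must be aligned to the region size")
    bar.write(base)


# Illustrative values: BAR 0 requests 256 bytes of I/O space and
# BAR 1 requests 4 KiB of memory space.
bar0, bar1 = FakeBar(256, is_io=True), FakeBar(4096, is_io=False)
io_kind, io_size = probe_bar(bar0)
mem_kind, mem_size = probe_bar(bar1)
assign_bar(bar0, 0xE000, io_size)             # plays the role of I/O region 72A
assign_bar(bar1, 0xFEB00000, mem_size)        # plays the role of memory region 74A
```

After `assign_bar` runs, each modeled register holds a pointer to its region, which is the state depicted for Base Address 0 register 45K and Base Address 1 register 45L in FIG. 1C.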
Subsequently, system software 21 may determine that SCSI HBA 44 contains an extended ROM, and system software 21 creates a copy of the ROM code in memory and executes the code in a conventional manner. Extended ROM code from SCSI HBA 44 initializes SCSI bus 50 and devices connected to SCSI bus 50, including SCSI DISK 52 and tape storage device 54, generally in a conventional manner.
After the computer system shown in FIG. 1A is initialized, including the PCI devices on PCI bus 42, configuration registers in the respective PCI devices may be accessed on an ongoing basis to interact with the PCI devices and to utilize functions implemented by the PCI devices. In particular, PCI configuration registers 45 in SCSI HBA 44 may be accessed to determine which SCSI HBA is connected to PCI bus 42, to determine characteristics of devices connected to SCSI bus 50, and to interface with the devices on SCSI bus 50, all in a conventional manner. For example, configuration registers 45 of SCSI HBA 44 may be used to eventually determine that SCSI DISK 52 and tape storage device 54 are connected to the SCSI bus 50, and to determine various characteristics of these storage devices.
Also, after the computer system shown in FIG. 1A is initialized, software executing on system hardware 30 may perform I/O transfers to and from devices on PCI bus 42, namely I/O writes to devices on PCI bus 42 and I/O reads from devices on PCI bus 42. These I/O transfers are performed in a conventional manner using the memory regions and/or I/O regions specified in the Base Address registers of a PCI device. These I/O transfers may be DMA (Direct Memory Access) transfers from the devices or they may be non-DMA transfers. In the case of SCSI HBA 44, software executing on system hardware 30 may perform I/O transfers to and from devices on SCSI bus 50, through SCSI HBA 44, in a conventional manner. For example, such I/O transfers through SCSI HBA 44 may be used to write data to SCSI DISK 52 or to read data from SCSI DISK 52, both in a conventional manner. For an I/O write to SCSI DISK 52, CPU 32 conveys data to SCSI HBA 44, which then sends the data across SCSI bus 50 to SCSI DISK 52; while, for an I/O read from SCSI DISK 52, SCSI DISK 52 transmits data across SCSI bus 50 to SCSI HBA 44, and SCSI HBA 44 sends the data to CPU 32. In the example shown in FIG. 1C, such I/O transfers may be performed using I/O region 72A or memory region 74A. Such I/O transfers may be performed, for example, by SCSI driver 24 on behalf of application software in one of applications 10.
These I/O transfers to and from PCI devices may be further broken down into (a) transactions initiated by CPU 32 and (b) transactions initiated by PCI devices. Non-DMA I/O transfers involve only CPU-initiated transactions. For a non-DMA write, CPU 32 initiates the transfer, writes data to the PCI device, and the PCI device receives the data, all in the same transaction. For a non-DMA read, CPU 32 initiates the transfer and the PCI device retrieves the data and provides it to CPU 32, again all in the same transaction. Thus, non-DMA I/O transfers may be considered simple CPU accesses to the PCI devices.
DMA I/O transfers, in contrast, involve transactions initiated by the PCI devices. For a DMA write transfer, CPU 32 first writes data to a memory region without any involvement by a PCI device. CPU 32 then initiates the DMA transfer in a first transaction, involving a CPU access to the PCI device. Subsequently, the PCI device reads the data from the memory region in a second transaction. This second transaction may be considered a “DMA operation” by the PCI device. For a DMA read operation, CPU 32 initiates the DMA transfer in a first transaction, involving a CPU access to the PCI device. The PCI device then retrieves the data and writes it into a memory region in a second transaction, which may also be considered a “DMA operation” by the PCI device. Next, the CPU reads the data from the memory region without any further involvement by the PCI device. Thus, DMA I/O transfers to and from a PCI device generally involve both a CPU access to the PCI device and a DMA operation by the PCI device.
In addition to accesses to configuration registers of PCI devices and I/O transfers to and from PCI devices, PCI devices also typically generate interrupts to CPU 32 for various reasons, such as completion of a DMA transfer. Such interrupts may be generated and handled in a conventional manner.
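The ordering of transactions in a DMA write and a DMA read, together with the completion interrupt, may be illustrated by the following toy model. It is a sketch only: `FakeDmaDevice`, its methods, and the `storage` buffer are hypothetical, and the device acts synchronously for simplicity, whereas a real device would typically perform its DMA operation asynchronously.

```python
class FakeDmaDevice:
    """Hypothetical device model; 'storage' stands in for the medium behind it."""

    def __init__(self, memory, on_interrupt):
        self.memory = memory              # shared system memory (a bytearray)
        self.on_interrupt = on_interrupt  # callback standing in for an interrupt
        self.storage = bytearray(64)
        self.log = []

    def start_write(self, mem_addr, length, dev_offset):
        # First transaction: CPU access to the device to start the transfer.
        self.log.append("cpu-access")
        # Second transaction: the device reads the data from system memory.
        self.log.append("dma-read-from-memory")
        self.storage[dev_offset:dev_offset + length] = \
            self.memory[mem_addr:mem_addr + length]
        self.on_interrupt("write-complete")

    def start_read(self, mem_addr, length, dev_offset):
        # First transaction: CPU access to the device to start the transfer.
        self.log.append("cpu-access")
        # Second transaction: the device writes the data into system memory.
        self.log.append("dma-write-to-memory")
        self.memory[mem_addr:mem_addr + length] = \
            self.storage[dev_offset:dev_offset + length]
        self.on_interrupt("read-complete")


memory = bytearray(128)
interrupts = []
device = FakeDmaDevice(memory, interrupts.append)

# DMA write: the CPU fills a memory region, then asks the device to fetch it.
memory[0:5] = b"hello"
device.start_write(mem_addr=0, length=5, dev_offset=10)

# DMA read: the CPU asks the device to deposit data, then reads memory itself.
device.start_read(mem_addr=32, length=5, dev_offset=10)
data_read_by_cpu = bytes(memory[32:37])
```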
In summary, there are four general types of transactions that occur between CPU 32 and a PCI device, such as SCSI HBA 44. A first transaction type (“a configuration transaction”) involves an access by CPU 32 to configuration registers of the PCI device, such as PCI configuration registers 45 of SCSI HBA 44. A second transaction type (“an I/O transaction”) involves an access by CPU 32 to the PCI device, through the memory and/or I/O region(s) specified by the Base Address registers of the PCI device, such as I/O region 72A or memory region 74A for SCSI HBA 44 in the example shown in FIG. 1C. A third transaction type (“a DMA operation”) involves a DMA operation by the PCI device, which involves a read from or a write to a memory region specified by a Base Address register of the PCI device, such as memory region 74A for SCSI HBA 44 in the example shown in FIG. 1C. A fourth transaction type (“an interrupt”) involves an interrupt from the PCI device to CPU 32, such as upon completion of a DMA transfer.
General Virtualized Computer System
As is well known in the field of computer science, a virtual machine (VM) is an abstraction—a “virtualization”—of an actual physical computer system. FIG. 2A shows one possible arrangement of a computer system that implements virtualization. As shown in FIG. 2A, one or more VMs 300, or “guests,” are installed on a “host platform,” or simply “host,” which includes system hardware, and one or more layers or co-resident components comprising system-level software, such as an operating system or similar kernel, or a virtual machine monitor or hypervisor (see below), or some combination of these. The system hardware typically includes one or more processors, memory, some form of mass storage, and various other devices.
The computer system shown in FIG. 2A has the same system hardware 30 as is shown in FIG. 1A and described above. Thus, system hardware 30 shown in FIG. 2A also includes CPU 32, host/PCI bridge 36, system memory 40, SCSI HBA 44, NIC 46, and graphics adapter 48 shown in FIG. 1A, although these components are not illustrated in FIG. 2A for simplicity. As also illustrated in FIG. 1A, but not in FIG. 2A, CPU 32 is connected to host/PCI bridge 36 by CPU local bus 34, in a conventional manner; system memory 40 is connected to host/PCI bridge 36 by memory bus 38, in a conventional manner; and SCSI HBA 44, NIC 46 and graphics adapter 48 are connected to host/PCI bridge 36 by PCI bus 42, in a conventional manner.
FIG. 2A also shows the same video monitor 62, the same networks 60 and the same SCSI bus 50 as are shown in FIG. 1A, along with the same SCSI DISK 52 and the same tape storage device 54, which are again shown as being connected to SCSI bus 50. Other devices may also be connected to SCSI bus 50. Thus, graphics adapter 48 (not shown in FIG. 2A) is connected to video monitor 62 in a conventional manner; NIC 46 (not shown in FIG. 2A) is connected to data networks 60 in a conventional manner; and SCSI HBA 44 (not shown in FIG. 2A) supports SCSI bus 50 in a conventional manner.
Guest system software runs on VMs 300. Each virtual machine monitor 200 (VMM 200) (or a software layer where VM 300 and VMM 200 overlap) typically includes virtual system hardware 330. Virtual system hardware 330 typically includes at least one virtual CPU, some virtual memory, and one or more virtual devices. All of the virtual hardware components of the VM may be implemented in software using known techniques to emulate the corresponding physical components.
FIG. 2B shows aspects of virtual system hardware 330. For the example virtual computer systems of FIGS. 2A and 2B, virtual system hardware 330 is functionally similar to underlying physical system hardware 30, although, for other virtual computer systems, the virtual system hardware may be quite different from the underlying physical system hardware. Thus, FIG. 2B shows processor (CPU or Central Processing Unit) 332, host/PCI bridge 336, system memory 340, SCSI HBA 344, NIC 346, and graphics adapter 348, each of which may be implemented as conventional devices that are substantially similar to their corresponding devices in underlying physical hardware 30. As shown in FIG. 2B, CPU 332 appears to be connected to host/PCI bridge 336 in a conventional manner, as if by CPU local bus 334; system memory 340 appears to be connected to host/PCI bridge 336 in a conventional manner, as if by memory bus 338; and SCSI HBA 344, NIC 346 and graphics adapter 348 appear to be connected to host/PCI bridge 336 in a conventional manner, as if by PCI bus 342.
As further shown in FIG. 2B, graphics adapter 348 appears to be connected to conventional video monitor 362 in a conventional manner; NIC 346 appears to be connected to one or more conventional data networks 360 in a conventional manner; SCSI HBA 344 appears to support SCSI bus 350 in a conventional manner; and virtual disk 352 and tape storage device 354 appear to be connected to SCSI bus 350, in a conventional manner. Virtual disk 352 typically represents a portion of SCSI DISK 52. It is common for virtualization software to provide guest software within a VM with access to some portion of a SCSI DISK, including possibly a complete Logical Unit Number (LUN), multiple complete LUNs, some portion of a LUN, or even some combination of complete and/or partial LUNs. Whatever portion of the SCSI DISK is made available for use by the guest software, within the VM the portion is often presented to the guest software in the form of one or more complete virtual disks. Methods for virtualizing a portion of a SCSI DISK as one or more virtual disks are known in the art. Other than presenting a portion of SCSI DISK 52 as a complete virtual disk 352, all of the virtual devices illustrated in FIG. 2B may be emulated in such a manner that they are functionally similar to the corresponding physical devices illustrated in FIG. 1A, or, alternatively, the virtual devices may be emulated so as to make them quite different from the underlying physical devices.
Guest system software in VMs 300 of FIG. 2A includes OS 320, including a set of drivers 324, and system BIOS 322. FIG. 2A also shows one or more applications 310 running within VMs 300. OS 320 may be substantially the same as OS 20 of FIG. 1A, or it may be substantially different; drivers 324 may be substantially the same as drivers 24 of FIG. 1A, or they may be substantially different; system BIOS 322 may be substantially the same as system BIOS 22 of FIG. 1A, or it may be substantially different; and applications 310 may be substantially the same as applications 10 of FIG. 1A, or they may be substantially different. Also, each of these software units may be substantially the same between different VMs, as suggested in FIG. 2A, or they may be substantially different.
Note that a single VM may be configured with more than one virtualized processor. To permit computer systems to scale to larger numbers of concurrent threads, systems with multiple CPUs have been developed. For example, symmetric multi-processor (SMP) systems are available as extensions of the PC platform and from other vendors. Essentially, an SMP system is a hardware platform that connects multiple processors to a shared main memory and shared I/O devices. Virtual machines may also be configured as SMP VMs. In addition, another configuration is found in a so-called “multi-core” architecture, in which more than one physical CPU is fabricated on a single chip, each with its own set of functional units (such as a floating-point unit and an arithmetic/logic unit (ALU)), and in which threads can execute independently; multi-core processors typically share only limited resources, such as some cache. In further addition, a technique that provides for simultaneous execution of multiple threads is referred to as “simultaneous multi-threading,” in which more than one logical CPU (hardware thread) operates simultaneously on a single chip, but in which the logical CPUs flexibly share some resources such as caches, buffers, functional units, etc.
Applications 310 running on a VM function as they would if run on a “real” computer, even though the applications are running at least partially indirectly, that is via guest OS 320 and virtual processor(s). Executable files are accessed by the guest OS from a virtual disk or virtual memory, which will be portions of an actual physical disk or memory allocated to that VM. Once an application is installed within a VM, the guest OS retrieves files from the virtual disk just as if the files had been pre-stored as the result of a conventional installation of the application. The design and operation of virtual machines are well known in the field of computer science.
Some interface is generally required between guest software within a VM and various hardware components and devices in an underlying hardware platform. This interface—which may be referred to generally as “virtualization software”—may include one or more software components and/or layers, possibly including one or more of the software components known in the field of virtual machine technology as “virtual machine monitors” (VMMs), “hypervisors,” or virtualization “kernels.” Because virtualization terminology has evolved over time and has not yet become fully standardized, these terms do not always provide clear distinctions between the software layers and components to which they refer. For example, the term “hypervisor” is often used to describe both a VMM and a kernel together, either as separate but cooperating components or with one or more VMMs incorporated wholly or partially into the kernel itself; however, the term “hypervisor” is sometimes used instead to mean some variant of a VMM alone, which interfaces with some other software layer(s) or component(s) to support the virtualization. Moreover, in some systems, some virtualization code is included in at least one “superior” VM to facilitate the operations of other VMs. Furthermore, specific software support for VMs may be included in a host OS itself.
FIG. 2A shows virtual machine monitors 200 that appear as separate entities from other components of the virtualization software. Furthermore, some software components are shown and described as being within a “virtualization layer” located logically between all virtual machines and the underlying hardware platform and/or system-level host software. This virtualization layer can be considered part of the overall virtualization software, although it would be possible to implement at least part of this layer in specialized hardware.
Various virtualized hardware components may be considered to be part of VMM 200 for the sake of conceptual simplicity. In actuality, these “components” are usually implemented as software emulations by virtual device emulators 202 included in the VMMs. One advantage of such an arrangement is that the VMMs may (but need not) be set up to expose “generic” devices, which facilitate VM migration and hardware platform-independence.
Different systems may implement virtualization to different degrees—the term “virtualization” generally relates to a spectrum of definitions rather than to a bright line, and often reflects a design choice with respect to a trade-off between speed and efficiency on the one hand and isolation and universality on the other hand. For example, the term “full virtualization” is sometimes used to denote a system in which no software components of any form are included in a guest other than those that would be found in a non-virtualized computer; thus, a guest OS could be an off-the-shelf, commercially available OS with no components included specifically to support use in a virtualized environment.
In contrast, another term, which has yet to achieve a universally accepted definition, is that of “para-virtualization.” As the term implies, a “para-virtualized” system is not “fully” virtualized, but rather a guest is configured in some way to provide certain features that facilitate virtualization. For example, a guest in some para-virtualized systems is designed to avoid hard-to-virtualize operations and configurations, such as by avoiding certain privileged instructions, certain memory address ranges, etc. As another example, many para-virtualized systems include an interface within a guest that enables explicit calls to other components of the virtualization software.
For some, the term para-virtualization implies that a guest OS (in particular, its kernel) is specifically designed to support such an interface. According to such a view, having, for example, an off-the-shelf version of Microsoft Windows XP as a guest OS would not be consistent with the notion of para-virtualization. Others define the term para-virtualization more broadly to include any guest OS with any code that is specifically intended to provide information directly to any other component of the virtualization software. According to this view, loading a module such as a driver designed to communicate with other virtualization components renders the system para-virtualized, even if the guest OS, as such, is an off-the-shelf, commercially available OS not specifically designed to support a virtualized computer system.
In addition to the sometimes fuzzy distinction between full and partial (para-) virtualization, two arrangements of intermediate system-level software layer(s) are in general use—a “hosted” configuration and a non-hosted configuration (which is shown in FIG. 2A). In a hosted virtualized computer system, an existing, general-purpose operating system forms a “host” OS that is used to perform certain input/output (I/O) operations, alongside and sometimes at the request of the VMM. A Workstation virtualization product of VMware, Inc., of Palo Alto, Calif., is an example of a hosted, virtualized computer system, which is also explained in U.S. Pat. No. 6,496,847 (Bugnion, et al., “System and Method for Virtualizing Computer Systems,” 17 Dec. 2002).
As illustrated in FIG. 2A, in many cases, it may be beneficial to deploy VMMs on top of a software layer, kernel 100 (also referred to as VMkernel 100), constructed specifically to provide efficient support for VMs. This configuration is frequently referred to as being “non-hosted.” Compared with a system in which VMMs run directly on the hardware platform, use of a kernel offers greater modularity and facilitates provision of services that extend across multiple virtual machines. Thus, the VMM may include resource manager 102, for example, for managing resources across multiple virtual machines. Compared with a hosted deployment, a kernel may offer greater performance because it can be co-developed with the VMM and be optimized for the characteristics of a workload consisting primarily of VMs/VMMs. Kernel 100 may also handle other applications running on it that can be separately scheduled, as well as a console operating system that, in some architectures, is used to boot the system and facilitate certain user interactions with the virtualization software.
Note that VMkernel 100 shown in FIG. 2A is not the same as a kernel that will be within guest OS 320—as is well known, every operating system has its own kernel. Note also that kernel 100 is part of the “host” platform of the VM/VMM as defined above even though the configuration shown in FIG. 2A is commonly termed “non-hosted;” moreover, VMkernel 100 is part of the host and part of the virtualization software or “hypervisor.” The difference in terminology is one of perspective and definitions that are still evolving in the art of virtualization.
One of device emulators 202 emulates virtual SCSI HBA 344, using physical SCSI HBA 44 to actually perform data transfers, etc. Thus, for example, if guest software attempts to read data from what it sees as virtual disk 352, SCSI device driver 324 typically interacts with what it sees as SCSI HBA 344 to request the data. Device emulator 202 responds to SCSI device driver 324, and causes physical SCSI HBA 44 to read the requested data from an appropriate location within physical SCSI DISK 52. Device emulator 202 typically has to translate a SCSI I/O operation initiated by SCSI device driver 324 into a corresponding SCSI operation issued to SCSI HBA 44, and finally onto SCSI DISK 52. Methods for emulating disks and SCSI DISKs, and for translating disk operations during such emulations, are known in the art.
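As one simplified illustration of such a translation, suppose that a virtual disk is backed by a single contiguous range of blocks on the physical disk. That backing arrangement is an assumption made only for this sketch, and the `VirtualDiskExtent` class and its block numbers are hypothetical. Actual virtualization software may use other arrangements, such as partial LUNs or combinations of LUNs.

```python
class VirtualDiskExtent:
    """Hypothetical mapping of a virtual disk onto one contiguous physical range."""

    def __init__(self, physical_start_lba, num_blocks):
        self.physical_start_lba = physical_start_lba
        self.num_blocks = num_blocks

    def translate(self, virtual_lba, count):
        """Map a guest block request to the matching physical block request."""
        if virtual_lba < 0 or count <= 0:
            raise ValueError("invalid block request")
        if virtual_lba + count > self.num_blocks:
            # The guest must never reach blocks outside its own virtual disk.
            raise ValueError("request extends past the end of the virtual disk")
        return self.physical_start_lba + virtual_lba, count


# Illustrative numbers: a 1000-block virtual disk that starts at
# physical block 50000.
extent = VirtualDiskExtent(physical_start_lba=50000, num_blocks=1000)
physical_lba, physical_count = extent.translate(virtual_lba=10, count=8)
```

The bounds check reflects the isolation role of the emulator: a request that the guest driver issues against its virtual disk is rejected rather than forwarded if it would fall outside the portion of the physical disk allocated to that VM.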
During the operation of VM 300, SCSI device driver 324 typically interacts with virtual SCSI HBA 344 just as if it were a real, physical SCSI HBA. At different times, SCSI device driver 324 may exercise different functionality of virtual SCSI HBA 344, and so device emulator 202 typically must emulate all the functionality of the virtual SCSI HBA. However, device emulator 202 does not necessarily have to emulate all of the functionality of physical SCSI HBA 44. Virtual SCSI HBA 344 emulated by device emulator 202 may be substantially different from physical SCSI HBA 44. For example, virtual SCSI HBA 344 may be more of a generic SCSI HBA, implementing less functionality than physical SCSI HBA 44. Nonetheless, device emulator 202 typically emulates all the functionality of some SCSI HBA. Thus, for example, SCSI driver 324 may attempt to access the PCI configuration registers of virtual SCSI HBA 344, and device emulator 202 typically must emulate the functionality of the configuration registers.
FIG. 2C illustrates a set of emulated or virtual PCI configuration registers 345. Specifically, FIG. 2C shows Vendor ID register 345A, Device ID register 345B, Command register 345C, Status register 345D, Revision ID register 345E, Class Code register 345F, Cache Line Size register 345G, Latency Timer register 345H, Header Type register 345I, BIST register 345J, Base Address 0 register 345K, Base Address 1 register 345L, Base Address 2 register 345M, Base Address 3 register 345N, Base Address 4 register 345O, Base Address 5 register 345P, CardBus CIS Pointer register 345Q, Subsystem Vendor ID register 345R, Subsystem ID register 345S, Expansion ROM Base Address register 345T, first reserved register 345U, second reserved register 345V, Interrupt Line register 345W, Interrupt Pin register 345X, Min_Gnt register 345Y, and Max_Lat register 345Z. As with physical SCSI HBA 44, one or more of these registers may not be implemented, depending on the particular SCSI HBA that is emulated as virtual SCSI HBA 344. Also, the registers that are implemented in virtual PCI configuration registers 345 may differ from the registers that are implemented in physical PCI configuration registers 45. FIG. 2C also shows Virtual PCI Extended Configuration Space (including a set of Device-Specific Registers) 345AA.
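For reference, the registers listed above occupy the first 64 bytes of a conventional PCI configuration header. A minimal sketch of how an emulator might back such registers with a byte array is given below. The offsets follow the standard PCI header layout; the `VirtualPciConfigSpace` class, the field names, and the identifier value 0x1234 are hypothetical and are not taken from any particular virtualization product.

```python
# (name, byte offset, size in bytes) for the standard 64-byte PCI header.
PCI_HEADER_FIELDS = [
    ("vendor_id",           0x00, 2),
    ("device_id",           0x02, 2),
    ("command",             0x04, 2),
    ("status",              0x06, 2),
    ("revision_id",         0x08, 1),
    ("class_code",          0x09, 3),
    ("cache_line_size",     0x0C, 1),
    ("latency_timer",       0x0D, 1),
    ("header_type",         0x0E, 1),
    ("bist",                0x0F, 1),
    ("bar0",                0x10, 4),
    ("bar1",                0x14, 4),
    ("bar2",                0x18, 4),
    ("bar3",                0x1C, 4),
    ("bar4",                0x20, 4),
    ("bar5",                0x24, 4),
    ("cardbus_cis_pointer", 0x28, 4),
    ("subsystem_vendor_id", 0x2C, 2),
    ("subsystem_id",        0x2E, 2),
    ("expansion_rom_base",  0x30, 4),
    ("reserved0",           0x34, 4),
    ("reserved1",           0x38, 4),
    ("interrupt_line",      0x3C, 1),
    ("interrupt_pin",       0x3D, 1),
    ("min_gnt",             0x3E, 1),
    ("max_lat",             0x3F, 1),
]

FIELD_BY_NAME = {name: (off, size) for name, off, size in PCI_HEADER_FIELDS}


class VirtualPciConfigSpace:
    """Hypothetical backing store for emulated configuration registers."""

    def __init__(self, size=256):
        # 256 bytes: the 64-byte header plus device-specific registers.
        self.data = bytearray(size)

    def read(self, name):
        off, size = FIELD_BY_NAME[name]
        # PCI configuration registers are little-endian.
        return int.from_bytes(self.data[off:off + size], "little")

    def write(self, name, value):
        off, size = FIELD_BY_NAME[name]
        self.data[off:off + size] = value.to_bytes(size, "little")


cfg = VirtualPciConfigSpace()
cfg.write("vendor_id", 0x1234)  # hypothetical identifier for a generic device
```

A real emulator would additionally enforce read-only fields and write-one-to-clear status bits; this sketch shows only the layout.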
The contents of virtual PCI configuration registers 345 are generally different from the contents of physical PCI configuration registers 45, and the format of Device-Specific Registers 345AA may be different from the format of Device-Specific Registers 45AA, typically depending more on the design and implementation of the virtualization software than on the characteristics of physical SCSI HBA 44 or any connected SCSI devices. For example, the virtualization software may be implemented so as to allow a VM to be migrated from one physical computer to another physical computer. A VM may be migrated from a first physical computer to a second physical computer by copying VM state and memory state information for the VM from the first computer to the second computer, and restarting the VM on the second physical computer. Migration of VMs is more practical and efficient if the VMs include more generic virtual hardware that is independent of the physical hardware of the underlying computer system. Thus, virtual PCI configuration registers 345 for such an implementation would reflect the generic virtual hardware, instead of the underlying physical hardware of the computer on which the VM is currently running. Thus, there may be no, or only limited, correlation between the contents of virtual PCI configuration registers 345 and physical PCI configuration registers 45, and between the format of virtual Device-Specific Registers 345AA and physical Device-Specific Registers 45AA.
FIG. 2D illustrates configuration address space 370 corresponding to virtual PCI configuration registers 345. As an example, suppose that Base Address 0 register (BAR 0) 345K indicates that virtual SCSI HBA 344 requires a first number of blocks of I/O address space and that Base Address 1 register (BAR 1) 345L indicates that virtual SCSI HBA 344 requires a second number of blocks of memory address space. As shown in FIG. 2D, guest OS 320 may write to Base Address 0 register (BAR 0) 345K and specify I/O region 372A within I/O address space 372, I/O region 372A having the first number of blocks; and guest OS 320 may write to Base Address 1 register (BAR 1) 345L and specify memory region 374A within memory address space 374, memory region 374A having the second number of blocks. PCI configuration registers 345 of virtual SCSI HBA 344 may be accessed within configuration address space 370. After these writes, Base Address 0 register 345K contains a pointer to I/O region 372A within I/O address space 372, and Base Address 1 register 345L contains a pointer to memory region 374A within memory address space 374.
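The exchange between guest OS 320 and the Base Address registers can be sketched as follows, using the conventional PCI sizing procedure in which software writes all ones to a Base Address register and reads back a mask of the writable address bits. The `VirtualBar` class, the `probe_size` helper, the region sizes (256 bytes of I/O space, 4096 bytes of memory space) and the assigned addresses are assumptions chosen for illustration, not values from the description above.

```python
class VirtualBar:
    """Hypothetical emulation of one 32-bit Base Address register."""

    def __init__(self, size, is_io):
        minimum = 4 if is_io else 16
        if size < minimum or size & (size - 1):
            raise ValueError("size must be a power of two of at least "
                             f"{minimum} bytes")
        self.size = size
        # Bit 0 is hardwired: 1 for I/O space, 0 for memory space.
        self.flag_bits = 0x1 if is_io else 0x0
        # Low bits that never hold address information.
        self.flag_mask = 0x3 if is_io else 0xF
        self.value = self.flag_bits

    def write(self, value):
        # Only address bits at or above the region size are writable.
        addr_mask = ~(self.size - 1) & ~self.flag_mask & 0xFFFFFFFF
        self.value = (value & addr_mask) | self.flag_bits

    def read(self):
        return self.value

    def base(self):
        return self.value & ~self.flag_mask & 0xFFFFFFFF


def probe_size(bar):
    """What a guest OS conventionally does to learn the region size."""
    saved = bar.read()
    bar.write(0xFFFFFFFF)
    readback = bar.read()
    bar.write(saved)  # restore the previous contents
    flag_mask = 0x3 if readback & 0x1 else 0xF
    size_mask = readback & ~flag_mask & 0xFFFFFFFF
    return (~size_mask & 0xFFFFFFFF) + 1


bar0 = VirtualBar(size=256, is_io=True)     # assumed I/O region size
bar1 = VirtualBar(size=4096, is_io=False)   # assumed memory region size

io_size = probe_size(bar0)
mem_size = probe_size(bar1)

# The guest then assigns regions by writing base addresses (assumed values).
bar0.write(0xC000)
bar1.write(0xE0000000)
```

After the final two writes, each register holds the base address of its assigned region together with the hardwired type bits, which corresponds to the pointers described above.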
Subsequently, guest OS 320 may determine that virtual SCSI HBA 344 contains an extended ROM, and guest OS 320 creates a copy of the ROM code in memory and executes the code in a conventional manner. The extended ROM code from virtual SCSI HBA 344 initializes virtual SCSI bus 350 and the devices connected to virtual SCSI bus 350, including virtual SCSI disk 352 and virtual tape storage device 354, generally in a conventional manner.
In general, conventional virtualized computer systems do not allow guest OS 320 to control the actual physical hardware devices. For example, guest OS 320 running on VM 300 would not have direct access to SCSI HBA 44 or SCSI disk 52. This is because, in such systems, virtualization software such as VMM 200 and VMkernel 100 coordinates each VM's access to the physical devices, allowing multiple VMs 300 to run on shared system hardware 30 without conflict.