1. Field of the Invention
This invention relates to transfers of data involving a computer system, and, in particular, to transfers of data in which blocks of data are divided into multiple sub-blocks for conveyance.
2. Description of the Related Art
The invention may be implemented as one or more computer programs or as one or more computer program modules embodied in one or more computer readable media. The computer readable media may be based on any existing or subsequently developed technology for embodying computer programs in a manner that enables them to be read by a computer. For example, the computer readable media may comprise one or more CDs (Compact Discs), one or more DVDs (Digital Versatile Discs), some form of flash memory device, a computer hard disk and/or some form of internal computer memory, to name just a few examples. An embodiment of the invention, in which one or more computer program modules is embodied in one or more computer readable media, may be made by writing the computer program modules to any combination of one or more computer readable media. Such an embodiment of the invention may be sold by enabling a customer to obtain a copy of the computer program modules in one or more computer readable media, regardless of the manner in which the customer obtains the copy of the computer program modules. Thus, for example, a computer program implementing the invention may be purchased electronically over the Internet and downloaded directly from a vendor's web server to the purchaser's computer, without any transference of any computer readable media. In such a case, writing the computer program to a hard disk of the web server to make it available over the Internet may be considered a making of the invention on the part of the vendor, and the purchase and download of the computer program by a customer may be considered a sale of the invention by the vendor, as well as a making of the invention by the customer.
The invention generally relates to virtualizing a communications channel in which blocks of data are divided into multiple sub-blocks prior to conveyance. One example of such a communications channel is a Universal Serial Bus (USB) in a standard personal computer, although the invention may also be applied to other existing or yet to be developed communications channels. A USB communications channel is used as an example throughout this description, but a person of skill in the art will be able to extend or adapt these teachings to other types of communications channels.
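By way of illustration only, the division of a block of data into multiple sub-blocks may be sketched as follows. The function names split_block and join_sub_blocks and the 64-byte default sub-block size are assumptions made for this sketch; they are not taken from any USB specification or from any product described herein.

```python
def split_block(block, max_sub_block=64):
    """Divide one block of data into sub-blocks of at most max_sub_block bytes."""
    if max_sub_block <= 0:
        raise ValueError("max_sub_block must be positive")
    # Slice the block at fixed intervals; the last sub-block may be shorter.
    return [block[i:i + max_sub_block]
            for i in range(0, len(block), max_sub_block)]


def join_sub_blocks(sub_blocks):
    """Reassemble the original block from its sub-blocks, in order."""
    return b"".join(sub_blocks)
```

A receiver that obtains the sub-blocks in order can recover the original block by concatenation, as join_sub_blocks does.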
The invention may also be applied in computer systems involving a wide variety of different types or degrees of virtualization. The primary example in this description is based on virtualizing a substantially complete personal computer hardware platform within a physical computer system that is also based on the same personal computer hardware platform. However, the invention may be implemented in a wide variety of virtual computer systems, involving a wide variety of hardware and software platforms and a wide variety of different types and degrees of virtualization. For example, the invention may also be implemented in virtual computer systems in which the hardware platform that is virtualized is a different hardware platform from that of the underlying physical computer system, such as a cross-platform virtualization system. The invention may also be implemented in virtual computer systems in which the hardware platform that is virtualized has never been implemented in any physical computer system. The invention may also be implemented in virtual computer systems in which less than a substantially complete hardware platform is virtualized. For example, the invention may be implemented in so-called paravirtualized systems, in which one or more aspects of a hardware platform are not fully virtualized, so that an operating system (OS) for the hardware platform must be modified in some manner to execute on the virtualized hardware platform. The invention may also be implemented in computer systems involving substantially less virtualization than the virtual computer systems mentioned thus far, even including computer systems in which only a single communications channel is virtualized, while all other physical resources are not virtualized.
In particular, the invention may be implemented in existing virtualization products of the assignee of this patent, VMware, Inc. Consequently, the general architectures of two types of products of VMware, a “hosted” virtual computer system and a “kernel-based” virtual computer system, are described below to provide background for the detailed description of the invention.
Hosted Virtual Computer System
FIG. 1 illustrates the main components of a “hosted” virtual computer system 100A as generally implemented in the Workstation virtualization product of VMware, Inc. The virtual computer system 100A supports a virtual machine (VM) 300A. As is well known in the field of computer science, a VM is a software abstraction or a “virtualization,” often of an actual physical computer system. As in conventional computer systems, both system hardware 102 and system software 150 are included. The system hardware 102 includes one or more processors (CPUs) 104, which may be a single processor, or two or more cooperating processors in a known multiprocessor arrangement. The system hardware also includes system memory 108, one or more disks 110, and some form of memory management unit (MMU) 106. The system memory is typically some form of high-speed RAM (random access memory), whereas the disk is typically a non-volatile, mass storage device. As is well understood in the field of computer engineering, the system hardware also includes, or is connected to, conventional registers, interrupt-handling circuitry, a clock, etc., which, for the sake of simplicity, are not shown in the figure.
The system software 150 typically either is or at least includes an OS 152, which has drivers 154 as needed for controlling and communicating with various devices 112, and usually with the disk 110 as well. Conventional applications 160 (APPS), if included, may be installed to run on the hardware 102 via the system software 150 and any drivers needed to enable communication with devices.
The VM 300A—also known as a “virtual computer”—is often a software implementation of a complete computer system. In the VM, the physical system components of a “real” computer are emulated in software, that is, they are virtualized. Thus, the VM 300A will typically include virtualized (“guest”) system hardware 302, which in turn includes one or more virtual CPUs 304 (VCPU), virtual system memory 308 (VMEM), one or more virtual disks 310 (VDISK), and one or more virtual devices 312 (VDEVICE), all of which are implemented in software to emulate the corresponding components of an actual computer. The concept, design and operation of virtual machines are well known in the field of computer science.
The VM 300A also has system software 350, which may include a guest OS 352, as well as drivers 354 as needed, for example, to control the virtual device(s) 312. The guest OS 352 may, but need not, simply be a copy of a conventional, commodity OS. Of course, most computers are intended to run various applications, and a VM is usually no exception. Consequently, by way of example, FIG. 1 illustrates one or more applications 360 (APPS) installed to run on the guest OS 352; any number of applications, including none at all, may be loaded for running on the guest OS, limited only by the requirements of the VM. Software running in the VM 300A, including the guest OS 352 and the guest applications 360, is generally referred to as “guest software.”
Note that although the virtual hardware “layer” 302 is a software abstraction of physical components, the VM's system software 350 may be the same as would be loaded into a hardware computer. The modifier “guest” is used here to indicate that the VM, although it acts as a “real” computer from the perspective of a user, is actually just computer code that is executed on the underlying “host” hardware and software platform 102, 150. Thus, for example, I/O to a virtual device 312 will actually be carried out by I/O to a corresponding hardware device 112, but in a manner transparent to the VM.
Some interface is usually required between the VM 300A and the underlying “host” hardware 102, which is responsible for actually executing VM-related instructions and transferring data to and from the actual physical memory 108, the processor(s) 104, the disk(s) 110 and the other device(s) 112. One advantageous interface between the VM and the underlying host system is often referred to as a virtual machine monitor (VMM), also known as a virtual machine “manager.” Virtual machine monitors have a long history, dating back to mainframe computer systems in the 1960s. See, for example, Robert P. Goldberg, “Survey of Virtual Machine Research,” IEEE Computer, June 1974, pp. 34-45.
A VMM is usually a relatively thin layer of software that runs directly on top of host software, such as the system software 150, or directly on the hardware, and virtualizes the resources of the (or some) hardware platform. FIG. 1 shows virtualization software 200A running directly on the system hardware 102. The virtualization software 200A may be a VMM, for example. Thus, the virtualization software 200A is also referred to herein as a VMM 200A. The VMM 200A will typically include at least one device emulator 252A, which may also form the implementation of the virtual device 312. The VMM 200A may also include a memory manager 254A that maps memory addresses used within the VM 300A (for the virtual memory 308) to appropriate memory addresses that can be applied to the physical memory 108. The VMM also usually tracks and either forwards (to the host OS 152) or itself schedules and handles all requests by its VM for machine resources, as well as various faults and interrupts. FIG. 1 therefore illustrates an interrupt (including fault) handler 256A within the VMM. The general features of VMMs are well known and are therefore not discussed in further detail here.
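The address mapping performed by a memory manager such as the memory manager 254A may be pictured, in a greatly simplified form, as a per-page lookup table. The following sketch assumes a 4096-byte page size and a hypothetical MemoryManager class; it is not a description of how any actual VMM implements its mappings.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class MemoryManager:
    """Hypothetical table mapping guest page numbers to machine page numbers."""

    def __init__(self):
        self._guest_to_machine = {}

    def map_page(self, guest_page, machine_page):
        self._guest_to_machine[guest_page] = machine_page

    def translate(self, guest_addr):
        """Translate an address used within the VM to a physical memory address."""
        guest_page, offset = divmod(guest_addr, PAGE_SIZE)
        if guest_page not in self._guest_to_machine:
            raise LookupError("guest page %d is not mapped" % guest_page)
        # The offset within the page is preserved; only the page number changes.
        return self._guest_to_machine[guest_page] * PAGE_SIZE + offset
```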
FIG. 1 illustrates a single VM 300A merely for the sake of simplicity; in many installations, there will be more than one VM installed to run on the common hardware platform; all may have essentially the same general structure, although the individual components need not be identical. Also in FIG. 1, a single VMM 200A is shown acting as the interface for the single VM 300A. It would also be possible to include the VMM as part of its respective VM, that is, in each virtual system. Although the VMM is usually completely transparent to the VM, the VM and VMM may be viewed as a single module that virtualizes a computer system. The VM and VMM are shown as separate software entities in the figures for the sake of clarity. Moreover, it would also be possible to use a single VMM to act as the interface for more than one VM, although it will in many cases be more difficult to switch between the different contexts of the various VMs (for example, if different VMs use different guest operating systems) than it is simply to include a separate VMM for each VM. This invention works with all such VM/VMM configurations.
In all of these configurations, there must be some way for the VM to access hardware devices, albeit in a manner transparent to the VM itself. One solution would of course be to include in the VMM all the required drivers and functionality normally found in the host OS 152 to accomplish I/O tasks. Two disadvantages of this solution are increased VMM complexity and duplicated effort—if a new device is added, then its driver would need to be loaded into both the host OS and the VMM. A third disadvantage is that the use of a hardware device by a VMM driver may confuse the host OS, which typically would expect that only the host's driver would access the hardware device. A different method for enabling the VM to access hardware devices has been implemented by VMware, Inc., in its Workstation virtualization product. This method is also illustrated in FIG. 1.
In the system illustrated in FIG. 1, both the host OS 152 and the VMM 200A are installed at system level, meaning that they both run at the greatest privilege level and can therefore independently modify the state of the hardware processor(s). For I/O to at least some devices, however, the VMM may issue requests via the host OS. To make this possible, a special driver VMdrv 258 is installed as any other driver within the host OS 152 and exposes a standard API (Application Program Interface) to a user-level application VMapp 260. When the system is in the VMM context, meaning that the VMM is taking exceptions, handling interrupts, etc., but the VMM wishes to use the existing I/O facilities of the host OS, the VMM calls the driver VMdrv 258, which then issues calls to the application VMapp 260, which then carries out the I/O request by calling the appropriate routine in the host OS.
In FIG. 1, a vertical line 230 symbolizes the boundary between the virtualized (VM/VMM) and non-virtualized (host software) “worlds” or “contexts.” The driver VMdrv 258 and application VMapp 260 thus enable communication between the worlds even though the virtualized world is essentially transparent to the host system software 150.
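The call path described above, from the VMM through the driver VMdrv and the application VMapp to the host OS, may be sketched as a chain of simple objects. All class and method names below (HostOS.read_device, VMapp.handle_io, VMdrv.forward, VMM.guest_read) are hypothetical stand-ins chosen for this sketch and do not correspond to any actual product interface.

```python
class HostOS:
    """Stand-in for the host OS 152; records the I/O routines it is asked to run."""

    def __init__(self):
        self.log = []

    def read_device(self, device, nbytes):
        self.log.append(("read", device, nbytes))
        return bytes(nbytes)  # placeholder data: nbytes zero bytes


class VMapp:
    """User-level application that carries out I/O by calling the host OS."""

    def __init__(self, host_os):
        self.host_os = host_os

    def handle_io(self, request):
        return self.host_os.read_device(request["device"], request["nbytes"])


class VMdrv:
    """Driver in the host OS that relays VMM requests to VMapp."""

    def __init__(self, vmapp):
        self.vmapp = vmapp

    def forward(self, request):
        return self.vmapp.handle_io(request)


class VMM:
    """VMM that uses the host's existing I/O facilities via VMdrv."""

    def __init__(self, vmdrv):
        self.vmdrv = vmdrv

    def guest_read(self, device, nbytes):
        return self.vmdrv.forward({"device": device, "nbytes": nbytes})
```

In this sketch the VMM never calls the host OS directly; every request crosses the boundary through VMdrv and VMapp.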
In some cases, however, it may be beneficial to deploy VMMs on top of a thin software layer, a “kernel,” constructed specifically for this purpose. FIG. 2 illustrates an implementation in which a kernel 202B takes the place of and performs the conventional functions of the host OS, including handling actual I/O operations. The kernel-based virtual computer system of FIG. 2 is described in greater detail below. Compared with a system in which VMMs run directly on the hardware platform, use of a kernel offers greater modularity and facilitates provision of services that extend across multiple virtual machines (for example, resource management). Also, compared with the hosted deployment, a kernel may offer greater performance because it can be co-developed with the VMM and be optimized for the characteristics of a workload consisting of VMMs.
As used herein, the “host” OS therefore means either the native OS 152 of the underlying physical computer, a specially constructed kernel 202B as described below, or whatever other system-level software handles actual I/O operations, takes interrupts, etc. for the VM. The invention may be used in all the different configurations mentioned above.
Kernel-Based Virtual Computer System
FIG. 2 illustrates the main components of a “kernel-based” virtual computer system 100B as generally implemented in the ESX Server virtualization product of VMware, Inc. A kernel-based virtualization system of the type illustrated in FIG. 2 is described in U.S. patent application Ser. No. 09/877,378 (“Computer Configuration for Resource Management in Systems Including a Virtual Machine”), which is incorporated here by reference. The main components of this system and aspects of their interaction are, however, outlined below.
The virtual computer system 100B includes one or more VMs, such as a first VM 300B and a second VM 300C. Each VM is installed as a “guest” on a “host” hardware platform, which, as shown in FIG. 2, may be the same as the hardware platform 102 of the virtual computer system 100A of FIG. 1. Thus, FIG. 2 shows the hardware platform 102 as including the one or more processors (CPUs) 104, the system memory 108, one or more disks 110, the MMU 106, and the device(s) 112.
Each VM 300B, 300C may include the same virtualized (“guest”) system hardware 302 as the VM 300A of FIG. 1. Thus, FIG. 2 shows the VM 300B as including the virtual system hardware 302, including the one or more virtual CPUs 304 (VCPU), the virtual system memory 308 (VMEM), the one or more virtual disks 310 (VDISK), and the one or more virtual devices 312 (VDEVICE). Each VM 300B, 300C may also include the guest OS 352, the drivers 354 and the one or more applications 360 (APPS) of the VM 300A of FIG. 1, as shown in FIG. 2 for the VM 300B.
Also as shown in FIG. 2, the virtual computer system 100B includes virtualization software 200B, which includes a VMM 250B that supports the VM 300B and a VMM 250C that supports the VM 300C. The VMMs 250B and 250C may be substantially the same as the virtualization software (VMM) 200A shown in FIG. 1. Thus, FIG. 2 shows the VMM 250B as including one or more device emulators 252B, which may be substantially the same as the device emulators 252A, a memory manager 254B, which may be substantially the same as the memory manager 254A, and an interrupt handler 256B, which may be substantially the same as the interrupt handler 256A.
The device emulators 252B emulate system resources for use within the VM 300B. These device emulators will then typically also handle any necessary conversions between the resources as exported to the VM and the actual physical resources. One advantage of such an arrangement is that the VMM 250B may be set up to expose “generic” devices, which facilitates VM migration and hardware platform-independence. For example, the VMM may be set up with a device emulator 252B that emulates a standard Small Computer System Interface (SCSI) disk, so that the virtual disk 310 appears within the VM 300B to be a standard SCSI disk connected to a standard SCSI adapter, whereas the underlying, actual, physical disk 110 may be something else. In this case, a standard SCSI driver is installed into the guest OS 352 as one of the drivers 354. The device emulator 252B then interfaces with the driver 354 and handles disk operations for the VM 300B. The device emulator 252B then converts the disk operations from the VM 300B to corresponding disk operations for the physical disk 110.
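The conversion performed by such a device emulator may be illustrated with a minimal sketch in which block-addressed operations from the VM are converted to byte-offset operations on a backing store. The 512-byte sector size, the class names PhysicalDisk and ScsiDiskEmulator, and the base_offset parameter are assumptions made for illustration; real SCSI command handling is far more involved.

```python
SECTOR_SIZE = 512  # assumed logical block size for this sketch


class PhysicalDisk:
    """Stand-in for the physical disk 110: a flat, byte-addressed store."""

    def __init__(self, size):
        self.data = bytearray(size)

    def pread(self, offset, length):
        return bytes(self.data[offset:offset + length])

    def pwrite(self, offset, payload):
        self.data[offset:offset + len(payload)] = payload


class ScsiDiskEmulator:
    """Hypothetical emulator exposing block-addressed reads and writes to a VM."""

    def __init__(self, backing, base_offset=0):
        self.backing = backing
        self.base_offset = base_offset  # where the virtual disk starts on the backing store

    def _offset(self, lba):
        # Convert a logical block address into a byte offset on the backing store.
        return self.base_offset + lba * SECTOR_SIZE

    def read_blocks(self, lba, count):
        return self.backing.pread(self._offset(lba), count * SECTOR_SIZE)

    def write_blocks(self, lba, payload):
        if len(payload) % SECTOR_SIZE:
            raise ValueError("payload must be a whole number of sectors")
        self.backing.pwrite(self._offset(lba), payload)
```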
When the computer system 100B of FIG. 2 is booted up, an existing operating system 152, which may be the same as the host OS 152 of FIG. 1, may be at system level and the kernel 202B may not yet even be operational within the system. In such a case, one of the functions of the OS 152 may be to make it possible to load the kernel 202B, after which the kernel runs on the native hardware 102 and manages system resources. In effect, the kernel, once loaded, displaces the OS 152. Thus, the kernel 202B may be viewed either as displacing the OS 152 from the system level and taking its place, or as residing at a “sub-system level.” When interposed between the OS 152 and the hardware 102, the kernel 202B essentially turns the OS 152 into an “application,” which has access to system resources only when allowed by the kernel 202B. The kernel then schedules the OS 152 as if it were any other component that needs to use system resources.
The OS 152 may also be included to allow applications unrelated to virtualization to run; for example, a system administrator may need such applications to monitor the hardware 102 or to perform other administrative routines. The OS 152 may thus be viewed as a “console” OS (COS). In such implementations, the kernel 202B preferably also includes a remote procedure call (RPC) mechanism to enable communication between, for example, the VMMs 250B, 250C and any applications 160 (APPS), which may be the same as the applications 160 of FIG. 1, installed to run on the COS 152.
In kernel-based systems such as the one illustrated in FIG. 2, there must be some way for the kernel 202B to communicate with the VMMs 250B, 250C. In general, a VMM can call into the kernel 202B but the kernel cannot call directly into the VMM. The conventional technique for overcoming this is for the kernel to post “actions” (requests for the VMM to do something) on an action queue stored in memory 108. As part of the VMM code, the VMM checks this queue periodically, and always checks it after returning from a kernel call and before resuming a VM. One typical action is the “raise interrupt” action: if the VMM sees this action, it will raise an interrupt to the VM in the conventional manner.
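The action queue technique may be sketched as follows. The class names ActionQueue and VMM, the method names, and the string "raise_interrupt" used to denote the “raise interrupt” action are hypothetical and are chosen only to make the sketch self-contained.

```python
from collections import deque


class ActionQueue:
    """Queue on which the kernel posts requests for the VMM."""

    def __init__(self):
        self._pending = deque()

    def post(self, action):
        self._pending.append(action)

    def drain(self):
        # Yield posted actions in the order in which they were posted.
        while self._pending:
            yield self._pending.popleft()


class VMM:
    def __init__(self, queue):
        self.queue = queue
        self.interrupts_raised = 0

    def check_actions(self):
        for action in self.queue.drain():
            if action == "raise_interrupt":
                self.interrupts_raised += 1  # stands in for raising an interrupt to the VM

    def kernel_call(self, kernel_fn):
        result = kernel_fn()
        self.check_actions()  # always check after returning from a kernel call
        return result

    def resume_vm(self):
        self.check_actions()  # always check before resuming the VM
```

Because the kernel cannot call into the VMM, posting to the queue is its only means of making a request; the request is acted upon the next time the VMM checks the queue.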
As is known, for example, from U.S. Pat. No. 6,397,242 (Devine, et al., 28 May 2002), some virtualization systems allow VM instructions to run directly (in “direct execution”) on the hardware CPU(s) when possible. When necessary, however, VM execution is switched to a technique known as “binary translation,” during which the VM is running in the VMM. In any systems where the VM is running in direct execution when it becomes necessary for the VMM to check actions, the kernel must interrupt the VMM so that it will stop executing VM instructions and check its action queue. This may be done using known programming techniques.
The kernel 202B handles not only the various VMM/VMs, but also any other applications running on the kernel, as well as the COS 152 and even the hardware CPU(s) 104, as entities that can be separately scheduled. In this disclosure, each schedulable entity is referred to as a “world,” which contains a thread of control, an address space, machine memory, and handles to the various device objects that it is accessing. Worlds are stored in a portion of the memory space controlled by the kernel. More specifically, the worlds are controlled by a world manager, represented in FIG. 2 within the kernel 202B as module 206B. Each world also has its own task structure, and usually also a data structure for storing the hardware state currently associated with the respective world.
There will usually be different types of worlds: 1) system worlds, which include idle worlds, one per CPU, and a helper world that performs tasks that need to be done asynchronously; 2) a console world, which is a special world that runs in the kernel and is associated with the COS 152; and 3) virtual machine worlds.
Worlds preferably run at the most-privileged level (for example, in a system with an x86 architecture, this will be level CPL0), that is, with full rights to invoke any privileged CPU operations. A VMM, which, along with its VM, constitutes a separate world, therefore may use these privileged instructions to allow it to run its associated VM so that it performs just like a corresponding “real” computer, even with respect to privileged operations.
When the world that is running on a particular CPU is preempted by or yields to another world, then a world switch has to occur. A world switch involves saving the context of the current world and restoring the context of the new world such that the new world can begin executing where it left off the last time that it was running.
In the first part of the world switch procedure, the kernel saves the current world's state in a data structure that is stored in the kernel's data area. Assuming the common case of an underlying x86 architecture, the state that is saved will typically include: 1) the flags (EFLAGS) register; 2) general purpose registers; 3) segment registers; 4) the instruction pointer (EIP) register; 5) the local descriptor table register; 6) the task register; 7) debug registers; 8) control registers; 9) the interrupt descriptor table register; 10) the global descriptor table register; and 11) the floating point state. Similar state information will need to be saved in systems with other hardware architectures.
After the state of the current world is saved, the state of the new world can be restored. During the process of restoring the new world's state, no exceptions are allowed to take place because, if they did, the state of the new world would be inconsistent upon restoration of the state. The same state that was saved is therefore restored. The last step in the world switch procedure is restoring the new world's code segment and instruction pointer (EIP) registers.
When worlds are initially created, the saved state area for the world is initialized to contain the proper information such that when the system switches to that world, then enough of its state is restored to enable the world to start running. The EIP is therefore set to the address of a special world start function. Thus, when a running world switches to a new world that has never run before, the act of restoring the EIP register will cause the world to begin executing in the world start function.
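The save and restore sequence of a world switch, including the initialization of a new world's saved state, may be sketched as follows. The reduced register set, the WORLD_START address and all class and function names are assumptions made for this sketch; an actual world switch saves and restores considerably more state and is performed in privileged code.

```python
WORLD_START = 0x1000  # hypothetical address of the world start function
REGISTERS = ("eflags", "eax", "cs", "eip")  # greatly reduced register set


class World:
    def __init__(self, name):
        self.name = name
        # Initialize the saved state so that a first switch to this world
        # begins executing in the world start function.
        self.saved = dict.fromkeys(REGISTERS, 0)
        self.saved["eip"] = WORLD_START


class CPU:
    def __init__(self, world):
        self.current = world
        self.regs = dict(world.saved)


def world_switch(cpu, new_world):
    # First, save the context of the current world.
    cpu.current.saved = dict(cpu.regs)
    # Then restore the new world's state, leaving CS and EIP for last.
    for reg in REGISTERS:
        if reg not in ("cs", "eip"):
            cpu.regs[reg] = new_world.saved[reg]
    cpu.regs["cs"] = new_world.saved["cs"]
    cpu.regs["eip"] = new_world.saved["eip"]
    cpu.current = new_world
```

Restoring EIP last means that, in this sketch as in the description above, control passes to the new world only after the rest of its state is in place.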
Switching from and to the COS world requires additional steps, which are described in U.S. patent application Ser. No. 09/877,378, mentioned above. Understanding the details of this process is not necessary for understanding the present invention, however, so further discussion is omitted.
The kernel 202B includes a memory management module 204B that manages all machine memory that is not allocated exclusively to the COS 152. When the kernel 202B is loaded, the information about the maximum amount of memory available on the machine is available to the kernel, as well as information about how much of it is being used by the COS. Part of the machine memory is used for the kernel 202B itself and the rest is used for the virtual machine worlds.
Virtual machine worlds use machine memory for two purposes. First, memory is used to back portions of each world's memory region, that is, to store code, data, stacks, etc. For example, the code and data for the VMM 250B is backed by machine memory allocated by the kernel 202B. Second, memory is used for the guest memory of the virtual machine. The memory management module may include any of a variety of algorithms for dynamically allocating memory among the different VMs 300B, 300C.
Interrupt and exception handling is related to the concept of “worlds” described above. As mentioned above, one aspect of switching worlds is changing various descriptor tables. One of the descriptor tables that is loaded when a new world is to be run is the new world's IDT (Interrupt Descriptor Table). The kernel 202B therefore preferably also includes an interrupt/exception handler 208B that is able to intercept and handle (using a corresponding IDT in the conventional manner) interrupts and exceptions for all devices on the machine. When a VMM world is running, whichever IDT was previously loaded is replaced by the VMM's IDT, such that the VMM will handle all interrupts and exceptions.
The VMM will handle some interrupts and exceptions completely on its own. For other interrupts/exceptions, it will be either necessary or at least more efficient for the VMM to call the kernel to have the kernel either handle the interrupts/exceptions itself, or to forward them to some other sub-system such as the COS. One example of an interrupt that the VMM can handle completely on its own, with no call to the kernel, is a check-action IPI (Inter-Processor Interrupt). One example of when the VMM preferably calls the kernel, which then forwards an interrupt to the COS, would be where the interrupt involves devices such as a mouse, which is typically controlled by the COS. The VMM may forward still other interrupts to the corresponding VM.
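The routing decisions described above may be summarized in a small lookup table. The interrupt source names, the Target enumeration and the choice of the VM as the default destination are assumptions made for this sketch; in particular, the "physical_disk" entry is a purely hypothetical example of an interrupt handled by the kernel itself.

```python
from enum import Enum


class Target(Enum):
    VMM = "handled entirely by the VMM"
    KERNEL = "VMM calls the kernel, which handles it"
    COS = "VMM calls the kernel, which forwards it to the COS"
    VM = "forwarded by the VMM to its VM"


# Hypothetical routing table keyed by interrupt source.
ROUTES = {
    "check_action_ipi": Target.VMM,  # example given in the description
    "mouse": Target.COS,             # example given in the description
    "physical_disk": Target.KERNEL,  # assumed example, not from the description
}


def route_interrupt(source, default=Target.VM):
    """Return the destination for an interrupt taken while a VMM world is running."""
    return ROUTES.get(source, default)
```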
In some embodiments of the invention, the kernel 202B is responsible for providing access to all devices on the physical machine. In addition to other modules that the designer may choose to load onto the system for access by the kernel, the kernel will therefore typically load conventional drivers as needed to control access to devices. Accordingly, FIG. 2 shows a module 210B containing loadable kernel modules and drivers. The kernel 202B may interface with the loadable modules and drivers in a conventional manner, using an API or similar interface.
Universal Serial Bus
The invention may be implemented in either the hosted virtual computer system of FIG. 1 or the kernel-based virtual computer system of FIG. 2, along with other computer systems, to virtualize a USB interface. The invention may also be used to virtualize other interfaces, but this description is primarily limited to virtualizing a USB interface for simplicity. Thus, the device(s) 112 in FIGS. 1 and 2 may include a USB host controller for providing a USB interface. USB interfaces are described in detail in a book entitled “Universal Serial Bus System Architecture, Second Edition,” published by Mindshare, Inc. in 2001, and written by Don Anderson and Dave Dzatko. This book is incorporated here by reference, and may be referred to as “the USB System Architecture book” below. A pending U.S. patent application, which is owned by the assignee of this patent, also relates to USB interfaces, along with other interfaces. That pending application is U.S. patent application Ser. No. 10/080,782 (“High-Speed Packet Transfer In Computer Systems With Multiple Interfaces”), which is also incorporated here by reference.
The USB host controller in the device(s) 112 may implement an Open Host Controller Interface (OHCI), a Universal Host Controller Interface (UHCI), an Enhanced Host Controller Interface (EHCI), or any yet to be developed host controller interface. Any such host controller interface is referred to herein as an XHCI interface, for generality, so that any reference to XHCI herein can generally be interpreted as a reference to OHCI, to UHCI, to EHCI and/or to any such yet to be developed host controller interface. The official specifications for the OHCI, the UHCI and the EHCI may also be referenced for more detailed information about USB interfaces. The host OS 152 of FIG. 1 would include one or more drivers 154, and the kernel 202B of FIG. 2 would include one or more drivers 210B, for interfacing with the XHCI host controller in a conventional manner.
The virtualization software 200A of FIG. 1 may include a device emulator 252A that emulates a USB interface for the VM 300A, while the virtualization software 200B of FIG. 2 may include a device emulator 252B that emulates a USB interface for the VM 300B. Thus, each of these USB device emulators may emulate a virtual device 312 for its respective VM, where these virtual devices 312 may be virtual XHCI host controllers. Such a virtual XHCI host controller may implement the same host controller interface as a corresponding physical XHCI host controller, or it may implement a different host controller interface.
Thus, the virtual system hardware 302 in the VM 300A of FIG. 1 or in the VM 300B of FIG. 2 includes a virtual XHCI host controller, which may be used by the guest software in the respective VM for communicating with other devices on a virtual USB bus. In each VM, the guest OS 352 may include one or more conventional drivers 354 for interfacing with the virtual XHCI host controller, in a conventional manner. From the perspective of the guest software in the VMs 300A and 300B, or from the perspective of a user of the VMs, the USB interface in the respective VM may generally appear to be a regular, physical USB interface.
When guest software within the VMs 300A and 300B attempts to use the virtual USB interface to communicate with devices that appear to be connected to the virtual USB bus, the guest software provides input/output (I/O) requests to the virtual XHCI controller. However, the I/O requests must generally be communicated in some form to the physical XHCI controller within the system hardware 102 for conveyance across the physical USB bus. Also, responses to these I/O requests must generally be communicated from the physical XHCI controller to the virtual XHCI controller. The XHCI controller emulator may assume primary responsibility for communicating I/O requests and responses between the virtual XHCI controller and the physical XHCI controller. This invention may be implemented within an XHCI controller emulator, for example, to enable this communication of I/O requests and responses between a virtual XHCI controller and a physical XHCI controller. A method and apparatus for providing this communication is described in greater detail below.
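The general flow of I/O requests and responses through a controller emulator may be sketched, at a very high level, as follows. The sketch makes no attempt to model any actual host controller interface or the method of the invention; the classes PhysicalController and ControllerEmulator, the dictionary-based request format and the method names are all hypothetical.

```python
class PhysicalController:
    """Stand-in for the physical host controller; echoes each request as completed."""

    def transfer(self, request):
        return {"id": request["id"], "status": "ok",
                "data": request.get("data", b"")}


class ControllerEmulator:
    """Relays requests from a virtual controller and holds the matching responses."""

    def __init__(self, physical):
        self.physical = physical
        self._completed = {}

    def submit(self, request):
        # Communicate the guest's I/O request to the physical controller.
        response = self.physical.transfer(request)
        # Keep the response until the virtual controller collects it.
        self._completed[response["id"]] = response

    def poll(self, request_id):
        # Return the response for request_id, or None if none is available.
        return self._completed.pop(request_id, None)
```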