1.1 Cross References to Related Applications
The subject matter of this invention is generally related to U.S. Pat. No. 5,319,760, by A. Mason, J. Hall, R. Witek, & P. Robinson, filed Jun. 28, 1991 and issued on Jun. 7, 1994, and entitled "TRANSLATION BUFFER FOR VIRTUAL MACHINES WITH ADDRESS SPACE MATCH." Both the present invention and the subject matter claimed in the cross-referenced patent were subject to an obligation of assignment to Digital Equipment Corporation at the time the present invention was made.
1.2 Field of the Invention
This invention relates generally to the field of digital data processing systems or computer systems. More specifically, this invention relates to computer systems having multiple hierarchical protection rings (or protection modes) for regulating the access to memory locations and the executability of certain instructions. Even more specifically, the present invention relates to those computer systems having multiple protection rings that utilize a virtual machine monitor to provide one or more virtual machines.
1.3 Operating Systems
Modern computer hardware is quite powerful and can be used to accomplish many tasks. This bare hardware, however, often provides a complicated and difficult-to-use interface for the individuals or machines that seek to utilize its computing power. To increase the usability of these bare computers, "operating systems" (sometimes "OS") have been developed. These operating systems manage the basic hardware resources and provide users of the basic computing hardware with an interface that is simpler and easier to use than the interface provided by the bare machine.
Basically, the operating system acts as an interface between the user, or the user's programs, and the bare computer hardware. The general operation of an operating system may be described as follows: A "user" submits his "job" to the operating system for execution. A user is anyone that desires the computer hardware to do work for him or her; a job is the collection of activities needed to do the desired work. Once the operating system accepts a user's job, it may create several "processes" to enable the hardware to perform the requested activities. Generally, a process (sometimes referred to as a "task") is a computation that may be done concurrently with other computations. Associated with each process is an "address space". The address space for each process includes the collection of stored programs and data that are accessed in memory by that process. A further discussion of address spaces is contained in the following section.
FIG. 1 illustrates the basic concept of an operating system. The bare computing hardware (or "bare machine") is illustrated by the innermost circle 10; and the operating system is illustrated by the outermost concentric circle 12.
Without the operating system, life would be difficult for computer users. For example, if a user wanted to copy information from a computer hardware resource (e.g., a disk drive) to another resource (e.g., main memory), he may have to provide literally thousands or tens-of-thousands of instructions to the bare machine 10 to perform such a task. When an operating system exists, however, the user's work is greatly reduced. Here, the user must merely issue a single instruction to the operating system 12; control is transferred to the operating system 12; and the task is performed. As the above example illustrates, an operating system allows the user to interact with the hardware, not only through the limited instructions available to the bare machine, but also through additional instructions that enable the operating system to perform complex or tedious tasks. The sum of these instructions is known as the "extended machine".
In FIG. 1, the extended machine 14 is illustrated as the area enclosed by the circle (or ring) representing the operating system. The users 16, or user programs, interface with the bare machine 10 through the extended machine 14. Thus, it may be said that the operating system 12 runs on the bare machine 10; and the user programs 16 run on the extended machine 14.
1.4 Address Spaces
FIGS. 2A-2C generally illustrate the concept of "address spaces" as used in computer systems having one operating system and one or more processes.
Most computer systems have a certain, limited amount of memory defined by the range of possible memory addresses the computer can generate. This range of possible addresses defines the "physical address space"--so called because it is the range of addresses that the physical memory boards will respond to. Thus, in FIG. 2A the physical memory is generally illustrated as 20. Physical memory 20 generally comprises several physical address locations 21 and memory control hardware 22. The memory control hardware 22 is capable of receiving an address and either writing information to that address or reading information stored at that address.
The physical address space of the computer is seldom directly used. To enhance the operation of the computer system, "virtual address spaces" are generally used to implement "virtual memory." The use of virtual memory is essentially invisible to most processes and programs running on the computer. From the process's perspective, an address is generated and a memory location is referenced. When virtual memory is implemented, however, the physical address of the memory referenced is usually not the same as the virtual address generated by the process. The use and implementation of virtual memory is known in the art and is only generally discussed herein.
FIG. 2B illustrates one implementation of virtual memory. As illustrated, an address translation unit (ATU) 24 is between the processes/programs 25 and the memory hardware 20. The address translation unit may utilize hardware, software, and firmware for converting virtual addresses to physical addresses. Further, the ATU is generally not an isolated piece of hardware in the computer system. It is illustrated as such here, however, to emphasize the concept that the virtual addresses generated by computing processes are translated to real addresses before physical memory is addressed. Methods for implementing address translation are generally known in the art and will not be discussed in detail herein. Additional discussion of address translation may be found in the co-pending application "TRANSLATION BUFFER FOR VIRTUAL MACHINES WITH ADDRESS SPACE MATCH."
In prior art systems, the ATU 24 can utilize several address maps that are under the sole control of the operating system. Basically, an address map is a listing of physical and virtual addresses, indexed by the virtual address. Thus, whenever a virtual address and an address map are presented to the ATU 24, the ATU 24 may find the proper physical address by looking up the virtual address in the appropriate address map.
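By way of illustration only, the lookup described above may be sketched in Python as follows. The sketch models an address map as a dictionary indexed by virtual page number; the page size, the function name `translate`, and the example map contents are assumptions made for this sketch and are not taken from any particular prior art system.

```python
# Illustrative sketch only. PAGE_SIZE, translate, and the map contents are
# hypothetical; real address translation units differ in detail.
PAGE_SIZE = 512  # assumed page size in bytes


def translate(address_map, virtual_address):
    """Return the physical address for a virtual address using one address map."""
    virtual_page, offset = divmod(virtual_address, PAGE_SIZE)
    if virtual_page not in address_map:
        raise LookupError(f"no mapping for virtual page {virtual_page}")
    return address_map[virtual_page] * PAGE_SIZE + offset


# A hypothetical address map: virtual page number -> physical page number.
example_map = {0: 7, 1: 3}
physical_address = translate(example_map, 1 * PAGE_SIZE + 20)
```

Presenting a different address map to `translate` with the same virtual address would generally yield a different physical address, which is the property on which separate address spaces rely.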
In most cases, when virtual memory is used, each process has its own address space; i.e., each process is associated with a distinct virtual-to-physical address map. Further, when the processor is executing instructions from a certain process it is able to determine which address map is assigned to that process. Thus, if a first process wants to share certain memory locations with a second process, it must generally arrange with the operating system (which controls the address maps) that parts of the first process's address map will point to the same physical memory as parts of the second process's address map. The operating system must be involved if such sharing of memory is to be implemented since it controls the address maps.
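The sharing arrangement described above may be sketched, under stated assumptions, as follows. The class `OperatingSystem`, its methods, and the page numbers are hypothetical names chosen for illustration; the only point shown is that the operating system, as sole owner of the address maps, makes entries in two maps point to the same physical memory.

```python
class OperatingSystem:
    """Hypothetical model: the OS owns every process's address map."""

    def __init__(self):
        self.address_maps = {}  # process name -> {virtual page: physical page}

    def create_process(self, name):
        self.address_maps[name] = {}

    def map_page(self, name, virtual_page, physical_page):
        self.address_maps[name][virtual_page] = physical_page

    def share_page(self, owner, owner_page, other, other_page):
        """Point part of another process's map at the owner's physical page."""
        physical_page = self.address_maps[owner][owner_page]
        self.address_maps[other][other_page] = physical_page
        return physical_page


os_model = OperatingSystem()
os_model.create_process("first")
os_model.create_process("second")
os_model.map_page("first", virtual_page=4, physical_page=9)
shared = os_model.share_page("first", 4, "second", 12)
```

In this sketch the two processes refer to the shared memory by different virtual page numbers (4 and 12), while both map entries name the same physical page.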
In many prior art systems, processes and the operating system share parts of the same address spaces; i.e., the process and the operating system may use different parts of the same address maps. Such a system is illustrated graphically in FIG. 2C. FIG. 2C illustrates a first address space 26 (i.e., an address map) wherein the same address map is used to translate virtual addresses generated by the process and the operating system. A second address space 27 is illustrated where a second process and the operating system share the second address map. In such "address space sharing" systems, each process is generally associated with a separate address map, and the portions of those address maps corresponding to the virtual memory locations assigned to the operating system are usually identical.
In address space sharing systems, the virtual-to-physical translation mechanism is the same for a given virtual address regardless of whether that address was generated by a user process or by the operating system. Absent some sort of protection mechanism, therefore, nothing prevents a process from generating virtual addresses identical to those generated by the operating system and, via the address map, accessing physical memory locations that are supposed to be reserved for the operating system. As discussed in the next section, however, most such systems use a protection-ring system to prevent unauthorized processes from accessing memory locations associated with the operating system. In such systems, the entries in the address map contain additional information, e.g., the ring number, that is used by the system to provide secure operation. Thus, even if a user process generated a virtual address corresponding to a physical memory location reserved to the OS, the ring number associated with that virtual address would prevent the process from accessing that location. In most such systems, the process, hardware, or firmware interpreting the address map will allow access to that location only to processes running in the most privileged ring, and only the OS is allowed to run in that ring. Thus, in most prior art systems where address spaces are shared by the OS and the other processes, ring-protection is essential to ensure secure operation.
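A minimal sketch of an address map whose entries carry a ring number is given below. The names `MapEntry`, `outermost_ring_allowed`, and `checked_lookup`, as well as the page numbers, are assumptions made for illustration; actual systems encode protection information in their own formats.

```python
from dataclasses import dataclass

KERNEL_RING = 0
USER_RING = 3


@dataclass(frozen=True)
class MapEntry:
    physical_page: int
    outermost_ring_allowed: int  # highest-numbered (least privileged) ring permitted


def checked_lookup(address_map, virtual_page, current_ring):
    """Translate a virtual page, refusing access from an insufficiently privileged ring."""
    entry = address_map[virtual_page]
    if current_ring > entry.outermost_ring_allowed:
        raise PermissionError(
            f"ring {current_ring} may not access virtual page {virtual_page}"
        )
    return entry.physical_page


# One shared address map: page 0 belongs to the process, page 100 to the OS.
shared_map = {
    0: MapEntry(physical_page=5, outermost_ring_allowed=USER_RING),
    100: MapEntry(physical_page=1, outermost_ring_allowed=KERNEL_RING),
}
```

With this map, a lookup of virtual page 100 succeeds when the current ring is `KERNEL_RING` and raises `PermissionError` when the current ring is `USER_RING`, even though both lookups use the same map and the same virtual address.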
1.5 The Hierarchical Machine Concept
Because several processes may exist at one time, the potential exists that one process may access and modify data or programs stored in the address space of another process. Further, the potential exists that one process will access and modify portions of the address space associated with the operating system. For example, many computer systems include system resources such as compilers and interpreters for converting programs written in high-level language into machine code which may be executed by the processor. It is generally undesirable to allow a user program to access directly the memory locations allocated to the compilers or interpreters.
In addition to it being undesirable for all processes to have access to all memory locations, it is also undesirable to have all processes capable of using certain "privileged" instructions. For example, the HALT instruction causes the processor to halt processing. Obviously, it would be undesirable to allow any user process to unilaterally halt all processing on the computer system.
To avoid the problems discussed above, many computer systems are provided with a hierarchy of "protection rings" (sometimes referred to as protection levels or protection modes). The protection rings shield programs which control system resources from other programs, e.g., user programs, and allow access to the shielded programs only in a controlled manner. For example, one computer system (the VAX-11 family of systems sold by the assignee of the present invention) has four protection rings referred to as the kernel, the executive, the supervisor, and the user (or ring 0, ring 1, ring 2, ring 3). The computer processor may be executing instructions in any of these four rings depending on the particular process making use of it. In the VAX-11 system, input/output functions and transfers to and from memory are performed only if the processor is executing in the kernel (ring 0), and the kernel is the only ring from which privileged instructions can be executed. Less protected system resources, such as programs controlling the file system, run in executive mode (ring 1), while other programs, such as the interpreter for commands typed in by a user, run in supervisor mode (ring 2). Compilers and basic applications programs run in user mode (ring 3).
Strictly speaking, it is the computer processor that operates in one of the many protection rings. For ease of discussion, however, certain processes are referred to herein as residing in a certain protection ring or running in a certain mode. What this means is that the processor will be in a certain mode when it is executing instructions necessary for a particular process. For example, if there is a certain process X, and the processor will be in ring 2 when it is executing instructions necessary to complete process X, then process X may be said to reside in the ring 2 or to be running in that ring.
The above discussion is made in terms of a single "processor." It should be noted, however, that certain computer systems have "processors" that comprise more than one processing unit. The present invention includes such multi-processor computer systems and when reference is made in the description to a "processor" it includes multi-processor systems.
In most systems, the protection rings are utilized in two ways: (1) for controlling the execution of privileged instructions; and (2) for controlling the access to certain memory locations.
If the processor receives a privileged instruction, it must first determine if it is in the proper mode to execute that instruction. If it is, the processor executes the instruction; if not, it "traps" the instruction; i.e., it transfers control to an exception routine or other process for handling the instruction.
In the VAX-11 system, as in many other systems, privileged instructions can only be executed when the processor is running in the most privileged or kernel mode (ring 0). In those systems, if a processor in any other mode receives a privileged instruction it will transfer control to a process in the kernel (ring 0). Thus, a privileged instruction is defined for purposes of this disclosure as an instruction that "traps" to the most privileged mode when executed by any but the most privileged mode.
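The trapping behavior defined above may be sketched as follows. The function name `dispatch` and the tuple it returns are hypothetical conventions adopted for this sketch; HALT is the privileged instruction named earlier in this section, and "ADD" stands in for any unprivileged instruction.

```python
KERNEL_RING = 0
PRIVILEGED_INSTRUCTIONS = {"HALT"}  # HALT is the example named in the text


def dispatch(instruction, current_ring):
    """Return ("execute", ring) or ("trap", ring that receives control)."""
    if instruction in PRIVILEGED_INSTRUCTIONS and current_ring != KERNEL_RING:
        # A privileged instruction issued outside the most privileged mode
        # is not executed; control transfers to a process in ring 0.
        return ("trap", KERNEL_RING)
    return ("execute", current_ring)


user_halt = dispatch("HALT", current_ring=3)
kernel_halt = dispatch("HALT", current_ring=0)
user_add = dispatch("ADD", current_ring=3)  # "ADD" is a hypothetical unprivileged instruction
```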
The determination of which instructions should be privileged instructions depends on the specific computer hardware/software used and the security requirements of the system. Methods for selecting such instructions are generally known in the art and will not be discussed in detail.
In addition to having "privileged" instructions that trap to the most protected mode when they are submitted by processes residing in other than that mode, it may be desirable to have multiple levels of protected instructions that trap to a mode of greater protection if submitted by a less protected mode; e.g., certain instructions that trap to a process in the executive mode (ring 1) when submitted by any process not in the executive or kernel mode. Such protected instructions will operate in a manner similar to the privileged instructions, except that they trap to some mode other than the most protected mode.
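One way to picture such multiple levels of protected instructions is a table giving, for each protected instruction, the least privileged ring that may execute it directly. In the sketch below, the table `REQUIRED_RING`, the function `dispatch_protected`, and the instruction name "SET_FILE_QUOTA" are hypothetical and serve only to illustrate the idea.

```python
# Hypothetical table: instruction -> least privileged ring that may execute it.
REQUIRED_RING = {
    "HALT": 0,            # privileged: traps to the kernel (ring 0)
    "SET_FILE_QUOTA": 1,  # assumed example of an instruction protected at ring 1
}


def dispatch_protected(instruction, current_ring):
    """Trap to the required ring when issued from a less protected ring."""
    required = REQUIRED_RING.get(instruction)
    if required is not None and current_ring > required:
        return ("trap", required)
    return ("execute", current_ring)
```

Under these assumptions, `dispatch_protected("SET_FILE_QUOTA", 3)` traps to ring 1, while the same instruction issued from ring 1 or ring 0 executes directly; privileged instructions are the special case in which the required ring is ring 0.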
In addition to controlling the executability of privileged instructions, protection modes are also used to control memory access. In most systems that use protection modes, each location in memory is accessible only when the processor is executing in one of a predetermined set of operating modes. Further, each memory location may be accessible only in a particular way. For example, a memory location may be read by programs in a particular operating mode but not written, or written by programs operating in any mode more privileged than a particular mode. As a further example, when the processor is running programs in a mode other than the most privileged mode (ring 0) it may not be able to access portions of memory reserved to programs which operate in ring 0.
FIG. 3 illustrates one system having four available protection rings. In practice a multiple protection ring system may have greater or fewer rings. As noted in FIG. 3, the rings are concentric with the most protected mode 30, kernel mode (ring 0), being in the center and the executive mode 32 (ring 1), supervisor mode 34 (ring 2), and user mode 36 (ring 3) proceeding from the center. As noted above, only when the processor is executing in the most privileged mode (ring 0) 30 can privileged instructions be executed; all other attempts result in a trap to processes in the kernel mode (ring 0) 30.
Further, in most instances memory access is controlled by the protection mode. For example, in one embodiment, if a certain memory location is accessible to processes in the supervisor mode (ring 2) 34, it is also accessible to processes in the executive mode (ring 1) 32 and the kernel mode (ring 0) 30 (since those rings are within the supervisor ring)--but not in the user mode (ring 3) 36. In this embodiment, memory locations that are accessible when the processor is in the user mode (ring 3) are generally accessible to processes in all modes. Alternate embodiments are envisioned wherein memory access is specified for each protection mode individually, e.g., a memory location accessible to the supervisor mode may be inaccessible to the executive mode. The present invention is not dependent on the protection model within a specified address space as long as distinct address spaces are inaccessible to each other.
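The two access models described above may be contrasted in a short sketch. The function names and the representation of protection information (a single ring number in the first model, a set of ring numbers in the second) are assumptions made for illustration.

```python
def accessible_hierarchical(outermost_ring_allowed, current_ring):
    """Hierarchical model: access granted to a ring extends to all inner rings."""
    return current_ring <= outermost_ring_allowed


def accessible_per_mode(allowed_rings, current_ring):
    """Alternate model: access is specified for each protection mode individually."""
    return current_ring in allowed_rings


SUPERVISOR = 2
# A location accessible to the supervisor mode under the hierarchical model.
hierarchical = [accessible_hierarchical(SUPERVISOR, ring) for ring in range(4)]
# A location accessible to the supervisor mode only, under the alternate model.
per_mode = [accessible_per_mode({SUPERVISOR}, ring) for ring in range(4)]
```

Here `hierarchical` grants access from rings 0, 1, and 2 but not ring 3, whereas `per_mode` grants access from ring 2 alone, so the executive mode (ring 1) is excluded.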
Specific operating systems have been developed to take advantage of the protection features offered by multiple protection rings. For example, many operating systems have been developed that run solely in the most protected mode (ring 0). Thus, any process that attempts to execute a privileged instruction will be trapped to the operating system's control and only the operating system can access memory locations addressable by the most protected mode. Other computer processes, other than operating systems, have been designed to take full advantage of the security offered by multiple protection rings. Such processes often have sub-processes that operate in different protection rings in order to optimize the efficiency and security of the overall process. For such specially designed processes it is essential that all of the protection rings available on the computer be available to the process.
1.6 Virtual Machines and the Virtual Machine Monitor
Many computers today operate on a "timesharing" basis where a number of users are allowed to run a number of different computer applications on a single system. This is generally accomplished by having the operating system: (1) keep track of the data and instructions from each user; (2) schedule the running of the application programs on a rotating basis; and (3) transmit the processed data to the users when the processing is complete.
One problem with most timesharing systems is that they generally use a single operating system on which all of the application programs run. Although certain applications run better on certain operating systems than others, the use of a single operating system does not permit the selection of others. Another problem with such systems is that if a given software system (e.g. a database manager) is to be revised or replaced, the revision and testing cannot generally take place concurrently with the use of the original system, resulting in decreased availability.
To enable users to select among various operating systems, and to permit increased availability during times of revision of various software systems, it would be convenient to have a second computer system. Thus a user could select an alternate operating system, or a software system could be revised and tested, without impacting the original computer system. However, purchasing and maintaining this second computer system may add significant expense and overhead for the owner of the system.
One way to provide the effect of a second computer system, without actually purchasing and maintaining one, is to use a "virtual machine" (VM) system. In such a system, a "virtual machine monitor" (VMM) multiplexes the resources of a single computer system so as to provide the effect of multiple independent computer systems. Each virtual machine provides its own computing environment, similar to the environment provided by a traditional operating system in a non-VM system.
One common method of achieving this is for the VMM to provide VMs which appear to be duplicates of the machine the VMM runs on. One set of requirements for providing such virtual machines may be found in Popek and Goldberg, Formal Requirements for Virtualizable Third Generation Architectures, COMMUNICATIONS OF THE ACM (July 1974). Popek and Goldberg define a virtual machine as an isolated, efficient duplicate of the underlying real machine. One advantage of providing this type of virtual machine is that a traditional operating system which runs on the real machine can easily be made to run on the virtual machine. This is the type of virtual machine considered by the present invention.
The VMM may be considered to be the real machine's operating system; it runs on the bare machine, and provides multiple independent interfaces that essentially duplicate the interface provided by the bare machine. Unlike traditional operating systems, this interface is intended for direct use only by other programs (e.g. operating systems), not by users. These other programs may provide an interface for the users.
Because the VMs appear as complete systems, they are often used to support different operating systems. Thus, while in a traditional computing system there would be only one machine with one operating system, when a VMM is used there can be multiple virtual machines running distinct operating systems. As with an OS running on a bare machine, each OS running on a VM may support several users and several user programs.
FIGS. 4A-4B illustrate one such VMM system 45. As FIG. 4A illustrates, the basic structure consists of the bare machine 40, including a central processor 42, a memory 43, and numerous system resources 44 on which a virtual machine monitor 46 is running. Multiple users have access to the computer system via interface devices such as computer terminals 48.
FIG. 4B illustrates the situation as it appears to the users when the VMM 46 is in operation. As noted, there are three virtual machines, 40a, 40b, 40c, each associated with a central processor 42a, 42b, 42c, and a memory 43a, 43b, 43c. Each VM is also associated with several computer resources 44a, 44b, 44c. A separate operating system (OSa, OSb, OSc) is running on each of the VMs. Thus, each user operating on his terminal 48a, 48b, 48c, will interface with his appropriate OS on a VM as if he were interfacing with that OS running on a bare machine. Additional background discussion concerning virtual machines, VMMs, and operating systems may be found in S. Madnick & J. Donovan, OPERATING SYSTEMS (McGraw-Hill 1974). Virtual machines are also described by Siewiorek et al. in COMPUTER STRUCTURES: PRINCIPLES AND EXAMPLES at 227-28 (McGraw-Hill 1982).
1.7 Virtual Machines and Protection Modes-"Ring Compression"
The use of virtual machines and VMMs created significant problems that did not exist on bare machine, single OS systems. For example, on a bare machine, single OS system the multiple protection layers discussed above can be used to control and regulate access to memory locations and the execution of privileged instructions. Such is not necessarily the case when a VMM is used.
When VMs are used there must be some mechanism for preventing the virtual machines from affecting (or being affected by) the execution of any other VM or the VMM itself. In the prior art the most common mechanism used to achieve such objectives was known as "ring-compression."
In ring-compression, the VMM is generally designed such that it operates only in the most privileged mode. In the VAX system, this means the VMM operates in the kernel mode (ring 0). All other processes share and utilize the other protection modes.
Because the most privileged mode is reserved to the VMM, the VMs are prevented from running in the most privileged mode (ring 0). This creates several problems.
First, the fact that the VMs cannot run in the most protected mode is in conflict with the previously stated objective of the VM: to provide an interface that is as close as possible to the bare machine. Second, since the VM is precluded from running in the most protected mode, the VMM must include some mechanism for making it appear to the user (or the OS running on the VM) that the VM is running in that mode. This requirement (that the VMM be able to make the VM appear to be running in the most protected mode) often causes the VMM to include extensive and complex processes.
A further difficulty created by ring-compression is that the VMM cannot readily take full advantage of the multiple protection rings. In a process where the VM may run, the VM has the use of all but the most privileged ring; in particular, the VM may access any memory protected so that it is accessible to any ring other than the most privileged ring. If the VMM were to use one of these "outer rings" for its own purposes, to the extent that the VMM did so it would not be protected from the VM. Therefore the VMM may not use these rings in a process which is also used by a VM. (The VMM may choose to create a process which is not used by any VM; within the context of that process, the VMM is free to use the outer rings as it sees fit. This use of processes is known in the art and is not discussed further.)
FIG. 5 illustrates one way in which ring compression may be accomplished in the VAX system where there are four protection rings. As may be seen, the virtual user 52v (ring 3v) and supervisor 54v (ring 2v) modes are mapped to the real user and supervisor modes 52r, 54r (rings 3r, 2r). Thus, processes running on a VM (or the VM itself) that execute in the user or supervisor mode will actually be operating in those real modes.
The VM executive mode 56v (ring 1v) is mapped to the real executive mode 56r (ring 1r). As with the user and supervisor modes, the real processor will be in the same protection mode as the virtual processor when executing instructions in this mode.
Because the real kernel mode 58r (ring 0r) is assigned exclusively to the VMM, the virtual kernel mode 58v (ring 0v) is mapped to the real executive mode 56r (ring 1r). Thus, instructions that the software in the VM believes are executing on a virtual processor in the kernel mode are in fact being executed by the real processor in the executive mode.
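The mapping of FIG. 5 as described above may be summarized in a small table. The dictionary and function names below are hypothetical; the ring assignments themselves are those given in the preceding paragraphs.

```python
# Virtual ring -> real ring under the ring-compression scheme described above.
REAL_RING_FOR_VIRTUAL_RING = {
    3: 3,  # virtual user       -> real user
    2: 2,  # virtual supervisor -> real supervisor
    1: 1,  # virtual executive  -> real executive
    0: 1,  # virtual kernel     -> real executive (real ring 0 is the VMM's)
}


def real_ring(virtual_ring):
    """Return the real protection ring in which a virtual ring executes."""
    return REAL_RING_FOR_VIRTUAL_RING[virtual_ring]


# Real rings actually used by the VM; real ring 0 never appears.
rings_used_by_vm = sorted(set(REAL_RING_FOR_VIRTUAL_RING.values()))
```

The four virtual rings occupy only three real rings, with the virtual kernel and virtual executive modes both compressed into the real executive mode.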
The operation of a system using ring-compression is as follows. As noted above, there are certain instructions, "privileged instructions," that trap to a process in the most protected mode. Another group of instructions, "sensitive instructions," are those instructions which read or change the privileged machine state. Thus, if a VM is allowed to execute a sensitive instruction, it can change the machine state in a manner that would affect all other VMs and the VMM.
The general concept of ring-compression is based on the notion that all sensitive instructions are also privileged instructions. If all sensitive instructions are privileged and the VMM through ring-compression prevents the VMs from executing in the most privileged mode, then all sensitive instructions executed by the VM will trap to the most privileged mode (i.e., where the VMM is) and the VMM can emulate the effect of the sensitive instruction. In some computer systems, not all of the sensitive instructions are privileged. In that case, the VMM must arrange to trap or otherwise emulate all the sensitive instructions executed by the VM so that their effects are confined to the executing VM. A further discussion of ring-compression and sensitive unprivileged instructions may be found in U.S. Pat. No. 4,787,031 to Karger et al., entitled "COMPUTER WITH VIRTUAL MACHINE MODE AND MULTIPLE PROTECTION RINGS" and J. Hall & P. Robinson, Virtualizing the VAX Architecture in PROCEEDINGS OF THE 18TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ACM Press 1991).
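The trap-and-emulate behavior described above may be sketched, under simplifying assumptions, as follows. The classes `VirtualMachine` and `VirtualMachineMonitor`, the method `on_privileged_trap`, and the helper `vm_executes` are hypothetical; the sketch assumes that HALT is both sensitive and privileged and models only a single item of virtual privileged state per VM.

```python
class VirtualMachine:
    def __init__(self, name):
        self.name = name
        self.halted = False  # this VM's virtual copy of privileged state


class VirtualMachineMonitor:
    def __init__(self):
        self.real_machine_halted = False

    def on_privileged_trap(self, vm, instruction):
        """Emulate a trapped sensitive instruction on the VM's virtual state only."""
        if instruction == "HALT":
            vm.halted = True  # the real processor keeps running other VMs
            return "emulated"
        raise NotImplementedError(instruction)


def vm_executes(vmm, vm, instruction, privileged=("HALT",)):
    # Under ring compression the VM never runs in real ring 0, so every
    # privileged instruction it issues traps to the VMM.
    if instruction in privileged:
        return vmm.on_privileged_trap(vm, instruction)
    return "executed directly"


vmm = VirtualMachineMonitor()
vm_a, vm_b = VirtualMachine("a"), VirtualMachine("b")
result = vm_executes(vmm, vm_a, "HALT")
```

After the call, only `vm_a` is marked halted; `vm_b` and the real machine are unaffected, which illustrates how the effect of the sensitive instruction is confined to the executing VM.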
While ring-compression schemes have allowed a VMM to support several VMs, they have several drawbacks:
First, in a process where a VM may run, the VMM is constrained to the most privileged ring and cannot take full advantage of the multiple protection rings available on many systems;
Second, the VMs are constrained to less than the full number of available real modes. This poses difficulties because the software that generally runs on a VM is designed to utilize all of the available real protection modes;
Third, because the VMM must essentially make the VMs believe they are running in the real ring 0, it is often quite complex and difficult to design and modify;
Fourth, because the VMs cannot execute in real ring 0, all memory locations that the VM ring 0 processes must address must be made available to processes operating in the real ring 1 (in which the VM ring 0 is emulated). This results in a reduction of security because processes operating in the virtual ring 1 (real ring 1) can now access the memory locations of processes operating in virtual ring 0 (also real ring 1).
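The fourth drawback may be illustrated with a brief sketch. It assumes the hierarchical access model discussed in section 1.5 (a location accessible to a ring is accessible to all inner rings) and the ring assignments of FIG. 5; the function names are hypothetical.

```python
# Virtual ring -> real ring under ring compression (per FIG. 5).
REAL_RING_FOR_VIRTUAL_RING = {3: 3, 2: 2, 1: 1, 0: 1}


def real_protection_for(virtual_outermost_ring):
    """Real protection a page receives when the VM protects it at a virtual ring."""
    return REAL_RING_FOR_VIRTUAL_RING[virtual_outermost_ring]


def hardware_allows(page_real_ring, virtual_ring_of_accessor):
    """Hierarchical check performed by the real hardware, in terms of real rings."""
    return REAL_RING_FOR_VIRTUAL_RING[virtual_ring_of_accessor] <= page_real_ring


# The VM intends this page to be accessible from virtual ring 0 only.
kernel_only_page = real_protection_for(0)

virtual_executive_can_access = hardware_allows(kernel_only_page, 1)
virtual_supervisor_can_access = hardware_allows(kernel_only_page, 2)
```

Under these assumptions, `virtual_executive_can_access` is true even though the VM intended the page for its kernel mode alone, while the virtual supervisor mode remains excluded.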