Computers include general-purpose central processing units (CPUs) that are designed to execute a specific set of system instructions. A group of processors that have similar architecture or design specifications may be considered to be members of the same processor family. Examples of current processor families include the Motorola 680X0 processor family, manufactured by Motorola, Inc. of Phoenix, Ariz.; the Intel 80X86 processor family, manufactured by Intel Corporation of Santa Clara, Calif.; and the PowerPC processor family, which is manufactured by Motorola, Inc. and used in computers manufactured by Apple Computer, Inc. of Cupertino, Calif. Although a group of processors may be in the same family because of their similar architecture and design considerations, processors may vary widely within a family according to their clock speed and other performance parameters.
Each family of microprocessors executes instructions that are unique to the processor family. The collective set of instructions that a processor or family of processors can execute is known as the processor's instruction set. As an example, the instruction set used by the Intel 80X86 processor family is incompatible with the instruction set used by the PowerPC processor family. The Intel 80X86 instruction set is based on the Complex Instruction Set Computer (CISC) format. The Motorola PowerPC instruction set is based on the Reduced Instruction Set Computer (RISC) format. CISC processors use a large number of instructions, some of which can perform rather complicated functions but generally require many clock cycles to execute. RISC processors use a smaller number of available instructions to perform a simpler set of functions that are executed at a much higher rate.
The uniqueness of the processor family among computer systems also typically results in incompatibility among the other elements of hardware architecture of the computer systems. A computer system manufactured with a processor from the Intel 80X86 processor family will have a hardware architecture that is different from the hardware architecture of a computer system manufactured with a processor from the PowerPC processor family. Because of the uniqueness of the processor instruction set and a computer system's hardware architecture, application software programs are typically written to run on a particular computer system running a particular operating system.
Virtual Machines
Computer manufacturers want to maximize their market share by having more rather than fewer applications run on the microprocessor family associated with the computer manufacturers' product line. To expand the number of operating systems and application programs that can run on a computer system, a field of technology has developed in which a given computer having one type of CPU, called a host, will include an emulator program that allows the host computer to emulate the instructions of an unrelated type of CPU, called a guest. The host computer executes an application that causes one or more host instructions to be called in response to a given guest instruction. The host computer can thus run both software designed for its own hardware architecture and software written for computers having an unrelated hardware architecture. As a more specific example, a computer system manufactured by Apple Computer may run operating systems and programs written for PC-based computer systems. It may also be possible to use an emulator program to run multiple incompatible operating systems concurrently on a single CPU. In this arrangement, although each operating system is incompatible with the other, an emulator program can host one of the two operating systems, allowing the otherwise incompatible operating systems to run concurrently on the same computer system.
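By way of illustration only, the relationship between guest instructions and host instructions may be sketched as a small interpreter. The guest instruction names (LOAD, ADD), the register names, and the function emulate below are hypothetical and do not correspond to any actual processor family or emulator product; the sketch merely shows each guest instruction causing one or more host-level operations to be called.

```python
# Toy illustration: a host program interpreting a hypothetical guest
# instruction set. Opcodes and register names are invented for this sketch.

def host_load(regs, dst, value):
    """Host routine called in response to a guest LOAD instruction."""
    regs[dst] = value


def host_add(regs, dst, src):
    """Host routine called in response to a guest ADD instruction."""
    regs[dst] = regs[dst] + regs[src]


# Maps each guest opcode to the host routine that emulates it.
HOST_HANDLERS = {
    "LOAD": host_load,
    "ADD": host_add,
}


def emulate(program):
    """Run a list of guest instructions and return the guest registers."""
    regs = {"r0": 0, "r1": 0}
    for opcode, *operands in program:
        HOST_HANDLERS[opcode](regs, *operands)
    return regs


guest_program = [
    ("LOAD", "r0", 2),
    ("LOAD", "r1", 3),
    ("ADD", "r0", "r1"),
]
result = emulate(guest_program)
```

In this sketch the guest program never executes on the host processor directly; every guest instruction is looked up and carried out by a host routine, which is the basic arrangement described above.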
When a guest computer system is emulated on a host computer system, the guest computer system is said to be a “virtual machine” as the guest computer system only exists in the host computer system as a pure software representation of the operation of one specific hardware architecture. The terms emulator, virtual machine, and processor emulation are sometimes used interchangeably to denote the ability to mimic or emulate the hardware architecture of an entire computer system. As an example, the Virtual PC software created by Connectix Corporation of San Mateo, Calif. emulates an entire computer that includes an Intel 80X86 Pentium processor and various motherboard components and cards. The operation of these components is emulated in the virtual machine that is being run on the host machine. An emulator program executing on the operating system software and hardware architecture of the host computer, such as a computer system having a PowerPC processor, mimics the operation of the entire guest computer system.
The emulator program acts as the interchange between the hardware architecture of the host machine and the instructions transmitted by the software running within the emulated environment. This emulator program may be a host operating system (HOS), which is an operating system running directly on the physical computer hardware. Alternatively, the emulator program might be a virtual machine monitor (VMM), which is a software layer that runs directly above the hardware and which virtualizes all the resources of the machine by exposing interfaces that are the same as the hardware the VMM is virtualizing (which enables the VMM to go unnoticed by operating system layers running above it). A host operating system and a VMM may run side-by-side on the same physical hardware.
Disaster Recovery Systems
Because of the risk of potential “disaster” events (for example, power failures, natural disasters, and hardware failures), enterprise IT companies and IT departments within larger enterprises (hereinafter simply “enterprises”) are rightfully concerned about business continuity in the face of such a disaster. Such enterprises want to minimize the computer system downtime resulting from these disasters in order to decrease their cost; for certain businesses, computer downtime can cost millions of dollars each minute. Therefore, when disasters and downtime do occur, it is important for the enterprise to ensure that computer systems will be back up and running as quickly as possible with as little disruption to users or customers as possible. Thus, there are strong financial incentives for corporations to upgrade their disaster recovery systems in order to minimize computer system downtime.
It is understood in enterprise IT that when computers fail due to any type of disaster, the data (e.g., orders, customer information, document files) that is typically stored in persistent memory (e.g., hard disks) is the most critical information to restore, and several methods of data backup—including data mirroring, tape backup solutions, redundant disk arrays, and the like—are well known in the art. However, in order to more fully recover from a computer failure, it is also critically important to store and recover the “state” of the computer system as that state exists at a time immediately prior to a failure event. Storing and recovering the state of the computer allows the enterprise to provide a more complete and minimally disruptive restoration. In this regard, and in order for a state recovery system to effectively restore the state of the computer, it is essential that information regarding the applications running on the computer be stored as well as the state of its processor and devices.
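By way of illustration only, the kind of information a state recovery system would need to capture may be sketched as follows. The class MachineState, its field names, and the functions snapshot and restore are hypothetical and are not drawn from any particular product; the sketch assumes the state can be represented as ordinary in-memory data and copied as a whole.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class MachineState:
    """Hypothetical container for the state named in the text above."""
    processor_registers: dict = field(default_factory=dict)
    device_state: dict = field(default_factory=dict)
    running_applications: list = field(default_factory=list)


def snapshot(state):
    """Return an independent copy of the state as it exists right now."""
    return copy.deepcopy(state)


def restore(saved):
    """Return a fresh working copy built from a saved snapshot."""
    return copy.deepcopy(saved)


# Capture the state, let the live machine change, then recover.
live = MachineState(
    processor_registers={"pc": 100},
    device_state={"nic": "link-up"},
    running_applications=["order-entry"],
)
saved = snapshot(live)
live.processor_registers["pc"] = 999      # state changes after the snapshot
live.running_applications.append("report")
recovered = restore(saved)
```

Because the snapshot is a deep copy, later changes to the live state do not alter it, so the recovered state matches the machine as it existed when the snapshot was taken.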
However, from a cost perspective—that is, the relative costs of utilizing a state recovery system versus the incurred costs of downtime stemming from a disaster—an enterprise will naturally choose the lower-cost option and, as part of this analysis, the cost of a state recovery system must include the cost for upgrades. For example, if the organization needs to upgrade its state recovery system, it is preferable to do so in a low-cost manner, such as by adding a new module to an existing state recovery system rather than replacing the entire existing system. If the cost of system upgrades is too high, then the cost of a state recovery system may also be too high and force an enterprise to instead elect to bear the lesser burden of costs for downtime from disasters.
Unfortunately, current methods utilized to back up and restore the state of computers are cost-prohibitive, while inexpensive solutions that provide minimally intrusive backup and restoration of the state of a computer currently do not exist. For example, one specific solution developed by Marathon Technologies is to perform lock-step state comparison for each I/O request between two computers (one in production and one as a backup copy) across a data link; however, this type of solution requires expensive fiber-optic connectivity between the computers and thus is cost-prohibitive for all but the largest corporations with the deepest pockets. Another solution is to provide a fault-tolerant/fault-resilient computer system that includes at least two computing elements (CEs) connected to at least one controller where one secondary CE functions as a backup to another primary CE and replaces the primary CE without disruption to users if the primary CE fails; however, this solution also requires an expensive fiber-optic communications link and thus is very expensive to implement. In contrast, existing methods of storing and recovering the state of a computer using communication links having lower bandwidth than fiber optics are undesirable because they are disruptive to the user. These methods generally require that the computer be stopped for as long as it takes to transfer the data from the processor and the devices across the communications link, a duration that ranges from a few seconds to several minutes. Such pauses can result in loss of productivity as well as disruptive events such as dropped network connections and a lack of service continuity for the users of the computer.
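By way of illustration only, the pause imposed by stopping a computer while its state is copied across a link may be estimated as the state size divided by the link bandwidth. The function pause_seconds and the example figures below (512 MiB of state, a 100 megabit-per-second link) are assumptions chosen for the sketch and are not measurements of any actual system; protocol overhead and latency are ignored.

```python
def pause_seconds(state_bytes, link_bits_per_second):
    """Estimate how long a machine is stopped while its state is copied.

    Assumes the whole state is sent in one pass while the machine is
    halted, and ignores protocol overhead and link latency.
    """
    if link_bits_per_second <= 0:
        raise ValueError("link bandwidth must be positive")
    return (state_bytes * 8) / link_bits_per_second


# Illustrative figures only: 512 MiB of state over a 100 Mbit/s link.
state_size = 512 * 1024 * 1024
link_speed = 100_000_000
estimated_pause = pause_seconds(state_size, link_speed)  # about 43 seconds
```

Under these assumed figures the machine would be stopped for roughly 43 seconds, which falls within the range of a few seconds to several minutes noted above.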
Therefore, there is a need in the art for a low-cost means for storing and recovering the state of a computer system that minimizes disruptions to end-users of said computer system.