The present invention is directed generally to computer systems, and more particularly to recovering from the failure of a computer system.
Computer systems are protected against failure by backing up the computer data, whereby if a system crashes, the data may be restored. However, if a computer system fails in a manner in which it cannot reboot, the data cannot be simply restored. For example, hardware may fail (e.g., the hard disk controller burns out), or software may fail (e.g., a virus corrupts some key files and/or data), in a manner that prevents a reboot. Moreover, in the event of a system failure, computer users not only want their data restored, but also want their system restored to the way it was prior to the failure.
At present, backing up system information so as to enable a system to be restored to a bootable state involves the use of many disjoint and separate programs and operations. For example, a system administrator may use one or more utility programs to determine the state of disk configuration and/or formats so that the disk information may be saved. Additional programs and techniques may be used to record a list of operating system files, data, and other software installed on the system. The administrator may also record the types of various devices installed in a system and the settings thereof. Backing up a system's state is thus a formidable task.
Similarly, the process of restoring a system involves the use of this recorded information, along with an operating system setup program, thus making restoration a complicated process. Moreover, if the original system is replaced with non-identical hardware (e.g., a larger disk, a new CD-ROM, hard disk controller, and/or video card), then additional complications may arise because much of the saved state information may no longer apply to the new system configuration. For example, if a system fails and the data and files are restored to a non-identical system, many hours may have to be spent adjusting and configuring the system to work, using a variety of different programs and utilities. In sum, present system recovery (backup and restore) involves proprietary and custom-crafted solutions that are neither common nor extensible. Instead, providers of backup and restore programs each redefine an environment, process, and syntax to enable the recovery of the system.
As a result, whenever a failure makes a system non-bootable, the process to reconstruct the system's previous state is error-prone and lengthy. This can cause serious problems, particularly with computer systems used in critical roles (such as a file server) wherein the time required to get the computer system operational after a failure is very important.
Briefly, the present invention provides a method and system that enables the backup and restoration of a failed system in an automatic and efficient manner. A backup component copies and stores the state that intrinsically defines the configuration of the computer system for potential future recovery, by obtaining and preserving the underlying description of the system, separate from the actual operating system and data files. For example, the backed-up state information includes the disk structure and layout, such as the number of disk partitions, how the partitions are arranged on the disk, how the disk partitions are formatted, and the location where the operating system (e.g., Microsoft® Corporation's Windows® 2000 operating system) is installed on the disk. This information is recorded on a medium that will be available to an operational, but not yet restored system, such as one or more floppy disks or a writeable CD-ROM. Also backed up is the information specifying what should be executed during the restore phases, including the programs to copy and execute, any error handling, and any special driver files to load, such as a driver needed to operate a backup device (e.g., a tape drive). Files on the computer system (e.g., the operating system files) and their associated properties are typically recorded to the backup device.
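By way of illustration only, the kind of state information described above might be modeled and written to a removable medium as sketched below. The sketch is not the disclosed format: the class names (Partition, Disk, SavedState), the field names, the use of JSON, and the functions dump_state and load_state are all assumptions introduced for this example.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Partition:
    number: int
    size_mb: int
    file_system: str                      # how the partition is formatted
    holds_operating_system: bool = False  # where the operating system lives


@dataclass
class Disk:
    number: int
    size_mb: int
    partitions: List[Partition] = field(default_factory=list)


@dataclass
class SavedState:
    """Hypothetical record of the system description, kept apart from the files."""
    disks: List[Disk]
    restore_commands: List[List[str]]  # programs to run during restore
    error_command: List[str]           # what to run if a restore step fails
    extra_drivers: List[str] = field(default_factory=list)  # e.g. a tape driver


def dump_state(state: SavedState) -> str:
    """Serialize the saved state to text suitable for a small removable medium."""
    return json.dumps(asdict(state), indent=2)


def load_state(text: str) -> SavedState:
    """Rebuild the saved state from text produced by dump_state."""
    raw = json.loads(text)
    disks = [
        Disk(d["number"], d["size_mb"], [Partition(**p) for p in d["partitions"]])
        for d in raw["disks"]
    ]
    return SavedState(
        disks, raw["restore_commands"], raw["error_command"], raw["extra_drivers"]
    )
```

In this sketch the description of the machine (disks, partitions, restore instructions) is deliberately small and separate from the bulk file data, which would be written to the backup device instead.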
A restore component is also provided, and may operate in two distinct phases. In a first phase, Automated System Recovery (ASR) is started, typically via a Windows® operating system CD into which ASR has been integrated. The CD loads the necessary drivers and information needed to view and access critical parts of the computer such as the hard disks, CD-ROM, and/or a floppy disk drive. When ASR is selected and run, a prompt for the floppy disk or other medium containing the information saved during the backup phase is provided. ASR scans the disk partitions and volumes, and uses the backed up configuration information to compare with the current state of the disk partitions and volumes on the system. The disk and volume state are restored according to the saved information. If the disks, volumes and/or hardware existing on the system are not identical to those originally present when the backup was made, the volume and disk information is adjusted and restored to the best possible extent and/or the new hardware merged or preserved.
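The comparison of the saved disk configuration against the disks actually present can be pictured with the following minimal sketch. The matching rule used here (largest saved disk first, each assigned to the smallest remaining disk that is at least as large) is an assumption chosen for illustration; the summary above does not specify how the adjustment is made. The function name plan_disk_restore and the action labels "exact", "adjust", and "skip" are likewise hypothetical.

```python
from typing import Dict, Optional, Tuple


def plan_disk_restore(
    saved: Dict[str, int], current: Dict[str, int]
) -> Dict[str, Tuple[Optional[str], str]]:
    """Map each saved disk (name -> size in MB) to a disk now on the system.

    Returns saved name -> (chosen current disk or None, action), where the
    action is "exact" for an identical size, "adjust" when the layout must be
    fitted onto a larger disk, and "skip" when no disk is large enough.
    """
    plan: Dict[str, Tuple[Optional[str], str]] = {}
    # Unused current disks, smallest first, so the tightest fit is found first.
    free = sorted(current.items(), key=lambda item: item[1])
    # Place the largest saved disks first so they are not starved of space.
    for name, size in sorted(saved.items(), key=lambda item: item[1], reverse=True):
        match = next((disk for disk in free if disk[1] >= size), None)
        if match is None:
            plan[name] = (None, "skip")
            continue
        free.remove(match)
        plan[name] = (match[0], "exact" if match[1] == size else "adjust")
    return plan
```

For example, a saved 4096 MB disk restored onto a machine whose only sufficiently large disk is 8192 MB would be marked "adjust" in this sketch, reflecting the non-identical hardware case discussed above.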
Once the underlying system state is restored, an environment is created so that the operating system data and other files may be restored. To do this, a restore environment is created by copying a set of files required to run the programs that will restore the remainder of the data. Once these files are copied, the system is restarted.
A second restore phase follows the first restore phase to complete the restoration of the computer system. In this phase, ASR configures the environment for launching a restore program (or programs), by detecting and installing drivers and support for devices installed on the system via the operating system CD. This ensures that any necessary devices (such as a tape drive) are available for the restore program. The restore program or programs are then run according to the instructions that were saved therewith during the backup phase, which usually results in restoring the remainder of the data and other information saved during the backup phase. The system is restarted, and the restoration and recovery are complete. In the event of an error condition, the error-handling instructions specified during the backup phase are executed.
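The second-phase behavior of running the saved restore programs in order, and falling back to the saved error handling when one fails, may be sketched as follows. This is an illustrative assumption rather than the disclosed implementation: the function run_restore_commands, its parameters, and the convention that a nonzero exit code signals an error are all hypothetical.

```python
import subprocess
from typing import Callable, Sequence


def _run_process(command: Sequence[str]) -> int:
    """Run one program and return its exit code (0 is treated as success)."""
    return subprocess.run(list(command)).returncode


def run_restore_commands(
    commands: Sequence[Sequence[str]],
    error_command: Sequence[str],
    run: Callable[[Sequence[str]], int] = _run_process,
) -> bool:
    """Run each saved restore program in order.

    On the first failure, run the saved error-handling command and stop.
    Returns True only if every restore program succeeded.
    """
    for command in commands:
        if run(command) != 0:
            run(error_command)
            return False
    return True
```

The run parameter exists only so that the sequencing logic can be exercised without launching real programs; by default each command is started as a separate process.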
Other advantages will become apparent from the following detailed description when taken in conjunction with the drawings, in which: