1. Technical Field
The present invention relates to system services technology. More particularly, the present invention relates to system fault recovery. Still more particularly, the present invention relates to a system and apparatus for recovering from a catastrophic system failure.
2. Description of Related Art
The UNIX operating system, or "UNIX," "A weak pun on Multics," is an interactive time-sharing operating system invented in 1969 by Ken Thompson after Bell Labs left the Multics project, apparently to play games on his scavenged PDP-7 computer (a minicomputer sold by Digital Equipment Corp. (DEC), (Compaq Computer Corp., 20555 SH 249, Houston, Tex. 77070)). Thompson developed a new programming language 'B', and Dennis Ritchie enhanced 'B' to 'C' and helped develop 'UNIX'.
The UNIX operating system is a multi-user operating system supporting serial or network connected terminals for more than one user. It supports multi-tasking and a hierarchical directory structure for the organization and maintenance of files. UNIX is portable, requiring only a small part of the kernel (less than 10%) to be written in assembler, and supports a wide range of support tools, including development tools, debuggers, and compilers.
The UNIX operating system consists of the kernel, shell, and utilities. The kernel schedules tasks, manages data/file access and storage, enforces security mechanisms, and performs all hardware access. The shell presents each user with a prompt, interprets commands typed by a user, executes user commands, and supports a custom environment for each user. Finally, the utilities provide file management (rm, cat, ls, rmdir, mkdir), user management (passwd, chmod, chgrp), process management (kill, ps), and printing (lp, troff, pr).
A multi-user operating system allows more than one user to share the same computer system at the same time. It does this by time-slicing the computer processor at regular intervals between the various people using the system. Each user gets a set percentage of some amount of time for instruction execution during the time each user has the processor. After a user's allotted time has expired, the operating system intervenes, saving the program's state (program code and data), and then starts running the next user's program (for the user's set percentage of time). This process continues until, eventually, the first user has the processor again.
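The time-slicing described above can be illustrated with a small simulation. This is a minimal sketch, not kernel code: the function name `round_robin`, the user names, and the fixed `quantum` are assumptions introduced only for illustration, and real schedulers also weigh priorities.

```python
from collections import deque


def round_robin(jobs, quantum):
    """Simulate time-slicing. `jobs` maps a user name to the time units needed."""
    ready = deque(jobs.items())   # queue of (user, remaining time)
    timeline = []                 # (user, time used) in the order slices were granted
    while ready:
        user, remaining = ready.popleft()
        used = min(quantum, remaining)
        timeline.append((user, used))
        remaining -= used
        if remaining > 0:
            # The slice expired: the program's state is saved and the
            # program returns to the back of the queue.
            ready.append((user, remaining))
    return timeline


if __name__ == "__main__":
    for user, used in round_robin({"alice": 5, "bob": 2, "carol": 4}, quantum=2):
        print(user, used)
```

Each user receives the processor in turn, and a program that still has work left simply waits for its next slice.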
It takes time to save/restore the program's state and switch from one program to another (called dispatching). This action is performed by the kernel and must execute quickly, because it is important to spend the majority of time running user programs, not switching between them. The amount of time that is spent in the system state (i.e., running the kernel and performing tasks like switching between user programs) is called the system overhead and should typically be less than 10%.
Switching between user programs in main memory is done by part of the kernel. Main system memory is divided into portions for the operating system and user programs. Kernel space is kept separate from user programs. Where there is insufficient main memory to run a program, some other program residing in main memory must be written out to a disk unit to create some free memory space. A decision is made about which program is the best candidate to swap out to disk. This process is called swapping. When the system becomes overloaded (i.e., where there are more people than the system can handle), the operating system spends most of its time shuttling programs between main memory and the disk unit, and response time degrades.
In UNIX operating systems, each user is presented with a shell. This is a program that displays the user prompt, handles user input, and displays output on the terminal. The shell program provides a mechanism for customizing each user's setup requirements and storing this information for re-use (in a file called .profile).
When the UNIX operating system starts up, it also starts a system process (getty) which monitors the state of each terminal input line. When getty detects that a user has turned on a terminal, it presents the logon prompt; and once the password is validated, the UNIX system associates the shell program (sh) with that terminal. Each user interacts with sh, which interprets each command typed. Internal commands are handled within the shell (set, unset); external commands are invoked as programs (ls, grep, sort, ps).
Multi-tasking operating systems permit more than one program to run at once. This is done in the same way as a multi-user system, by rapidly switching the processor between the various programs. OS/2, available from IBM Corporation, One New Orchard Road, Armonk, N.Y. 10504; and Windows 95, available from Microsoft Corporation, One Microsoft Way, Redmond, Wash. 98052, are examples of multi-tasking single-user operating systems. UNIX is an example of a multi-tasking multi-user operating system. A multi-user system is also a multi-tasking system. This means that a user can run more than one program at once, using key selections to switch between them. Multi-tasking systems support foreground and background tasks. A foreground task is one with which the user interacts directly using the keyboard and screen. A background task is one that runs in the background (i.e., it does not have access to the screen or keyboard). Background tasks include operations like printing, which can be spooled for later execution.
The role of the operating system is to keep track of all the programs, allocating resources like disks, memory, and printer queues as required. To do this, it must ensure that one program does not get more than its fair share of the computer resources. The operating system does this by two methods: scheduling priority, and system semaphores. Each program is assigned a priority level. Higher priority tasks (like reading and writing to the disk) are performed more regularly. User programs may have their priority adjusted dynamically, upwards or downwards, depending upon their activity and available system resources. System semaphores are used by the operating system to control system resources. A program can be assigned a resource by getting a semaphore (via a system call to the operating system). When the resource is no longer needed, the semaphore is returned to the operating system, which can then allocate it to another program.
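The get-and-return pattern for semaphores can be sketched in user space. The example below uses Python's `threading.Semaphore` as a stand-in for a system semaphore; the function names `run_jobs` and `print_job` and the single-printer resource are assumptions made for illustration.

```python
import threading


def run_jobs(names):
    """Run one thread per job; each must hold the semaphore to use the printer."""
    printer = threading.Semaphore(1)   # one printer, so at most one holder
    log = []

    def print_job(name):
        with printer:                  # get the semaphore (blocks while it is held)
            log.append((name, "start"))
            log.append((name, "end"))
        # Leaving the block returns the semaphore so another job can proceed.

    threads = [threading.Thread(target=print_job, args=(n,)) for n in names]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return log


if __name__ == "__main__":
    for entry in run_jobs(["report", "invoice", "memo"]):
        print(entry)
```

Because only one job can hold the semaphore at a time, the start and end entries of a job are never interleaved with those of another job.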
Disk drives and printers are serial in nature. This means that only one request can be performed at any one time. In order for more than one user to use these resources at once, the operating system manages them via queues. Each serial device is associated with a queue. When a user program wants access to the disk, for example, it sends the request to the queue associated with the disk. The operating system runs background tasks (called daemons), which monitor these queues and service requests from them. A request is then performed by this daemon process, and the results are sent back to the user's program.
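The queue-and-daemon arrangement can be modeled with a single worker thread that services one request at a time. This is a minimal sketch under stated assumptions: `submit_all` and `serve` are illustrative names, converting text to upper case stands in for the actual device work, and `None` is used as a shutdown marker.

```python
import queue
import threading


def submit_all(jobs):
    """Send (user, data) requests to one device queue and collect the results."""
    requests = queue.Queue()
    results = []

    def serve():
        # The daemon services requests strictly one at a time, in arrival order.
        while True:
            job = requests.get()
            if job is None:            # shutdown marker (an assumption of this sketch)
                break
            user, data = job
            results.append((user, data.upper()))   # stand-in for device work

    daemon = threading.Thread(target=serve, daemon=True)
    daemon.start()
    for job in jobs:
        requests.put(job)
    requests.put(None)
    daemon.join()
    return results


if __name__ == "__main__":
    print(submit_all([("alice", "page one"), ("bob", "page two")]))
```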
Multi-tasking systems provide a set of utilities for managing processes. In UNIX, these are ps (list processes), kill (kill a process), and & (run a process in the background). In UNIX, all user programs and application software use the system call interface to access system resources like disks, printers, memory, etc. The system call interface in UNIX provides a set of system calls (C functions). The purpose of the system call interface is to provide system integrity. As all low level hardware access is under control of the operating system, this prevents a program from corrupting the system.
The operating system, upon receiving a system call, validates its authenticity or permission, then executes it on behalf of the program, after which it returns the results. If the request is invalid or not authenticated, then the operating system does not perform the request but simply returns an error code to the program. The system call is accessible as a set of 'C' functions, as the majority of UNIX is also written in 'C'. Typical system calls are: _read, for reading from the disk unit; _write, for writing to the disk unit; _getch, for reading a character from a terminal; _putch, for writing a character to the terminal; and _ioctl, for controlling and setting device parameters.
The fundamental structure that the UNIX operating system uses to store information is the file. A file is a sequence of bytes; each byte is typically 8 bits long and is equivalent to a character. UNIX keeps track of files internally by assigning each one a unique identifying number. These numbers, called inode numbers, are used only within the UNIX operating system kernel itself. While UNIX uses inode numbers to refer to files, it allows users to identify each file by a user-assigned name. A file name can be any sequence containing from one to fourteen characters.
There are three types of files in the UNIX file system: (1) ordinary files, which may be executable programs, text, or other types of data used as input or produced as output from some operation; (2) directory files, which contain lists of files; and (3) special files, which provide a standard method of accessing I/O devices.
UNIX provides users with a way of organizing files. Files may be grouped into directories. Internally, a directory is a file that contains the names of ordinary files and other directories, and their corresponding inode numbers. Given the name of a file, UNIX looks in the file's directory and obtains the corresponding inode number for the file. With this inode number, UNIX can examine other internal tables to determine where the file is stored and make it accessible to the user. UNIX directories themselves have names, each of which may also contain fourteen characters.
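The name-to-inode mapping held in a directory can be observed from a user program. The sketch below assumes a POSIX-style file system and uses Python's standard `os.scandir`; the helper name `directory_entries` and the example file names are illustrative only.

```python
import os
import tempfile


def directory_entries(path):
    """Return {name: inode number} for the entries of one directory."""
    with os.scandir(path) as entries:
        return {entry.name: entry.inode() for entry in entries}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as top:
        open(os.path.join(top, "notes.txt"), "w").close()
        os.mkdir(os.path.join(top, "subdir"))
        # Both the ordinary file and the subdirectory appear as named entries.
        for name, inode in sorted(directory_entries(top).items()):
            print(inode, name)
```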
Just as directories provide a means for users to group files, UNIX supports the grouping of directories into a hierarchical file system. At the very top of a hierarchy is a directory. It may contain the names of individual files and the names of other directories. These, in turn, may contain the names of individual files and still other directories, and so on. A hierarchy of files is the result. The UNIX file hierarchy resembles an upside-down tree, with its root at the top. The various directories branch out until they finally trace a path to the individual files, which correspond to the tree's leaves. The UNIX file system is described as "tree-structured," with a single root directory. All the files that can be reached by tracing a path down through the directory hierarchy from the root directory constitute the file system.
UNIX maintains a great deal of information about the files that it manages. For each file, the file system keeps track of the file's size, location, ownership, security, type, creation time, modification time, and access time. All of this information is maintained automatically by the file system as the files are created and used. UNIX file systems reside on mass storage devices such as disk files. These disk files may use fixed or removable type media, which may be rigid or flexible. UNIX organizes a disk as a sequence of blocks, which compose the file system. These blocks are usually either 512 or 2048 bytes long. The contents of a file are stored in one or more blocks, which may be widely scattered on the disk.
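Much of this per-file information can be read back through the standard `stat` interface. The following is a minimal sketch; the helper name `describe` and the dictionary keys are assumptions for illustration. It omits a creation time because the `st_ctime` field on UNIX systems records the last inode change rather than creation.

```python
import os
import stat
import tempfile


def describe(path):
    """Collect the metadata the file system keeps for one file."""
    info = os.stat(path)
    return {
        "inode": info.st_ino,
        "size_bytes": info.st_size,
        "owner_uid": info.st_uid,
        "permissions": stat.filemode(info.st_mode),   # e.g. "-rw-------"
        "is_directory": stat.S_ISDIR(info.st_mode),
        "modified": info.st_mtime,
        "accessed": info.st_atime,
    }


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as handle:
        handle.write(b"hello")
        handle.flush()
        for key, value in describe(handle.name).items():
            print(key, value)
```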
An ordinary file is addressed through the inode structure. Each inode is addressed by an index contained in an i-list. The i-list is generated based on the size of the file system, with larger file systems generally implying more files and, thus, larger i-lists. Each inode contains thirteen 4-byte disk address elements. The direct inode can contain up to ten block addresses. If the file is larger than this, then the eleventh address points to the first level indirect block. Address 12 and address 13 are used for second level and third level indirect blocks, respectively, with the indirect addressing chain before the first data block growing by one level as each new address slot in the direct inode is required.
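The addressing scheme above implies a fixed upper bound on file size, which the following sketch works out. It assumes the figures given in the text (ten direct addresses, 4-byte addresses, 512-byte blocks by default); the function names are illustrative.

```python
def blocks_per_level(block_size=512, address_size=4, direct_slots=10):
    """Number of data blocks reachable through each kind of address slot."""
    per_block = block_size // address_size   # addresses held by one indirect block
    return [
        ("direct", direct_slots),
        ("single indirect", per_block),
        ("double indirect", per_block ** 2),
        ("triple indirect", per_block ** 3),
    ]


def level_for_block(index, block_size=512):
    """Name the addressing level that reaches the data block at a 0-based index."""
    for name, count in blocks_per_level(block_size):
        if index < count:
            return name
        index -= count
    raise ValueError("block index exceeds what one inode can address")


def max_file_bytes(block_size=512):
    """Largest file, in bytes, that one inode can address."""
    return block_size * sum(count for _, count in blocks_per_level(block_size))


if __name__ == "__main__":
    print(blocks_per_level())
    print(level_for_block(10))
    print(max_file_bytes())
```

With 512-byte blocks, one indirect block holds 128 addresses, so an inode reaches 10 + 128 + 128^2 + 128^3 = 2,113,674 blocks, or 1,082,201,088 bytes.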
All input and output (I/O) is done by reading or writing files, because all peripheral devices, even terminals, are files in the file system. In the most general case, before reading and writing a file, it is necessary to inform the system of your intent to do so by opening the file. In order to write to a file, it may also be necessary to create it. When a file is opened or created (by way of the 'open' or 'create' system calls), the system checks for the right to do so and, if all is well, returns a non-negative integer called a file descriptor. Whenever I/O is to be done on this file, the file descriptor is used, instead of the name, to identify the file. This open file descriptor has associated with it a file table entry kept in the "process" space of the user who has opened the file. In UNIX terminology, the term "process" is used interchangeably with a program that is being executed. The file table entry contains information about an open file, including an inode pointer for the file and the file pointer for the file, which defines the current position to be read or written in the file. All information about an open file is maintained by the system.
In conventional UNIX, all input and output is done by two system calls, 'read' and 'write,' which are accessed from programs having functions of the same name. For both system calls, the first argument is a file descriptor. The second argument is a pointer to a buffer that serves as the data source or destination. The third argument is the number of bytes to be transferred. Each 'read' or 'write' system call counts the number of bytes transferred. On reading, the number of bytes returned may be less than the number requested, because fewer than the number requested remain to be read. A return value of zero implies end of file; a return value of -1 indicates an error of some sort. For writing, the value returned is the number of bytes actually written. An error has occurred if this is not equal to the number which was supposed to be written.
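The byte-count conventions of 'read' and 'write' can be exercised through Python's `os` module, whose `os.read` and `os.write` functions are thin wrappers over the same system calls. This is a minimal sketch: the helper names are illustrative, and Python reports an error by raising `OSError` rather than by returning -1.

```python
import os
import tempfile


def write_all(fd, data):
    """Write every byte, repeating the call if a write transfers fewer bytes."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)   # returns the number of bytes actually written
        view = view[written:]


def read_all(fd, chunk_size=512):
    """Read until a call returns zero bytes, which signals end of file."""
    chunks = []
    while True:
        chunk = os.read(fd, chunk_size)   # may return fewer bytes than requested
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)


def round_trip(data):
    """Write data to a temporary file, rewind, and read it back by descriptor."""
    fd, path = tempfile.mkstemp()
    try:
        write_all(fd, data)
        os.lseek(fd, 0, os.SEEK_SET)      # move the file pointer back to the start
        return read_all(fd)
    finally:
        os.close(fd)
        os.remove(path)


if __name__ == "__main__":
    print(len(round_trip(b"x" * 1300)))
```

All transfers go through the integer file descriptor, not the file name, and the final short read followed by an empty read marks the end of the file.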
The parameters of the 'read' and 'write' system calls may be manipulated by the application program that is accessing the file. The application must, therefore, be sensitive to and take advantage of the multi-level store characteristics inherent in a standard system memory hierarchy. It is advantageous, from the application perspective, if the system memory components can be viewed as a single level hierarchy. If this is properly done, the application could dispense with most of the I/O overhead.
Faults are inevitable in digital computer systems due to such things as the complexity of the circuits and the associated electromechanical devices. To permit system operation, even after the occurrence of a fault, the art has developed a number of fault-tolerant designs. Improved fault-tolerant digital data processing systems include redundant functional units, e.g., duplicate CPUs, memories, and peripheral controllers interconnected along a common system bus. Each of a pair of functional units responds identically to input received from the bus. If the outputs of a pair of functional units do not agree, that pair of units is taken off-line, and another pair of functional units (a "spare") continues to function in its place.
Even with the recent developments in fault-tolerant systems, there are characteristics of UNIX systems that make them difficult to adapt to conventional fault-tolerant operation. An important element of fault-tolerant systems is a maintenance and diagnostic system that automatically monitors the condition (or "state") of functional units of the data processing system, particularly those that are more readily replaceable ("field replaceable units," or FRUs). The complexity of UNIX based systems requires that such fault-tolerant systems maintenance and diagnostic systems (or "state machines") have capabilities far beyond anything conventionally available. Therefore, catastrophic failure of UNIX based systems may be expected at a somewhat higher frequency than other operating systems in which fault tolerance systems are easily adapted.
Catastrophic failure is defined as any hardware problem, including but not limited to disk, planar, or adapter anomalies, that causes information about data placement or user environment to be lost to the base operating system. It is also possible, though less likely, that such failure incidents can originate within software, due to defects in coding or method of execution.
Rebuilding UNIX images, in particular, requires awareness of many system parameters, including volume groups, logical volumes, filesystems, disk allocations, network interfaces, operational services, printer definitions, software revisions, hardware specifics, and other factors. Failure to maintain awareness of these parameters may result in an inability to rebuild a system following a catastrophic failure.
To elaborate, even though root or data backups may occur on a regular basis, this is frequently not enough. It is very important to be able to recreate not just the data and user identification, but also the physical to logical mappings across disks. Furthermore, reconstruction of volume groups and other data groupings must be accomplished throughout the environment of the crashed system. Finally, the ideal innovation would present such data to the administrator for manual reconstruction and would also provide input for another tool, which could then automatically recreate all such environmental parameters on the system following a catastrophic failure.
Typically, too little thought is given to complete disaster recovery, even in the most conscientious Information Systems (IS) shops. Typically, disparate system backups are made, usually of the root volume group, and perhaps of critical data. However, this leaves many gaps in system knowledge; and following hardware failures or serious software problems, a great deal of human intervention can be required to recreate volume groups, determine logical mappings, etc. before the system can return to a fully operational state.
In today's increasingly complex computer environment, with large disk arrays, shared data, and widespread user bases, recreating a system environment following a catastrophic failure has become an ever more difficult task. Following a system failure, users typically restore portions of the system manually from tape (such as rootvg), recreate data structures (volume groups, logical volumes, and filesystems) by hand, and then restore data from applications manually. Furthermore, in the case of shared disk arrangements, replacement of disks into volume groups is a hit-or-miss proposition, where disks are imported into volume groups at random, such that the contents may be queried to discover the true, original arrangement.
It would be advantageous to provide system administrators with a tool to accurately recreate data structures, such as volume groups, logical volumes, and filesystems. It would further be advantageous to provide system administrators with a tool to replace disks into volume groups, without random importation, by providing contents that may be queried to discover the true original arrangement of the data. Finally, it would be even more advantageous to provide system administrators with a tool for reconstructing an entire system by allowing for the creation of a system configuration image which defines all system parameters and by then using the image or configuration parameters to automatically reconstruct the system.
The present invention relates to a system and method for acquiring valid system configuration for rebuilding UNIX images. In particular, the present invention allows for the outputting of critical system parameters including volume groups, logical volumes, filesystems, disk allocations, network interfaces, operational services, printer definitions, software revisions, hardware specifics, and other factors. Initially, a configuration script is stored, either remotely or locally on a system. The configuration script is executed based on a set of predefined execution parameters, such as time based parameters, system usage, or loading parameters, or even based on the types of operation being performed on the system.
Once executed, the configuration script outputs a series of current configuration parameters to a safe storage area. These current configuration parameters define the system in terms of system hardware specifics, software specifics, and firmware specifics, including mappings from logical to physical disk drives. By recording such detailed information in a methodical form and preserving it in an accessible state, a script may be written to place every logical volume, every filesystem, and every block of data back onto its disk of origin, even if attached disk arrays stretch into terabytes. By detailing all relevant system parameters, output can be fed as input into a reconstruction script, which can then be written by anyone skilled in UNIX administration, provided the administrator has the comprehensive system environment description generated by the present invention.
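The general shape of such a configuration capture can be sketched as follows. This is an illustration under stated assumptions and not the configuration script of the invention: the function name `capture_configuration`, the JSON layout, and the command table are hypothetical, and a real deployment would substitute the platform's own commands for listing volume groups, logical volumes, filesystems, and so on.

```python
import json
import subprocess
import sys
import time


def capture_configuration(commands, output_path):
    """Run each listing command and save its output under a descriptive label."""
    snapshot = {"captured_at": time.time(), "sections": {}}
    for label, argv in commands.items():
        try:
            result = subprocess.run(argv, capture_output=True, text=True)
            section = {"exit_status": result.returncode, "output": result.stdout}
        except OSError as error:
            # The command is missing on this system; record that fact and go on.
            section = {"exit_status": None, "output": "", "error": str(error)}
        section["command"] = list(argv)
        snapshot["sections"][label] = section
    with open(output_path, "w") as handle:
        json.dump(snapshot, handle, indent=2)   # the "safe storage area" is a file here
    return snapshot


if __name__ == "__main__":
    # Placeholder command: it prints the Python version so the sketch runs anywhere.
    table = {"interpreter": [sys.executable, "--version"]}
    print(capture_configuration(table, "config_snapshot.json")["sections"].keys())
```

Storing the labeled output in a structured file lets a later reconstruction script read the same sections back as its input.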