1. Field of the Invention
The invention relates generally to the field of digital data processing systems, and more particularly to distributed processing systems in which the various components are interconnected by one or more networks.
2. Description of the Prior Art
A digital data processing system, or computer, typically includes a processor, associated memory and input/output units enabling a user to load programs and data into the computer and obtain processed data therefrom. In the past, computers were expensive, and so to be cost effective had to support a number of users. More recently, however, the cost of computers, particularly the processors and memories, has decreased substantially, and so it is relatively cost effective to provide a computer to one or at most only a few users.
A benefit of providing only a single computer for a large number of users was that the users could easily share information. Thus, for example, if all persons working in a bookkeeping or accounting department use a single common computer, they may keep common accounting and bookkeeping databases up to date, and accounting reports may be generated from those databases when necessary. However, if they use separate computers, the data is stored in separate databases on each computer, and so generating accounting reports is more difficult.
As a result, networks were developed to provide a distributed computer system, that is, a system which permits diverse computers to communicate and transfer data among them. In addition, the networks allow the sharing of expensive input/output devices, such as printers and mass storage devices, and input/output devices which may be rarely used, such as links to the public telecommunications network. In a network, each computer is a node which communicates with other nodes over one or several wires. In addition, nodes may be provided which store and manage databases or other data files on mass storage devices, or which manage printers or links to the public telecommunications network.
As networks become larger, however, and more computers, input/output and other devices are connected to them, management of the networks becomes more difficult. To alleviate data transfer limitations when connecting too many nodes to a single network, networks have been divided into a number of smaller, essentially separate networks which are then interconnected by means of bridges or gateways to allow a node on one network to communicate with a node on another network. This alleviates the data transfer limitations of networks, but it does not alleviate management problems.
Two major problems associated with management of a computer system are backup and software management. Backup of data stored on magnetic media, such as a disk, is necessary to minimize the likelihood of data being lost. In backup, data on one node is backed up either on a different node or on another disk or on tape at the same node. Software management consists of a number of functions, including verifying that a user has a correct version, installing new versions as they are obtained, and keeping track of software distribution and use for licensing purposes. In the past, when computer systems were large, multiple-user systems, a system manager performed these functions. With the advent of single-user systems, such as personal computers and workstations, the users essentially became system managers and were required to perform these system management tasks themselves.
In many distributed systems, it is desirable to perform some services, such as backup of individual nodes, automatically at periodic intervals. Such services may be performed, for example, weekly or monthly. A problem arises if, because a node is not functioning, backup cannot be performed on the node when it is scheduled. In some systems, if a backup operation cannot be performed, nothing is done until the next time the backup operation is scheduled; the operation is performed at that time if it can be performed. In these systems, if a node becomes available early in the next time period, backup is not performed on that node as early as it could be, since backup is not attempted again until the end of the time period. In other systems, if a backup operation cannot be performed, a backup command is inserted into a task queue, which is periodically examined, and the operations in the queue are performed. In these systems, however, several commands to back up the same node may accumulate in the task queue, and so backup may be repetitively enabled after the node becomes available.
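The two prior-art behaviors described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not code from any actual system: time is modeled as discrete ticks, a backup is scheduled every `schedule_every` ticks, the task queue is examined on every tick, and the function names `simulate_skip` and `simulate_task_queue` are hypothetical.

```python
from collections import deque


def simulate_skip(node_up, schedule_every):
    """First approach: a missed backup is simply skipped until the next
    scheduled time. Returns the number of backups run on each tick."""
    return [
        1 if (tick % schedule_every == 0 and up) else 0
        for tick, up in enumerate(node_up)
    ]


def simulate_task_queue(node_up, schedule_every):
    """Second approach: a missed backup is placed in a task queue that is
    examined periodically (here, assumed to be on every tick)."""
    queue = deque()      # pending backup commands for this node
    backups_run = []     # number of backups run on each tick
    for tick, up in enumerate(node_up):
        ran = 0
        if tick % schedule_every == 0:
            if up:
                ran += 1             # scheduled backup succeeds
            else:
                queue.append(tick)   # node is down: queue a backup command
        # Examine the queue: once the node is up, every queued command runs,
        # even though a single backup would have sufficed.
        while up and queue:
            queue.popleft()
            ran += 1
        backups_run.append(ran)
    return backups_run


# Node is down for ticks 0-4 and up for ticks 5-9; backups are due every 4 ticks.
node_up = [False] * 5 + [True] * 5
print(simulate_skip(node_up, 4))        # backup delayed until tick 8
print(simulate_task_queue(node_up, 4))  # two redundant backups at tick 5
```

In this run the node misses the backups scheduled at ticks 0 and 4. Under the first approach it is not backed up until tick 8, although it became available at tick 5. Under the second approach it is backed up twice at tick 5, once for each queued command.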