The present invention relates generally to distributed computing systems, and specifically to management of shared resources in such systems.
Distributed computing systems, in which multiple computers share in the use of system resources, such as disk storage, are known in the art. Shared disk file systems are software components that enable multiple computers in a distributed computing system to access one or more disks at the same time, while sharing in the tasks of managing file system metadata relating to these disks. U.S. Pat. No. 5,940,838, whose disclosure is incorporated herein by reference, and the above-mentioned U.S. patent application Ser. No. 08/893,644 describe in detail both shared disk file systems generally and a particular implementation of such a system that can be used advantageously in conjunction with the present invention. In this implementation, all of the computers have independent access to all of the disks, subject to a global locking mechanism that prevents any of the computers from attempting to access a region of a disk that is currently controlled by another computer in a conflicting mode. A shared disk file system of this type is the IBM General Parallel File System (GPFS) for the RS/6000 SP computer system.
When one of the computers, referred to hereinafter as a node, requires allocation of additional disk space for storage, it must first find a region with sufficient free space for the data to be stored. Every time one of the nodes allocates disk space for its storage requirements or de-allocates unneeded disk space, there is a change in the total amount and distribution of free disk space in the system, referred to hereinafter as the "free information" of the system. In order to choose the appropriate region from which to take its allocation of disk space, and to be assured that there will be sufficient storage available for its current needs, the node must have an updated view of the free information. There are a number of ways in which the node can acquire this free information:
The node can collect the free information from all of the other nodes. This method, however, entails severe performance penalties, since it will lead to frequent disruptions of all of the disk allocation work going on throughout the system and will create communication bottlenecks.
Whenever any node allocates or de-allocates disk space, it can send a message to all of the other nodes that have indicated an interest in this information. Each node then keeps its own records of free information throughout the system. This method will also lead to excessive communications traffic and waste of storage space due to duplication of the free information.
The communications burden associated with either of these methods will grow quadratically as the number of nodes in the system is scaled up.
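The scaling claim can be checked with simple arithmetic. The sketch below assumes, hypothetically, that in one round every node changes its allocation once and that each notification is a single point-to-point message (the first method behaves similarly, since each of N nodes queries N-1 others). For contrast, it also counts messages when each node reports to a single collection point that sends one update back.

```python
def broadcast_messages(num_nodes: int) -> int:
    # Every node sends its change to every other node: N * (N - 1) messages per round.
    return num_nodes * (num_nodes - 1)


def coordinator_messages(num_nodes: int) -> int:
    # Each of the N - 1 other nodes sends one report to a single coordinating node,
    # which sends one status update back to each of them: 2 * (N - 1) messages per round.
    return 2 * (num_nodes - 1)


if __name__ == "__main__":
    for n in (4, 16, 64, 256):
        print(f"{n:>4} nodes: broadcast={broadcast_messages(n):>6}  "
              f"coordinator={coordinator_messages(n):>4}")
```

At 256 nodes, the broadcast count is 65,280 messages per round against 510 for the single collection point, which is the quadratic-versus-linear gap described in the text.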
The above-mentioned U.S. Pat. No. 5,940,838 suggests an alternative solution, in which an allocation manager component keeps loose track of which node (if any) is using each allocation region and approximately how much free space remains in each region. During initialization of the file system, the allocation manager examines each region to count the number of free blocks in each, and keeps this information in a table. Before switching to a new region, a node sends a message to the allocation manager to obtain a suggested region to which it should switch. At appropriate times, the node sends notification to the allocation manager of the node's activities in all storage regions on which it has acted since the preceding notification, indicating the present amount of free space in the regions. These activities typically include allocation/deallocation and gain or loss of "ownership." The allocation manager updates its table to indicate the free space in these regions and to show which regions are now owned by each node.
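A minimal sketch of the allocation-manager table described above may help fix the idea. The class and method names, the notification format (each region mapped to its present free-block count and whether the node now owns it), and the policy of suggesting the unowned region with the most free blocks that can satisfy the request are all assumptions for illustration, not the method of U.S. Pat. No. 5,940,838.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class RegionEntry:
    free_blocks: int               # approximate free-block count for the region
    owner: Optional[str] = None    # node currently using the region, if any


class AllocationManager:
    def __init__(self, initial_free: Dict[int, int]) -> None:
        # Free-block counts obtained by examining each region at file-system initialization.
        self.table = {region: RegionEntry(free) for region, free in initial_free.items()}

    def suggest_region(self, blocks_needed: int = 1) -> Optional[int]:
        # Hypothetical policy: the unowned region with the most free blocks,
        # provided it can satisfy the request; None if no region qualifies.
        candidates = [
            region for region, entry in self.table.items()
            if entry.owner is None and entry.free_blocks >= blocks_needed
        ]
        if not candidates:
            return None
        return max(candidates, key=lambda region: self.table[region].free_blocks)

    def notify(self, node: str, activity: Dict[int, Tuple[int, bool]]) -> None:
        # activity maps region -> (present free blocks, whether the node now owns it).
        for region, (free_blocks, owned) in activity.items():
            entry = self.table[region]
            entry.free_blocks = free_blocks
            if owned:
                entry.owner = node
            elif entry.owner == node:
                entry.owner = None   # the node gave up ownership of the region


if __name__ == "__main__":
    mgr = AllocationManager({0: 100, 1: 250, 2: 40})
    print(mgr.suggest_region())                   # 1: most free blocks, unowned
    mgr.notify("nodeA", {1: (180, True)})
    print(mgr.suggest_region())                   # 0: region 1 is now owned by nodeA
    print(mgr.suggest_region(blocks_needed=120))  # None: no unowned region is large enough
```

The table is deliberately loose: its counts are only as current as the last notification from each node, which is acceptable because the manager only suggests regions rather than granting them.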
It is an object of the present invention to provide improved methods and systems for management of shared resources in a distributed computing environment.
It is a further object of some aspects of the present invention to provide efficient methods for collecting and distributing information regarding availability of resources in such an environment, and particularly regarding free disk space.
In preferred embodiments of the present invention, a distributed computing system comprises a plurality of computing nodes, which have a common file system and share system resources. The system resources preferably include data storage, most preferably in the form of multiple shared disks. One of the nodes is chosen to be a coordinating node, for the purpose of gathering, maintaining and distributing free resource information ("free information"), including particularly the approximate amount of data storage space that is free throughout the system. The coordinating node makes this information available to all of the other nodes, so that applications running on any of the nodes can determine how much of the storage space or other resource is free at any given time, by means of a single query. Preferably, the coordinating node also performs other functions of the allocation manager described in U.S. Pat. No. 5,940,838.
Upon start-up of the system, the coordinating node acquires initial free information from allocation maps of all of the shared disks. Whenever one of the nodes gains control of a given allocation map region, it extracts the free information for that region from the allocation map and then updates the information based on its own allocation and de-allocation of disk space in the region. Periodically, the nodes send messages to inform the coordinating node of changes in the free information regarding allocation map regions under their control. The coordinating node uses these messages to update its picture of the free information in the entire system and notify the other nodes of this information.
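The coordinating node's bookkeeping can be sketched as follows. This assumes, for illustration only, that each periodic report carries the current free-block count of every region whose free information changed under the reporting node's control; the class name and the status-update format are hypothetical.

```python
from typing import Dict


class CoordinatingNode:
    def __init__(self, initial_free: Dict[int, int]) -> None:
        # Initial free information read from the allocation maps at start-up:
        # region -> free blocks.
        self.free_by_region = dict(initial_free)

    def receive_report(self, changed_regions: Dict[int, int]) -> None:
        # Overwrite the entries for the regions the reporting node changed.
        self.free_by_region.update(changed_regions)

    def total_free(self) -> int:
        # Approximate system-wide free space: only as fresh as the latest reports.
        return sum(self.free_by_region.values())

    def status_update(self) -> Dict[str, int]:
        # Message the coordinator would send to the other nodes (hypothetical format).
        return {"total_free_blocks": self.total_free()}


if __name__ == "__main__":
    coord = CoordinatingNode({0: 100, 1: 250, 2: 40})
    print(coord.total_free())           # 390
    coord.receive_report({1: 180})      # one node allocated 70 blocks in region 1
    print(coord.total_free())           # 320
    coord.receive_report({0: 90, 2: 60})
    print(coord.status_update())        # {'total_free_blocks': 330}
```

Reporting absolute per-region counts, rather than deltas, is one possible design choice: a report that is applied twice does no harm.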
Thus, whenever one of the nodes needs to ascertain the amount of free storage space in the system, it simply refers to the latest free status update that it has received from the coordinating node. To this information, the node preferably adds the results of its own allocation activity since the time of the update. The free information maintained and provided by the coordinating node is approximate, since the other nodes send their update messages only periodically. In large systems, however, allocation activity generally goes on more or less continuously, and there is typically only a minor loss in efficiency of disk space allocation due to the inaccuracy in the instantaneous free information provided by the coordinating node. On the other hand, the use of the coordinating node in this manner, to gather and distribute the approximate free information, greatly reduces the overhead associated with coordinating resource allocation in the system. The communications burden of collecting and distributing free information in this manner scales only linearly with the number of nodes in the system.
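On the receiving side, a node's estimate combines the latest status update with its own net allocation since that update. The sketch below assumes that each update from the coordinating node already reflects this node's earlier reported activity, so the local tally is reset when an update arrives; the class and method names are hypothetical.

```python
class NodeFreeView:
    def __init__(self) -> None:
        self.last_total = 0        # total free blocks in the latest coordinator update
        self.net_allocated = 0     # blocks allocated minus de-allocated since that update

    def on_status_update(self, total_free_blocks: int) -> None:
        # Assumes the update already includes this node's activity up to now.
        self.last_total = total_free_blocks
        self.net_allocated = 0

    def allocate(self, blocks: int) -> None:
        self.net_allocated += blocks

    def deallocate(self, blocks: int) -> None:
        self.net_allocated -= blocks

    def estimated_free(self) -> int:
        # Approximate: other nodes' activity since the update is not yet visible.
        return max(0, self.last_total - self.net_allocated)


if __name__ == "__main__":
    view = NodeFreeView()
    view.on_status_update(330)
    view.allocate(25)
    view.deallocate(5)
    print(view.estimated_free())   # 310
```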
There is therefore provided, in accordance with a preferred embodiment of the present invention, a method for managing a shared resource that is allocated among nodes in a distributed computing system, the method including:
receiving periodic reports from the nodes regarding their respective allocations of the resource; and
responsive to the periodic reports, determining an approximate amount of the resource that is free for further allocation.
Preferably, the shared resource includes data storage, and determining the approximate amount of the resource that is free includes determining an approximate amount of free storage capacity. Most preferably, the data storage includes a plurality of disks linked to the nodes by a network, which disks are commonly accessible to multiple ones of the nodes, and access by the nodes to the disks is controlled using a shared disk file system.
Preferably, receiving the periodic reports includes receiving information from the nodes based on an allocation map of data storage regions respectively controlled by the nodes. In a preferred embodiment, determining the approximate amount of the resource that is free includes finding, responsive to the periodic reports, one or more regions having sufficient free storage space to meet a storage need of one of the nodes, and the method further includes advising the node of the one or more regions.
Further preferably, determining the approximate amount of the resource that is free includes determining the number of free storage blocks in the regions respectively controlled by the nodes. Most preferably, determining the approximate amount of the resource that is free includes determining an exact initial number of the free storage blocks, and then updating the number approximately responsive to the periodic reports.
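The exact initial count can be taken directly from the allocation map. The sketch assumes a flat bitmap in which 1 marks an allocated block and 0 a free one, divided into fixed-size regions; the actual on-disk layout is not specified here, and the function name is hypothetical. After start-up, these counts would be updated only approximately, from the periodic reports.

```python
from typing import Dict, Sequence


def initial_free_counts(allocation_map: Sequence[int], blocks_per_region: int) -> Dict[int, int]:
    # Exact per-region free-block counts: region i covers blocks
    # [i * blocks_per_region, (i + 1) * blocks_per_region); the last region may be shorter.
    if blocks_per_region <= 0:
        raise ValueError("blocks_per_region must be positive")
    counts = {}
    for region, start in enumerate(range(0, len(allocation_map), blocks_per_region)):
        chunk = allocation_map[start:start + blocks_per_region]
        counts[region] = sum(1 for bit in chunk if bit == 0)   # 0 = free (assumed encoding)
    return counts


if __name__ == "__main__":
    amap = [1, 0, 0, 1,  0, 0, 0, 0,  1, 1, 1, 0]
    print(initial_free_counts(amap, 4))   # {0: 2, 1: 4, 2: 1}
```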
Preferably, receiving the periodic reports includes receiving the reports at predetermined intervals. Alternatively or additionally, receiving the periodic reports includes receiving the reports at intervals that vary from one node to another in the system. Most preferably, receiving the reports at the intervals that vary includes receiving the reports from at least one of the nodes with a frequency responsive to a measure of allocation activity by the at least one of the nodes.
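One way to make the reporting frequency depend on allocation activity is to shorten a node's interval in proportion to how many blocks it changed since its last report, within fixed bounds. The policy, the function and parameter names, and the default values (seconds and block counts) are hypothetical.

```python
def next_report_interval(blocks_changed_since_last: int,
                         base_interval: float = 30.0,
                         min_interval: float = 1.0,
                         target_blocks: int = 1000) -> float:
    # Idle nodes report at the base interval.
    if blocks_changed_since_last <= 0:
        return base_interval
    # Busier nodes report more often: the interval shrinks so that roughly
    # target_blocks of change accumulate between reports, clamped to the bounds.
    scaled = base_interval * target_blocks / blocks_changed_since_last
    return max(min_interval, min(base_interval, scaled))


if __name__ == "__main__":
    for changed in (0, 500, 3000, 100000):
        print(changed, next_report_interval(changed))
```

The upper bound keeps quiet nodes from drifting too far out of date, and the lower bound keeps very busy nodes from flooding the coordinating node.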
In a preferred embodiment, receiving the periodic reports includes receiving a report with a timestamp, and determining the approximate amount of the resource that is free includes ignoring a report having an outdated timestamp.
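Ignoring outdated reports can be sketched with a per-region timestamp check. This assumes that timestamps from different nodes are comparable (for example, synchronized clocks or a sequence number issued by the coordinating node). A stale report might arise, for instance, when a delayed message from a region's previous owner arrives after a report from its new owner. The class and method names are hypothetical.

```python
from typing import Dict, Tuple


class TimestampedRegionTable:
    def __init__(self) -> None:
        # region -> (timestamp of the newest accepted report, free blocks)
        self.entries: Dict[int, Tuple[float, int]] = {}

    def receive(self, region: int, timestamp: float, free_blocks: int) -> bool:
        # Apply the report only if it is newer than the one already recorded.
        current = self.entries.get(region)
        if current is not None and timestamp <= current[0]:
            return False   # outdated or duplicate report: ignore it
        self.entries[region] = (timestamp, free_blocks)
        return True


if __name__ == "__main__":
    table = TimestampedRegionTable()
    print(table.receive(1, 10.0, 50))   # True
    print(table.receive(1, 12.0, 40))   # True: newer report from the new owner
    print(table.receive(1, 11.0, 45))   # False: delayed report from the previous owner
    print(table.entries[1])             # (12.0, 40)
```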
Preferably, determining the approximate amount of the resource that is free includes compiling free resource information from the periodic reports to calculate a total amount of the resource that is free.
Further preferably, receiving the periodic reports includes selecting one of the nodes to serve as a coordinating node, which receives the reports from the other nodes and determines the approximate amount of the resource that is free, wherein the coordinating node reports the approximate amount of the resource that is free to the other nodes.
There is also provided, in accordance with a preferred embodiment of the present invention, apparatus for managing a shared resource that is allocated among nodes in a distributed computing system, including a processor, which is configured to communicate with the nodes in the distributed computing system so as to receive periodic reports from the nodes regarding their respective allocations of the resource, and responsive to the periodic reports, to determine an approximate amount of the resource that is free for further allocation.
There is further provided, in accordance with a preferred embodiment of the present invention, a distributed computing system, including:
a plurality of processors, configured to serve as nodes of the system;
a communication network, linking the processors; and
a shared resource, accessible by the nodes via the network,
wherein one of the nodes is selected to act as a coordinating node to which the other nodes periodically report on their respective allocations of the resource, and
wherein the coordinating node is adapted, responsive to the reported allocations, to determine an approximate amount of the resource that is free for further allocation.
There is also provided, in accordance with a preferred embodiment of the present invention, a computer software product, including a computer-readable medium in which program instructions are stored, which instructions, when read by a computer in a distributed computing system in which a shared resource is allocated among multiple nodes, cause the computer to receive periodic reports from the nodes regarding their respective allocations of the resource, and responsive to the periodic reports, to determine an approximate amount of the resource that is free for further allocation.
In a preferred embodiment, the computer program instructions are run by the computer in conjunction with a shared disk file system, which controls access by the nodes to the disks.