The present invention relates generally to control and management of a dynamic distributed environment of autonomous cooperating agents, and, more particularly, to control and management of resources in a grid computing environment.
Grid computing enables the virtualization of distributed computing and data resources such as processing, network bandwidth and storage capacity to create a single system image, granting users and applications seamless access to vast IT capabilities. Just as an Internet user views a unified instance of content via the Web, a grid user essentially sees a single, large virtual computer. At its core, grid computing is based on an open set of standards and protocols such as the Open Grid Services Architecture (OGSA), www.globus.org, and the Web Services Resource Framework (WS-RF), www.webservices.org, both of which are incorporated herein by reference. These standards enable communication across heterogeneous, geographically dispersed environments. With grid computing, organizations can optimize computing and data resources, pool them for large capacity workloads, and share them across networks for enabling collaboration. Further information regarding the Open Grid Services Architecture (OGSA), and grid computing in general, may be found in the publication entitled, “The Physiology of the Grid”, Ian Foster, Argonne National Laboratory & University of Chicago, Jul. 20, 2002, www.globus.org/research/papers/osga.pdf, the contents of which are incorporated herein by reference in their entirety.
A basic premise of OGSA and WS-RF is that everything may be represented by a service (i.e., a network-enabled entity that provides some capability through the exchange of messages) or may be accessed and managed through one or more services. Computational resources, storage resources, networks, programs and databases are all examples of such services. More specifically, OGSA represents everything as a Grid service (i.e., a Web service that conforms to a set of conventions and supports standard interfaces for such purposes as lifetime management). This core set of consistent interfaces, from which all Grid services are implemented, facilitates the construction of higher order services that can be treated in a uniform way across layers of abstraction.
There are two common models currently used for control and management of a collective of independent entities, namely, the “centralized” model and the “hierarchical” model. In the centralized model, a central authority directly controls all the entities within the collective. Such a model is only feasible, however, if the size of the collective is limited. On the other hand, in the hierarchical model, the flow of control is mapped into a tree structure, wherein inner tree nodes have the responsibility of controlling their immediate children. In other words, each inner node directly controls only a limited number of entities (e.g., other inner nodes or leaf nodes). Although this model is more flexible in terms of the size of the collective, there are at least two limitations associated therewith.
First, the failure of an inner node immediately disconnects the sub-tree controlled by the failed inner node from the rest of the collective. Second, the hierarchical model is most efficient in a static environment, where all of the entities are known “a priori” and a balanced tree may be designed and implemented. However, in a dynamic environment (where entities constantly join and leave the collective), the maintenance of a balanced tree becomes more difficult. For example, some nodes will be forced to control an increasingly large number of other entities, eventually reaching a point where it becomes necessary to stop the operation of the collective and re-architect the hierarchical structure.
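By way of illustration only, the two limitations of the hierarchical model described above may be sketched as follows. The sketch is not drawn from any particular grid implementation; the names Node, reachable, max_fanout and build_example are hypothetical and are introduced solely for this example, which assumes that a failed inner node neither receives nor forwards control messages.

```python
from collections import deque


class Node:
    """One entity in a hypothetical control tree (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.children = []   # entities this node directly controls
        self.alive = True    # assumption: a failed node forwards nothing

    def add_child(self, child):
        self.children.append(child)
        return child


def reachable(root):
    """Names of entities that control messages sent from root still reach.

    The traversal is breadth-first. A failed node is skipped together
    with its children, which models the disconnected sub-tree.
    """
    seen = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if not node.alive:
            continue
        seen.append(node.name)
        queue.extend(node.children)
    return seen


def max_fanout(root):
    """Largest number of entities directly controlled by any single node."""
    best = 0
    stack = [root]
    while stack:
        node = stack.pop()
        best = max(best, len(node.children))
        stack.extend(node.children)
    return best


def build_example():
    """Build a small balanced tree: root -> A, B; each with two leaves."""
    root = Node("root")
    a = root.add_child(Node("A"))
    b = root.add_child(Node("B"))
    for name in ("a1", "a2"):
        a.add_child(Node(name))
    for name in ("b1", "b2"):
        b.add_child(Node(name))
    return root, a, b


if __name__ == "__main__":
    root, a, b = build_example()
    print(reachable(root))    # all seven entities are reachable

    a.alive = False           # first limitation: inner node A fails
    print(reachable(root))    # A, a1 and a2 are no longer reachable

    for i in range(10):       # second limitation: entities keep joining under B
        b.add_child(Node(f"new{i}"))
    print(max_fanout(root))   # B now directly controls 12 entities
```

In this sketch, marking inner node A as failed removes A and both of its leaf nodes from the set of reachable entities, even though the leaf nodes themselves have not failed. Likewise, attaching newly joined entities to node B without rebalancing causes the number of entities that B directly controls to grow without bound.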
Accordingly, it would be desirable to implement a management structure that provides a scalable and resilient mechanism for propagating control information throughout a collective, such as a computing grid or an ad-hoc network of mobile nodes.