1. Field of the Invention
This invention relates to network computing, and more particularly to grid computing systems.
2. Description of the Related Art
Grid computing enables organizations to use their distributed computing resources more efficiently and flexibly, providing more usable power out of existing systems and helping organizations gain a competitive business advantage. Conceptually, a grid is quite simple: it is a collection of computing resources connected through a network. Grid middleware aggregates these resources (e.g., servers, storage, databases, and scientific instruments) and provides transparent, remote, and secure access to computing power wherever and whenever it is needed, delivering that power to every user in the network. A compute grid may include distributed compute resources including one or more of, but not limited to: desktop, server, and High Performance Computing (HPC) systems. Grid computing may provide benefits not available with traditional computing models including one or more of, but not limited to: better utilization of resources, increased user productivity, scalability, and flexibility.
The simplest form of a grid, a Cluster Grid, consists of multiple systems interconnected through a network. Cluster Grids may contain distributed workstations and servers, as well as centralized resources in a data center environment. Typically owned and used by a single project or department, Cluster Grids support both high throughput and high performance jobs. Common examples of the Cluster Grid architecture include compute farms, groups of multi-processor HPC systems, Beowulf clusters, and networks of workstations (NOW).
Cluster Grids typically employ a standard three-tier system architecture, as shown in FIG. 1 (prior art). The architecture includes front-end access nodes, middle-tier management nodes, and back-end compute nodes. The access tier provides access and authentication services to the Cluster Grid users. The management tier is the middle tier and includes one or more servers that run the server elements of client-server software such as Distributed Resource Management (DRM), hardware diagnosis software, and system performance monitors. The size and number of servers in this tier may vary depending on the type and level of services to be provided. For small implementations with limited functionality, a single node can be chosen to host all management services for ease of administration. Alternatively, these functions may be provided by multiple servers for greater scalability and flexibility. The compute tier supplies the compute power for the Cluster Grid. Jobs submitted through the upper tiers in the architecture are scheduled to run on one or more nodes in the compute tier. Nodes in this tier run the client side of the DRM software, the daemons associated with message-passing environments, and any agents for system health monitoring. The compute tier communicates with the management tier, receiving jobs to run and reporting job completion status and accounting details.
FIG. 2 illustrates an exemplary prior art grid farm. A grid farm may include one or more compute (or execution) nodes 104 and a master node 100. A job submitter (access) node 106 submits jobs to the master node 100. The master node 100 dispatches the jobs to various compute nodes 104. The compute nodes 104 execute the jobs and return results to the master node 100, which in turn provides the results to the job submitter node 106. In a conventional grid farm, the master node 100 and compute nodes 104 are configured manually.
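The job flow described for FIG. 2 can be illustrated with a minimal Python sketch. This is an illustration only and is not taken from any grid product: the class names `MasterNode` and `ComputeNode`, the helper `run_farm`, the representation of a job as a callable plus arguments, and the round-robin dispatch policy are all assumptions introduced here, since the description above does not specify a scheduling policy.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ComputeNode:
    """Hypothetical compute (execution) node, standing in for a node 104."""
    name: str

    def run(self, job):
        # Assumption: a job is a (callable, args) pair.
        func, args = job
        return func(*args)


@dataclass
class MasterNode:
    """Hypothetical master node, standing in for node 100 (the only dispatcher)."""
    compute_nodes: list
    pending: deque = field(default_factory=deque)

    def submit(self, job):
        # Called on behalf of the job submitter (access) node.
        self.pending.append(job)

    def dispatch_all(self):
        # Assumed policy: round-robin over the manually configured nodes.
        results = []
        turn = 0
        while self.pending:
            job = self.pending.popleft()
            node = self.compute_nodes[turn % len(self.compute_nodes)]
            results.append((node.name, node.run(job)))
            turn += 1
        # The master collects all results before returning them to the submitter.
        return results


def run_farm(jobs, node_names):
    """Submit every job to a single master and return (node name, result) pairs."""
    master = MasterNode([ComputeNode(n) for n in node_names])
    for job in jobs:
        master.submit(job)
    return master.dispatch_all()
```

In this sketch every job and every result passes through the one `MasterNode` instance, and the list of compute nodes is fixed when the master is constructed, mirroring the manual configuration of a conventional grid farm.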
Conventional grids are monolithic, with one master node and multiple compute nodes. Because every job passes through that single master node, the master node may be a bottleneck for performance and a single point of failure. Under some conditions, the master node may not be able to dispatch jobs quickly and efficiently.
Sun's Cluster Grid Architecture
Grid computing systems typically use a traditional model where a grid farm has a static view of the network. Sun's Cluster Grid implementation is an exemplary grid computing system that may be used for computation-intensive jobs. The following individual components may be included in Sun's Cluster Grid architecture:
    Sun Grid Engine software
    Development Tools and Run Time Libraries (e.g., Sun HPC ClusterTools™, Forte™ for HPC)
    Technical Computing Portal software (e.g., Sun™ ONE Portal Server)
    System Management Tools (e.g., Sun™ Management Center, SunVTS™, and Solaris JumpStart™ and Web Start Flash)
    Underlying platform (e.g., Solaris Operating Environment, Sun servers, and Sun StorEdge storage products)
Sun Grid Engine software is a distributed management product that optimizes utilization of software and hardware resources. Sun Grid Engine finds a pool of idle resources and harnesses it productively, so an organization gets as much as five to ten times the usable power out of systems on the network. Sun Grid Engine software aggregates available compute resources and delivers compute power as a network service.
Peer-to-Peer Computing
Peer-to-peer (P2P) computing, embodied by applications like Napster, Gnutella, and Freenet, has offered a compelling and intuitive way for Internet users to find and share resources directly with each other, often without requiring a central authority or server. The term peer-to-peer networking or computing (often referred to as P2P) may be applied to a wide range of technologies that greatly increase the utilization of information, bandwidth, and computing resources on the Internet. Frequently, these P2P technologies adopt a network-based computing style that neither excludes nor inherently depends on centralized control points. Apart from improving the performance of information discovery, content delivery, and information processing, such a style can also enhance the overall reliability and fault tolerance of computing systems.
JXTA
Sun's JXTA is an exemplary peer-to-peer platform. Peer-to-peer platforms such as JXTA may provide protocols for building networking applications that thrive in dynamic environments. JXTA technology is a set of open protocols that allow any connected device on the network, ranging from cell phones and wireless PDAs to PCs and servers, to communicate and collaborate in a peer-to-peer (P2P) manner. JXTA peers create a virtual network where any peer can interact with other peers and resources directly, even when some of the peers and resources are behind firewalls and NATs or are on different network transports. In JXTA, every peer is identified by an ID that is unique over time and space. Peer groups are user-defined collections of entities (peers) that may share a common interest. Peer groups are also identified by unique IDs. Peers may belong to multiple peer groups, discover other entities and peer resources (e.g., peers, peer groups, services, content, etc.) dynamically, and publish themselves and resources so that other peers can discover them.
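The peer and peer group concepts described above can be illustrated with a minimal Python sketch. This is not the JXTA API and does not implement any JXTA protocol: the classes `Peer` and `PeerGroup`, their methods, and the use of random UUIDs as stand-ins for JXTA IDs are assumptions made purely for illustration, and discovery here is an in-memory lookup rather than a network protocol.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Peer:
    """Hypothetical peer; a random UUID stands in for a JXTA peer ID."""
    name: str
    peer_id: str = field(default_factory=lambda: uuid.uuid4().hex)


@dataclass
class PeerGroup:
    """Hypothetical peer group with its own unique ID."""
    name: str
    group_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    members: dict = field(default_factory=dict)         # peer_id -> Peer
    advertisements: dict = field(default_factory=dict)  # resource name -> peer_id

    def join(self, peer):
        self.members[peer.peer_id] = peer

    def publish(self, peer, resource):
        # Only members may publish a resource within the group.
        if peer.peer_id not in self.members:
            raise ValueError("peer must join the group before publishing")
        self.advertisements[resource] = peer.peer_id

    def discover(self, resource):
        # Return the member that published the resource, or None if unknown.
        peer_id = self.advertisements.get(resource)
        return self.members.get(peer_id) if peer_id is not None else None
```

In this sketch a single `Peer` object may join several `PeerGroup` instances, and other members locate a published resource by name through `discover` rather than through a central server.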