1. Field of the Invention
This invention relates to the arts of on-demand, grid-based computing, and to the management and allocation of resources within a grid computing environment.
2. Description of the Related Art
In the 1990s, communications standardization across a wide range of systems propelled the Internet explosion. Building on the concept of resource sharing, the latest evolutionary technology is grid computing.
Grid computing is an emerging technology that uses a collection of systems and resources to deliver qualities of service. It is distributed computing at its best: it creates a virtual, self-managing computer whose processing is handled by a collection of interconnected heterogeneous systems sharing different combinations of resources. In simple terms, grid computing is about getting computers to work together, allowing businesses, or grid participants, to optimize available resources.
The framework of grid computing is large-scale resource sharing across multiple management domains, typically involving highly parallelized applications that are connected through a communications medium and organized to perform one or more requested jobs simultaneously. Each grid resource's characteristics can include, but are not limited to, processing speed, storage capability, licensing rights, and types of applications available.
Grid computing's architecture is defined in the Open Grid Services Architecture (“OGSA”), which includes a basic specification, the Open Grid Services Infrastructure (“OGSI”).
Using grid computing to handle computing jobs of all sizes, and especially larger jobs such as enterprise processes, has several advantages. First, it exploits underutilized resources on the grid. For example, if a financial services company suddenly encounters a 50% increase in stock trade transactions during a 30-minute period, a traditional system would face increased network traffic, longer response and completion times, processing bottlenecks, and even overloaded resources, because its computational and communications resources are limited or fixed.
In a similar situation, however, grid computing can adjust dynamically to meet changing business needs and respond instantly to the increase in stock transactions by using its network of unused resources. For example, a grid computing system could run an existing stock trading application on four underutilized machines to process transactions and deliver results up to four times faster than the traditional computing architecture. Thus, grid computing provides a better balance in resource utilization and enables the potential for massive parallel CPU capacity.
Second, because of its standards, grid computing enables and simplifies collaboration among many resources and organizations from a variety of vendors and operators. For instance, genome research companies can use grid computing to process, cleanse, cross-tabulate, and compare massive amounts of data, with the jobs being handled by a variety of computer types, operating systems, and programming languages. By allowing files or databases to span many systems, grid computing can improve data transfer rates through striping techniques, leading to faster processing and giving these companies a competitive edge in the marketplace.
Third, grid computing provides sharing capabilities that extend to additional equipment, software, services, licenses, and other resources. These virtual resources provide uniform interoperability among heterogeneous grid participants. Each grid resource may have certain features, functionalities, and limitations. For example, a particular data mining job may be able to run on a DB2 server but may not be compatible with an Oracle server. Therefore, the grid computing architecture selects a resource that is capable of handling each specific job.
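The resource selection described above can be pictured as a capability match between what a job requires and what each resource offers. The following is a minimal sketch under stated assumptions: each resource advertises a set of capability tags (such as "db2" or "oracle") and each job lists the tags it needs. The names `GridResource`, `Job`, and `select_resource` and the tag strings are hypothetical illustrations, not part of OGSA, OGSI, or any IBM interface.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GridResource:
    # Hypothetical resource record: a name, the capability tags it offers,
    # and whether it is already allocated to another job.
    name: str
    capabilities: set[str] = field(default_factory=set)
    busy: bool = False


@dataclass
class Job:
    # Hypothetical job record: a name and the capability tags it requires.
    name: str
    required: set[str] = field(default_factory=set)


def select_resource(job: Job, resources: list[GridResource]) -> Optional[GridResource]:
    """Return the first idle resource whose capabilities cover every job requirement."""
    for res in resources:
        if not res.busy and job.required <= res.capabilities:
            return res
    return None  # no suitable resource is currently available


# Usage: a data mining job that can only run on a DB2 server.
pool = [GridResource("oracle-1", {"oracle"}), GridResource("db2-1", {"db2"})]
mining = Job("data-mining", {"db2"})
chosen = select_resource(mining, pool)
print(chosen.name if chosen else None)  # db2-1
```

A first-fit scan keeps the sketch short; a real scheduler would likely also weigh load, cost, or locality among the matching resources.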
International Business Machines (“IBM”) has pioneered the definition and implementation of grid computing systems. According to the IBM architecture, Service Level Agreements (“SLAs”) are contracts that specify a set of client-driven criteria directing acceptable execution parameters for computational jobs handled by the grid. SLA parameters may consist of metrics such as execution and response time, results accuracy, job cost, and storage and network requirements. Typically, after job completion, an asynchronous and frequently manual process is performed to compare actual job completion against the terms of the SLA. In other words, companies use SLAs to ensure that all accounting specifics, such as costs incurred and credits obtained, conform to the brokered agreements. The relationship between a submitting client and a grid service provider is that of a buyer (client) and a seller (grid vendor).
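The post-completion comparison of actual results against SLA parameters, often performed manually, lends itself to automation. The following is a minimal sketch, assuming a hypothetical SLA record that covers only three of the metrics named above (response time, job cost, and storage). The class and field names and their units (seconds, currency units, gigabytes) are illustrative assumptions rather than any defined SLA schema.

```python
from dataclasses import dataclass


@dataclass
class SLA:
    # Hypothetical client-driven limits; units are assumed for illustration.
    max_response_s: float
    max_cost: float
    max_storage_gb: float


@dataclass
class JobResult:
    # Hypothetical measurements recorded for a completed job.
    response_s: float
    cost: float
    storage_gb: float


def sla_violations(sla: SLA, result: JobResult) -> list[str]:
    """List each SLA metric the completed job exceeded; an empty list means compliant."""
    violations = []
    if result.response_s > sla.max_response_s:
        violations.append("response time")
    if result.cost > sla.max_cost:
        violations.append("cost")
    if result.storage_gb > sla.max_storage_gb:
        violations.append("storage")
    return violations


sla = SLA(max_response_s=30.0, max_cost=100.0, max_storage_gb=50.0)
print(sla_violations(sla, JobResult(response_s=42.0, cost=80.0, storage_gb=10.0)))
# ['response time']
```

Returning the list of violated metrics, rather than a single pass/fail flag, leaves room for accounting steps such as computing credits per violated term.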
For grid and on-demand computing to be successful, grid-related processes must be automated as fully as possible. Because grid computing is a relatively new and emerging art, many processes have not yet been considered for automation and, as a result, require inefficient manual interaction.
IBM's grid computing architecture provides an automated and efficient mechanism to allocate and enable the specific hardware and software environment required for job execution in a grid or on-demand computing system, responding dynamically to the receipt of new jobs. However, at certain times, depending on job load and job requirements within the grid, adequate resources to handle a newly submitted job may not be available. Unavailability may result from the hardware and software capable of handling the job already being allocated to other jobs, from no hardware and software currently being configured in the grid in a way that could handle the job, or from a combination of both.
Therefore, there is a need in the art for a mechanism that, when the currently active and available grid hardware does not contain the software environment(s) required by inbound grid jobs, builds the required software environment in an automated manner. The software involved may include the base operating system, specific device drivers, application software, and other components. Building the appropriate software environment may involve a complete build of a new software environment on new hardware, a build of a supplemental set of nodes to integrate with existing nodes in order to complete a required environment, or simply a build of the required applications on existing active nodes, depending on the need.
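The three build options listed above can be illustrated as a simple decision over the state of the grid's active nodes. The following is a minimal sketch under stated assumptions, not the invention's procedure: each node is described by a hypothetical "base" platform label (standing in for operating system plus device drivers) and a set of installed applications, and a job needs some number of idle nodes on a given base with given applications. The names `Node`, `BuildPlan`, and `choose_build_plan` are illustrative only.

```python
from dataclasses import dataclass
from enum import Enum


class BuildPlan(Enum):
    NONE = "environment already available"
    APPS_ON_EXISTING = "build required applications on existing active nodes"
    SUPPLEMENTAL_NODES = "build supplemental nodes to integrate with existing nodes"
    FULL_NEW = "build complete software environment on new hardware"


@dataclass
class Node:
    # Hypothetical node record: base platform (OS and drivers), installed
    # applications, and whether the node is already allocated to another job.
    base: str
    apps: frozenset[str]
    allocated: bool = False


def choose_build_plan(base: str, apps: set[str], nodes_needed: int, nodes: list[Node]) -> BuildPlan:
    """Pick a build strategy for a job needing `nodes_needed` idle nodes on `base` with `apps`."""
    # Only idle nodes already running the required base platform can be reused.
    candidates = [n for n in nodes if not n.allocated and n.base == base]
    if not candidates:
        return BuildPlan.FULL_NEW  # nothing reusable: build everything on new hardware
    if len(candidates) < nodes_needed:
        return BuildPlan.SUPPLEMENTAL_NODES  # too few nodes: add nodes to complete the set
    ready = [n for n in candidates if apps <= n.apps]
    if len(ready) >= nodes_needed:
        return BuildPlan.NONE
    return BuildPlan.APPS_ON_EXISTING  # enough nodes, but some lack the applications


# Usage with a small hypothetical grid.
nodes = [
    Node("linux", frozenset({"db2"})),
    Node("linux", frozenset()),
    Node("aix", frozenset({"db2"}), allocated=True),
]
print(choose_build_plan("linux", {"db2"}, 2, nodes).name)    # APPS_ON_EXISTING
print(choose_build_plan("linux", {"db2"}, 3, nodes).name)    # SUPPLEMENTAL_NODES
print(choose_build_plan("windows", {"db2"}, 1, nodes).name)  # FULL_NEW
```

The sketch treats the options as mutually exclusive for clarity; in practice a supplemental-node build might also require installing applications on the existing nodes it joins.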