The present invention relates to the distribution and execution of computationally intensive services over a plurality of networked computer systems. More particularly, the invention relates to a system that dynamically registers network resources and tracks their availability to execute a requested service (job), so that the job's load can be distributed evenly (balanced) for execution across the available networked resources.
Computationally intensive services have traditionally been executed on mainframe systems. More recently, systems have been developed that distribute the load of computationally intensive jobs (services) over many networked computer systems. As a result, conventional distributed computing systems, such as banks or networks of workstations, servers or mirrored sites (“hosts”) on the World Wide Web (WWW), must address the problem of using their distributed system resources effectively. For example, in conventional systems where some hosts (system resources or servers) lie idle while others are heavily loaded, system performance can decrease significantly. To prevent such unbalanced and inefficient use of system resources, attempts have been made to distribute a service-execution load throughout the networked resources of the conventional distributed computer system.
However, distributing a load for execution across networked resources can raise load balancing issues. Load balancing includes distributing processing and communication activity evenly across a computer network so that no single device is overused. Load balancing may improve performance measures such as the expected time a task stays in the system.
For example, very few enterprises can afford to host their company's web site on a single, monolithic server. Rather, sites are deployed on server clusters that improve performance and scalability. To provide fault tolerance and hide cluster detail from site visitors, a load balancing appliance sits between the Internet and the server cluster, acting as a virtual server.
As each new client request arrives, the load balancing appliance makes a near-instantaneous decision about which physical server is best able to satisfy that request. Load balancing optimizes request distribution based on factors such as capacity, availability, response time, current load, historical performance and administrative weights. A well-tuned adaptive load balancer helps keep customer sites available 24×7 with the best possible response time and resource utilization.
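By way of illustration only, the selection step described above might be sketched as follows. The `Server` record, the `choose_server` function and the scoring rule (current load divided by capacity times administrative weight) are hypothetical and are not taken from any particular appliance; response time and historical performance are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Server:
    """Hypothetical record of the factors a balancer might track per server."""
    name: str
    capacity: int            # maximum number of concurrent requests (assumed unit)
    current_load: int        # requests currently being served
    available: bool = True   # False if the server failed a health check
    admin_weight: float = 1.0  # administrator-assigned preference (higher = preferred)


def choose_server(servers: Iterable[Server]) -> Server:
    """Pick the available server with the lowest weighted utilization.

    The score is an assumption made for this sketch, not a standard formula.
    """
    candidates = [
        s for s in servers
        if s.available and s.capacity > 0 and s.admin_weight > 0
    ]
    if not candidates:
        raise RuntimeError("no available server")
    return min(
        candidates,
        key=lambda s: s.current_load / (s.capacity * s.admin_weight),
    )
```

In this sketch an unavailable server is never chosen, however lightly loaded it is, and raising a server's administrative weight makes it more likely to be selected at the same load.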
Although determining an effective load balancing strategy depends largely on the details of the underlying system, general models from both queuing theory and computer science often provide valuable insight and general rules of thumb. For example, in a system comprising N servers, each incoming task must be assigned to a server and wait for service. If an incoming task knows exactly how many tasks are currently queued at each server, and the task can be queued instantaneously, then it is usually best to route the task to the server with the shortest queue, provided the tasks are of about equal size. Many other load balancing algorithms exist. In many actual systems, however, it is unrealistic to assume that tasks will have access to up-to-date load information. Global information may be updated only periodically, and the time taken to move a task to a server may be so long that the load information is out of date by the time the task arrives.
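The shortest-queue rule can be illustrated with a minimal sketch, assuming that exact queue lengths are known and that tasks are of equal size. The function names are hypothetical and chosen for illustration.

```python
from typing import List


def route_to_shortest_queue(queue_lengths: List[int]) -> int:
    """Return the index of the server with the fewest queued tasks.

    Ties are broken in favor of the lowest server index.
    """
    if not queue_lengths:
        raise ValueError("at least one server is required")
    return min(range(len(queue_lengths)), key=queue_lengths.__getitem__)


def assign_tasks(queue_lengths: List[int], num_tasks: int) -> List[int]:
    """Assign tasks one at a time, using exact, current queue lengths.

    Returns the list of server indices chosen, in arrival order.
    """
    lengths = list(queue_lengths)  # work on a copy; the caller's list is untouched
    assignments = []
    for _ in range(num_tasks):
        server = route_to_shortest_queue(lengths)
        lengths[server] += 1       # the new task joins that server's queue
        assignments.append(server)
    return assignments
```

For example, with starting queue lengths of 3, 1 and 2, `assign_tasks([3, 1, 2], 3)` returns `[1, 1, 2]`, leaving all three queues at length 3.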
Except in conventional systems in which up-to-date load information is available, the strategy of going to the shortest queue can lead to disastrous results, because the load information may be out of date by the time the task arrives. Consequently, certain resources may be overutilized while others are underutilized. Moreover, even in conventional systems where load balancing issues are addressed, computational resources are not dynamically available. Hence, some means of handling variable (dynamic) availability of resources, such as computational resources, is needed to render the execution system much more scalable and robust.
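The effect of out-of-date load information can be seen in a small deterministic sketch. It assumes that every task consults a shared snapshot of queue lengths that is refreshed only once every `refresh_every` arrivals; the function name and the refresh model are hypothetical simplifications.

```python
from typing import List


def route_with_stale_snapshot(
    true_lengths: List[int], num_tasks: int, refresh_every: int
) -> List[int]:
    """Route tasks to the shortest queue as seen in a periodically refreshed snapshot.

    Returns the true queue lengths after all tasks have been routed.
    """
    if refresh_every < 1:
        raise ValueError("refresh_every must be at least 1")
    lengths = list(true_lengths)   # actual state of the servers
    snapshot = list(lengths)       # what arriving tasks believe the state to be
    for i in range(num_tasks):
        if i % refresh_every == 0:
            snapshot = list(lengths)  # periodic global update
        # Each task picks the shortest queue according to the snapshot only.
        server = min(range(len(snapshot)), key=snapshot.__getitem__)
        lengths[server] += 1
    return lengths
```

With starting queues of 2, 0 and 1 and six arriving tasks, refreshing before every task (`refresh_every=1`) ends with queues of 3, 3 and 3, whereas refreshing only once (`refresh_every=6`) sends every task to the same server and ends with queues of 2, 6 and 1.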
To that end, U.S. Pat. No. 5,991,808 provides a system for servicing a job, such as by use of a plurality of processing units or hosts. In a multiprocessor system, the processing units could all be part of a single computer device, such as a high-power workstation with multiple processors. Alternatively, each of the processing units could be a single computer; linked together, such computers form a bank or network of servers or workstations, each of which includes a single processor.
A task directing unit, which may be a client processor in a distributed computing environment such as a network, the Web or part of a multiprocessor computer device itself, is interconnectable to each of the plurality of system resources. The task directing unit is configured to first obtain load information representing a loading of each of a number of resources selected uniformly at random (u.a.r.). Preferably, the task directing unit simultaneously queries each of the randomly selected resources for load information. Each randomly selected resource is configured to respond to the query with load information representing its loading.
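The sampling step described above may be sketched as follows. The passage states only that load information is obtained from randomly selected resources; the further step of directing the task to the least loaded of the sampled resources is an assumption made here for illustration. The names `pick_resource` and `query_load` and the sample size `d` are likewise hypothetical, and the queries are issued one after another rather than simultaneously.

```python
import random
from typing import Callable, Optional, Sequence


def pick_resource(
    resources: Sequence[str],
    query_load: Callable[[str], int],
    d: int = 2,
    rng: Optional[random.Random] = None,
) -> str:
    """Sample d resources uniformly at random and return the least loaded one.

    query_load stands in for the load query sent to each sampled resource.
    """
    if not resources:
        raise ValueError("at least one resource is required")
    if d < 1:
        raise ValueError("d must be at least 1")
    rng = rng or random.Random()
    # Sampling without replacement; never ask for more than exist.
    sample = rng.sample(list(resources), k=min(d, len(resources)))
    # Query only the sampled resources, not every known resource.
    loads = {r: query_load(r) for r in sample}
    return min(sample, key=loads.__getitem__)
```

Because only `d` resources are queried per task, the query cost does not grow with the number of resources; with `d` equal to the number of resources, the sketch reduces to choosing the globally least loaded resource.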
While the '808 patent addresses load balancing issues in this manner, it does not address dynamic availability of computational resources. As stated, during system operation, the task directing unit must query resources already known to it for their load information. That is, new resources are not made dynamically available to facilitate expeditious and balanced distribution of tasks over available system resources.