Distributed or grid computing provides the ability to share and allocate processing requests and resources among various nodes, computers or server farm(s) within a grid. A server farm is generally a group of networked servers or, alternatively, a networked multi-processor computing environment, in which work is distributed between multiple processors. Workload is distributed between individual components or processors of servers. Networked servers of a grid can be geographically dispersed. Grid computing can be confined to a network of computer workstations within a company or it can be a public collaboration.
Resources that are distributed throughout the grid include various objects. An object is a self-contained module of data and associated processing that resides in a process space. There can be one object per process or tens of thousands of objects per process.
A server farm environment can include different classes of resources, machine types and architectures, operating systems, storage and hardware. Server farms are typically coupled with a layer of load-balancing or distributed resource management (DRM) software to perform numerous tasks, such as managing and tracking processing demand, selecting machines on which to run a given task or process, and scheduling tasks for execution.
Current distributed computing systems, however, can only handle homogeneous systems for distribution of chores. In other words, they cannot handle multiple heterogeneous paths without custom modification.
Further, there is a need in the industry to be able to talk to multiple vendors of dispatch systems, both commercial and non-commercial, and to talk to multiple dispatch systems at the same time. That is, there is a need for a product that has the flexibility to work in an existing environment and to migrate to other environments that are constantly changing. What is needed is a system that is able to submit to multiple distributors, and to pick the appropriate distributor for the chore that is going to be run. There is also a need for a system that can handle exception cases, such as a case in which a distributor was selected because it was available at that time, but by the time the system submitted the chore for distribution, the selected distributor could no longer handle the chore.
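The exception case described above can be illustrated with a minimal sketch. It is not drawn from any particular dispatch product; the names `Distributor`, `DistributorUnavailable`, `advertised_slots`, `actual_slots` and `dispatch` are hypothetical and are introduced only for illustration. The sketch assumes that a distributor reports its availability when polled, and that its real availability may have changed by the time the chore is submitted.

```python
from dataclasses import dataclass, field


class DistributorUnavailable(Exception):
    """Raised when a distributor can no longer accept a chore (hypothetical)."""


@dataclass
class Distributor:
    name: str
    advertised_slots: int  # availability reported when the distributor was polled
    actual_slots: int      # availability at the moment of submission
    accepted: list = field(default_factory=list)

    def can_handle(self, chore):
        # Selection is based on the (possibly stale) advertised availability.
        return self.advertised_slots > 0

    def submit(self, chore):
        # Availability may have changed between selection and submission.
        if self.actual_slots <= 0:
            raise DistributorUnavailable(self.name)
        self.actual_slots -= 1
        self.advertised_slots = self.actual_slots
        self.accepted.append(chore)


def dispatch(chore, distributors):
    """Submit the chore to the first distributor that actually accepts it."""
    for distributor in distributors:
        if not distributor.can_handle(chore):
            continue
        try:
            distributor.submit(chore)
            return distributor.name
        except DistributorUnavailable:
            # The selected distributor could no longer handle the chore;
            # fall through to the next candidate instead of failing.
            continue
    raise RuntimeError(f"no distributor could accept {chore!r}")
```

In this sketch, a distributor that advertised a free slot but has none left at submission time is skipped, and the chore is offered to the next candidate.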
There is also a need for a method and apparatus to prevent deadlock in single threaded servers of a distributed computing system. Single threaded servers can easily become deadlocked by the failure to accurately match requests with replies when replies are received out of sequence. That is, in single threaded systems, replies to requests must be sequentially matched with their original requests. If a reply comes back to the system out of sequence (it does not match the most recent request), the server will be unable to handle the reply, thus deadlocking the system.
More specifically, in systems that do not use multi-threaded servers, multiple messages can be received, and sometimes handling a request will create additional messages that need to be sent out to other objects. These messages have to be sent out and the results received before the response to the original request can be processed. As a result, there can be deadlock situations. Thus, a need exists for a method and apparatus to prevent deadlock in single threaded servers so that a computer can be both a client and a server in the same process thread.
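The out-of-sequence reply problem can be made concrete with a short sketch. It shows one generic way a single thread might match replies to requests by identifier rather than by arrival order; it is an illustrative assumption, not a description of any existing system, and the names `SingleThreadedPeer`, `send_request` and `wait_for` are hypothetical.

```python
from collections import deque


class SingleThreadedPeer:
    """Acts as a client within one thread, matching replies by request id."""

    def __init__(self):
        self._next_id = 0
        self._outstanding = set()  # ids of requests still awaiting a reply
        self._replies = {}         # replies that arrived before they were waited on

    def send_request(self):
        request_id = self._next_id
        self._next_id += 1
        self._outstanding.add(request_id)
        return request_id

    def wait_for(self, request_id, inbox):
        """Consume (id, result) pairs from inbox until request_id is answered."""
        while request_id not in self._replies:
            if not inbox:
                raise RuntimeError(f"request {request_id} is still outstanding")
            reply_id, result = inbox.popleft()
            if reply_id not in self._outstanding:
                raise KeyError(f"reply for unknown request {reply_id}")
            # A reply that does not match the most recent request is stored
            # for later rather than treated as an error.
            self._outstanding.discard(reply_id)
            self._replies[reply_id] = result
        return self._replies.pop(request_id)
```

For example, if two requests are issued and the reply to the first arrives while the peer is waiting on the second, the early reply is held until it is asked for:

```python
peer = SingleThreadedPeer()
first = peer.send_request()
second = peer.send_request()
inbox = deque([(first, "A"), (second, "B")])
peer.wait_for(second, inbox)  # stores "A", returns "B"
peer.wait_for(first, inbox)   # returns the stored "A"
```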
Additionally, in modern distributed networks, individual tasks are bundled together into chores for distribution and/or dispatch onto the grid. However, the current process for bundling tasks together into chores is generally performed based upon a dispatch policy that typically does not account for available resources. Current systems do not optimize tasks that are contained in a chore to match the available resources. Accordingly, there is a need for a system and method that can optimize tasks that are contained in a chore to match available resources.
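As a rough illustration of what matching the tasks in a chore to available resources could look like, the following sketch bundles tasks with a simple greedy first-fit rule. The function name `bundle_tasks`, the single numeric cost per task and the single numeric capacity per resource are all simplifying assumptions made for this example only.

```python
def bundle_tasks(tasks, capacities):
    """Greedy first-fit bundling of tasks into one chore per resource.

    tasks:      list of (task_name, cost) pairs (cost is a hypothetical metric)
    capacities: list of free capacity per available resource
    Returns (chores, unplaced): chores[i] lists the task names bundled for
    resource i; unplaced lists tasks that fit no resource.
    """
    chores = [[] for _ in capacities]
    remaining = list(capacities)
    unplaced = []
    # Place the most expensive tasks first so they are not crowded out.
    for name, cost in sorted(tasks, key=lambda t: t[1], reverse=True):
        for i, free in enumerate(remaining):
            if cost <= free:
                chores[i].append(name)
                remaining[i] -= cost
                break
        else:
            unplaced.append(name)
    return chores, unplaced
```

A dispatch policy that ignores available resources would bundle the same tasks without consulting `capacities` at all, which is the shortcoming the preceding paragraph describes.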
Further, customers need a method of defining a task with resource requirements, independent of the distributor and/or system on which the task will execute. Thus, a need exists for a method and apparatus to map homogeneous task resource definitions to a dynamically changing set of distributor resource pools of different types.
Additionally, in current systems, if any task within a chore fails, then the entire chore can fail. That is, if a chore fails half-way through the list of tasks it is executing, then all of the tasks in that chore fail. Thus, what is needed is a system that can retry individual tasks within a multi-task chore when the chore partially fails.
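The difference between failing a whole chore and retrying only its failed tasks can be sketched as follows. This is a minimal illustration under stated assumptions: a chore is modeled as a mapping from task name to a callable, and the function name `run_chore` and the parameter `max_attempts` are hypothetical.

```python
def run_chore(tasks, max_attempts=3):
    """Run every task in a chore, retrying only the tasks that failed.

    tasks: dict mapping task name to a zero-argument callable
    Returns (results, errors): results holds the return value of each task
    that eventually succeeded; errors holds the last exception of each task
    that still failed after max_attempts passes.
    """
    results = {}
    errors = {}
    pending = dict(tasks)
    for _ in range(max_attempts):
        if not pending:
            break
        errors = {}
        for name, task in pending.items():
            try:
                results[name] = task()
            except Exception as exc:
                # Record the failure but keep running the remaining tasks.
                errors[name] = exc
        # Only the failed tasks are carried into the next pass.
        pending = {name: tasks[name] for name in errors}
    return results, errors
```

Tasks that succeed on the first pass are never re-executed, and a single failing task does not discard the results of the others.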
Embodiments fulfill these unmet needs.