1. Field of the Invention
The invention generally relates to assigning tasks for processing in a distributed system, and, in particular, to assigning tasks for compiling in a distributed compilation system.
2. Description of the Related Art
Distributed computing has become increasingly popular with the maturation of network technology. Oftentimes, it is desirable to exploit the processing power of various networked machines that may otherwise be idle or underutilized. For example, it may be desirable to use the processing power of the networked machines to perform computationally taxing tasks, such as image processing or rendering, audio processing, video processing, encrypting, decrypting, or the like.
In a typical distributed computing environment, a central machine on a network divides a project into a number of tasks, which are assigned to one or more of the networked machines for processing or manipulation. The results are then returned to the central machine once the processing is complete. The assignment of tasks to the machines can be based on a number of criteria, including assigning the tasks in a simplistic round-robin fashion or based on some measure of predicted or historical performance (e.g., processor speed, available memory, etc.). These methods of assigning tasks can be costly in terms of overhead, and can often produce inefficient results.
Distributed systems are also employed in the context of software development. Many software development projects suffer from slow code compilation, which can result in longer “edit, compile, test” cycles, thereby extending the amount of time it takes for developers to deploy a finished software product. Whether it is a few hours' wait for a full product build, or a few minutes spent several times a day waiting for an incremental build to finish, the persistent long delays associated with compilation can result in frustration, loss of productivity, and wasted time. To expedite the compilation process, practitioners have turned to distributed compilation systems, examples of which include TeamBuilder® and distcc. These distributed compilation systems improve compilation times by sharing the compilation processing across a group of networked machines. As in other distributed systems, distributed compilation systems, such as distcc, employ a centrally controlling client machine, which is typically the developer's workstation or laptop. A distcc client runs on the client machine, along with command line tools such as a preprocessor, a linker, and other tools employed in the software build process. Any number of “volunteer” machines assist the client in building the program by running the compiler and assembler as required.
In conventional distributed compilation systems, the client machine schedules tasks by assigning them to the volunteer machines. In some systems, clients may delegate tasks to the volunteer machines using a simplistic circular, round-robin scheme. In other systems, a client gathers information about the operational capabilities (e.g., processor speed, availability) of the various volunteer machines, and then assigns the tasks to the volunteer machines based on the operational capability of the volunteer machine. Each of these schemes, however, has its drawbacks.
A round-robin scheme is not particularly efficient for delegating tasks because of the potential mismatch between the workload that is assigned to a particular volunteer machine and its processing capabilities. For example, based on a round-robin scheme, a client machine may delegate a task to a slower, less capable volunteer machine instead of another faster volunteer machine, simply because the slower machine is next in line to receive the task. Similarly, the client machine may routinely delegate a task to a volunteer machine that is presently overloaded rather than to an underutilized volunteer machine, based simply on the relative positions of the two volunteer machines in the round-robin scheme.
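The mismatch described above can be illustrated with a minimal sketch. The volunteer names, relative speeds, task names, and task costs below are hypothetical values chosen only for illustration; they are not taken from distcc or any other system named in this description.

```python
from itertools import cycle


def round_robin_assign(tasks, volunteers):
    """Assign each task to the next volunteer in a fixed circular order."""
    order = cycle(volunteers)
    return {task: next(order) for task in tasks}


def finish_times(assignment, task_cost, speed):
    """Return the total time each volunteer needs for its assigned tasks."""
    busy = {name: 0.0 for name in speed}
    for task, name in assignment.items():
        busy[name] += task_cost[task] / speed[name]
    return busy


# Hypothetical volunteers with relative speeds (work units per second).
speed = {"fast": 4.0, "slow": 1.0}
# Hypothetical compile tasks, each costing 4 work units.
task_cost = {f"file{i}.c": 4.0 for i in range(4)}

assignment = round_robin_assign(list(task_cost), list(speed))
times = finish_times(assignment, task_cost, speed)
# The build is only done when the most heavily loaded volunteer finishes.
makespan = max(times.values())
```

Under these assumed values, each volunteer receives two tasks regardless of speed, so the faster machine sits idle after 2 seconds while the slower machine needs 8 seconds to finish its share.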
Like the round-robin schemes, schemes in which the client first gathers information about the various volunteer machines before work is assigned also tend to be inefficient and inflexible. This is because the client machine (or another machine that is designated to gather the information) is constantly burdened with the responsibility of ascertaining the operational capabilities of the various volunteer machines on the network and then ensuring that the record of these operational capabilities is up to date. Constantly maintaining an up-to-date list of the various volunteer machines can be inefficient, particularly if some of those volunteer machines are rarely or never utilized. Thus, there is a need to efficiently delegate tasks in distributed compilation systems.
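The overhead of the information-gathering approach can likewise be sketched under simplifying assumptions. Here a hypothetical client queries every volunteer before each assignment and picks the one reporting the highest capability; the `probe` callables, the capability figures, and the decision to ignore current load are all assumptions made for illustration.

```python
def assign_by_capability(tasks, probe):
    """Before each task, query every volunteer and pick the most capable one.

    `probe` maps a volunteer name to a callable returning its reported
    capability (a hypothetical stand-in for a network status query).
    """
    probes = {}
    assignment = {}
    for task in tasks:
        reports = {}
        for name, query in probe.items():
            reports[name] = query()  # one status query per volunteer
            probes[name] = probes.get(name, 0) + 1
        assignment[task] = max(reports, key=reports.get)
    return assignment, probes


# Hypothetical volunteers reporting a fixed capability score.
probe = {
    "fast": lambda: 4.0,
    "medium": lambda: 2.0,
    "slow": lambda: 1.0,
}
tasks = ["a.c", "b.c", "c.c"]

assignment, probes = assign_by_capability(tasks, probe)
used = set(assignment.values())
# Queries spent on volunteers that never received any work.
wasted = sum(count for name, count in probes.items() if name not in used)
```

In this sketch, nine status queries are issued for three tasks, and six of them go to volunteers that are never assigned any work, which mirrors the inefficiency of tracking rarely used machines.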
The present invention is directed to overcoming, or at least reducing the effects of, one or more of the problems set forth above.