A distributed computing system divides the work required by a computing job into different assignments, which are executed on two or more processors that share the computing job. Computing jobs are often initiated by users. There is virtually no limit to the types of computing jobs that users may initiate.
Some computing jobs that are initiated by users identify a data file that is to be processed by a specific software application. For example, a user may initiate a computing job by submitting a data file for processing by a video special effects application. As another example, the user might initiate a computing job by submitting a data file to be processed by a weather prediction application.
In each of these examples, the computing job is divided between two or more processors. More particularly, separate instances of the video special effects application execute on each of the processors to share the video special effects job. Similarly, separate instances of the weather prediction application execute on each of the processors to share the weather prediction job.
Typically, a distributed computing system has a master node that assigns different portions of the overall job to the processors. Techniques exist for pre-dividing the computing job prior to submission to the distributed computing system. For example, a user can manually divide a video processing job into different data segments and submit those data segments to the distributed computing system as a batch of work. Upon receiving the batch of work, the master node assigns the different data segments to different processors for parallel processing. However, in general, the master node does not understand the relationship between the data segments in the batch. Therefore, while the data segments are processed faster due to parallel processing, at the end of processing the user needs to manually process the individual results produced by each of the processors.
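By way of illustration only, the pre-divided batch approach described above can be sketched as follows. The sketch is a minimal approximation under stated assumptions: the names `process_segment` and `master_assign` are hypothetical, a thread pool stands in for the separate processors, and upper-casing a frame label stands in for the work of the application.

```python
from concurrent.futures import ThreadPoolExecutor


def process_segment(segment):
    # Hypothetical stand-in for one application instance running on one processor.
    return [frame.upper() for frame in segment]


def master_assign(batch, num_processors=2):
    # The master node treats every segment as an opaque, unrelated unit of work.
    with ThreadPoolExecutor(max_workers=num_processors) as pool:
        # map() returns results in submission order, one result per segment.
        return list(pool.map(process_segment, batch))


# The user divides the job by hand before submission.
batch = [["f1", "f2"], ["f3", "f4"], ["f5"]]
partial_results = master_assign(batch)

# The master node returns only per-segment results; the user must
# combine them manually into a single output.
final_result = [frame for part in partial_results for frame in part]
```

In this sketch the final flattening step is performed outside `master_assign`, which reflects the limitation noted above: the master node does not know how the segments relate to one another.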
Different jobs may need to be divided in different ways. In many cases, the way in which a computing job should be divided depends upon the application that is to perform the job. For example, a computing job for an application that calculates weather might be divided very differently from a computing job for an application that processes video data.
Unfortunately, a master node may not know an appropriate way to divide a video file into data segments that can be processed on different processors. Moreover, rather than assigning different data segments to different processors, it may be more appropriate to divide the computing job into different processing tasks for each processor. For example, processing a data file can involve tasks that can be performed independently of one another. Those tasks can be assigned to different processors. However, the master node may not know how to divide the job into different processing tasks.
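The distinction drawn above between dividing a job into data segments and dividing it into processing tasks can be sketched as follows. This is an illustrative sketch only; the function names and the task names `"color_correct"` and `"stabilize"` are hypothetical and are assumed, for the purpose of the example, to be independent of one another.

```python
def split_by_data(frames, num_processors):
    # Data division: each processor receives a different slice of the file
    # and performs the same processing on it.
    size = max(1, -(-len(frames) // num_processors))  # ceiling division
    return [frames[i:i + size] for i in range(0, len(frames), size)]


def split_by_task(frames, tasks):
    # Task division: each processor receives the whole file
    # but performs a different, independent task on it.
    return [(task, frames) for task in tasks]


frames = list(range(10))
data_assignments = split_by_data(frames, 3)
task_assignments = split_by_task(frames, ["color_correct", "stabilize"])
```

Which of the two divisions is appropriate, and where the boundaries should fall, depends on the application; nothing in the sketch supplies that knowledge to a master node.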
A master node or the like may be programmed with the knowledge of how to divide a job associated with a particular application into different assignments. However, it could be difficult to program the master node with the knowledge to divide jobs for many different types of applications. Furthermore, if a new application is introduced into the distributed computing system, the master node would not know the criteria for dividing computing jobs that are to be processed by the new application on the distributed nodes.
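The difficulty described above can be illustrated with a sketch of a master node whose division rules are hard-coded per application. All names here (`SPLITTERS`, `divide_job`, the application keys, and the two splitting rules) are hypothetical and are chosen only to show the shape of the problem.

```python
def split_video(frames, n):
    # Hypothetical rule: divide a list of frames into contiguous ranges.
    size = max(1, -(-len(frames) // n))  # ceiling division
    return [frames[i:i + size] for i in range(0, len(frames), size)]


def split_weather(readings_by_region, n):
    # Hypothetical rule: distribute whole regions across n assignments.
    groups = [{} for _ in range(n)]
    for i, (region, readings) in enumerate(sorted(readings_by_region.items())):
        groups[i % n][region] = readings
    return groups


# Division knowledge programmed into the master node, one entry per application.
SPLITTERS = {
    "video_effects": split_video,
    "weather_prediction": split_weather,
}


def divide_job(application, data, n):
    try:
        splitter = SPLITTERS[application]
    except KeyError:
        # A newly introduced application has no entry, so the job cannot be divided.
        raise LookupError(f"no division rule for {application!r}") from None
    return splitter(data, n)
```

Under this arrangement, every new application requires a change to the master node itself before its jobs can be divided, and a call such as `divide_job("audio_mixing", data, 4)` fails with `LookupError`.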
Therefore, a need exists for a technique for processing a computing job in a distributed computing system, wherein the job may be processed by any one of many different types of applications.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.