Parallel applications commonly rely on task parallelism (also referred to as function parallelism), a form of parallelization of computer code across multiple processors in parallel computing systems. Task parallelism focuses on distributing execution processes (tasks or threads) across different parallel computing nodes. Job scheduling techniques assign computer jobs to the resources of a parallel computing system so that those resources are used efficiently.
Traditionally, resource bookkeeping is buried at the lowest levels of the job scheduling logic, which makes it difficult and time-consuming to extend existing job scheduling algorithms with new paradigms such as backfill and preemption. Resource bookkeeping is the tracking of used, free, bad, and to-be-used resources in the job scheduling algorithm. Current job scheduling algorithms support a large variety of scheduling options, such as scheduling by hostlist, blocking, and packing. Extending these algorithms to support new, moderately complex scheduling paradigms while keeping the existing options correct often requires substantial modifications to most of the underlying options, because most of the currently supported options must also be supported by the new paradigms. As a result, introducing a new paradigm has a substantial impact on the existing code base, and development cycles, testing cycles, and product quality are all greatly affected.
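To make the four bookkeeping states concrete, the following is a minimal Python sketch of a bookkeeping table kept apart from any scheduling policy. The `NodeState` and `ResourceBook` names and the transition rules (free to to-be-used to used and back to free, with bad nodes staying bad) are hypothetical illustrations chosen for this sketch; they are not taken from any particular prior-art scheduler.

```python
from enum import Enum


class NodeState(Enum):
    """The four resource states named in the text (hypothetical encoding)."""
    FREE = "free"
    USED = "used"
    BAD = "bad"
    TO_BE_USED = "to-be-used"  # reserved for a job that has not started yet


class ResourceBook:
    """Hypothetical bookkeeping table, separate from scheduling policy."""

    def __init__(self, nodes):
        # Every node starts out free.
        self._state = {node: NodeState.FREE for node in nodes}

    def reserve(self, nodes):
        # Mark free nodes as to-be-used; refuse the whole request
        # if any requested node is not free.
        if any(self._state[n] is not NodeState.FREE for n in nodes):
            raise ValueError("node not free")
        for n in nodes:
            self._state[n] = NodeState.TO_BE_USED

    def start(self, nodes):
        # Reserved nodes become used when the job starts; validate first
        # so a failed call leaves the table unchanged.
        if any(self._state[n] is not NodeState.TO_BE_USED for n in nodes):
            raise ValueError("node not reserved")
        for n in nodes:
            self._state[n] = NodeState.USED

    def release(self, nodes):
        # Used or reserved nodes return to the free pool; bad nodes stay bad.
        for n in nodes:
            if self._state[n] is not NodeState.BAD:
                self._state[n] = NodeState.FREE

    def mark_bad(self, node):
        # A failed node is taken out of service regardless of its prior state.
        self._state[node] = NodeState.BAD

    def nodes_in(self, state):
        # Sorted list of nodes currently in the given state.
        return sorted(n for n, s in self._state.items() if s is state)
```

Because each scheduling paradigm in this sketch would only call `reserve`, `start`, `release`, and `mark_bad`, the state transitions live in one place instead of being repeated inside every scheduling option.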
One prior method used across multiple processors in a parallel computing system is a callback mechanism in the device driver (kernel space). The callback mechanism is implemented on a per-thread or per-resource basis and is not portable; migrating it from AIX to Linux, for example, requires extensive re-coding. Furthermore, this callback method is prone to timing errors. To handle these timing errors properly, the driver has to be recoded to provide stable and reliable preemption support, which delays support for preempting user-space applications until the driver has been recoded. Creating such customized code can be expensive and time-consuming.
Resource scheduling is further complicated when the hardware of the parallel computing system that the resource scheduler manages, or the software of the resource scheduler itself, changes. Again, preempting tasks running on each operating system (OS) currently requires customized programs that communicate with the scheduler. Development time, cost, and product quality are therefore greatly impacted.
Therefore, a need exists to overcome the problems with the prior art discussed above.