This invention relates generally to multi-processor systems, and more particularly to providing an efficient, scalable, user-friendly framework for parallel execution of jobs in computationally intensive processing environments.
The advent of multi-core CPUs, which comprise two or more execution cores on a single die (chip) and execute multiple processing threads (including processes and kernel-space or user-space threads) simultaneously, has increased the per-socket processing throughput of microprocessors. It also poses a new challenge to the software industry: how to use multi-threading effectively for computationally intensive problems with minimal synchronization overhead. Multi-processor systems are very efficient when a workload consists of long-running, independent work units (jobs). For example, on web servers serving static web content, each incoming request is independent of the others, so it can be scheduled to execute on a separate core without interacting with threads running on other cores. However, many other, more complex and demanding workloads involve jobs with intricate inter-dependencies. A job may involve side computations, for example, to build or retrieve required input data and/or to produce an output for other jobs. Thus, a “parent” job may spawn one or more dependent “child” jobs (children) that must complete before the parent job itself can complete. While multi-processor systems advantageously enable jobs to be separated and executed in parallel in separate processing threads, the jobs must be synchronized and their execution coordinated because of their dependencies. This is particularly so for computationally intensive problems.
Job dependencies have traditionally been resolved using synchronization primitives such as mutexes (processing locks) and event signaling, in which a parent job waits for its child jobs to notify it that they have completed before the parent job resumes its processing. Threads may also notify each other that work is available to pick up. However, processing locks and signaling require operating system (OS) involvement and are therefore expensive. They are too costly for synchronizing and scheduling short-running jobs, e.g., jobs of less than 10,000 CPU cycles each, and far too inefficient for optimal multi-core, multi-threaded processing of more complex jobs.
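To illustrate the conventional lock-and-signal approach, the following is a minimal Python sketch. It assumes a hypothetical `ParentJob` class with hypothetical `child()` and `run()` methods; it is not the framework described herein. The parent spawns child threads and blocks on a mutex-guarded condition variable until a shared counter of pending children reaches zero, and the last child to finish signals the parent.

```python
import threading


class ParentJob:
    """Hypothetical parent job using a mutex plus event signaling to wait for its children."""

    def __init__(self, num_children):
        self.pending = num_children                  # children that have not yet completed
        self.lock = threading.Lock()                 # mutex (processing lock) guarding shared state
        self.done = threading.Condition(self.lock)   # event the children signal on completion
        self.results = [None] * num_children

    def child(self, index):
        value = index * index                        # stand-in for the child's side computation
        with self.lock:                              # every child must acquire the lock
            self.results[index] = value
            self.pending -= 1
            if self.pending == 0:
                self.done.notify()                   # last child wakes the waiting parent

    def run(self):
        threads = [threading.Thread(target=self.child, args=(i,))
                   for i in range(len(self.results))]
        for t in threads:
            t.start()
        with self.lock:
            while self.pending > 0:                  # parent's thread blocks here until signaled
                self.done.wait()
        for t in threads:
            t.join()                                 # clean up the finished child threads
        return sum(self.results)                     # parent resumes using its children's outputs
```

Here `ParentJob(4).run()` sums the children's outputs 0, 1, 4 and 9 and returns 14. Every child acquires the lock and the parent's `wait()` blocks its thread; for jobs that run only a few thousand cycles, this per-job synchronization can dominate the useful work.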
What is needed are job scheduling and synchronization approaches for multi-processor systems that provide an efficient framework in which jobs can be suspended when they spawn children and resumed when the children complete, without the use of locks. Additionally, for optimum processing, the framework should identify common tasks (jobs) that are semantically equivalent and required for multiple purposes, so that each is executed once instead of multiple times, avoiding wasted resources. Moreover, multi-threaded programs are notoriously difficult to develop and debug, particularly for complex workflows. Accordingly, the scheduling framework should be simple and intuitive to use, and should preferably hide the intricacies of parallel programming from the application developer.
It is desirable to provide systems and methods that address the foregoing and other problems of scheduling computational jobs to processing threads running on multi-processor systems and that achieve the above objectives. It is to these ends that the present invention is directed.