The present invention relates to parallel computing, and more specifically, to systems, methods and computer program products for flexible per-task thread counts and thread binding.
Large scale applications, such as scientific applications, can run hundreds or thousands of processes (i.e., tasks) in parallel on clusters of symmetric multiprocessors (SMPs). Several parallel computing techniques are used to provide parallelism for these applications. One type of parallelism employs a message passing interface (MPI), in which each task is implemented as a process having its own memory, and data sharing and synchronization are achieved by passing messages between the tasks. Another type of parallelism is threading. A third type, hybrid parallelism, combines task and thread parallelism: multiple threads can exist within each task.
A thread is the smallest unit of processing that can be scheduled by an operating system, and a thread is contained within a process. Multithreading can occur on a single processor having one memory. On a single processor, multithreading generally occurs by time-division multiplexing, in which the processor switches between different threads. This context switching generally happens frequently enough that the user perceives the threads or tasks as running at the same time. Multithreading can also occur over multiple processors. On a multiprocessor system, the threads or tasks actually run at the same time, with each processor or core running a particular thread or task. Each processor has access to shared memory.
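By way of illustration only, the thread-level half of hybrid parallelism described above can be sketched as follows. The sketch shows several threads inside a single task operating on data held in that task's memory. The function name `task_sum` and its parameters are hypothetical and are not part of any described system; the message passing between tasks is not shown.

```python
from concurrent.futures import ThreadPoolExecutor


def task_sum(data, thread_count):
    """Sum `data` inside one task using up to `thread_count` threads.

    Hypothetical helper for illustration. All threads read the same
    `data` list, i.e., they share the memory of the enclosing task.
    """
    # Split the index range into at most one contiguous chunk per thread.
    chunk = max(1, (len(data) + thread_count - 1) // thread_count)
    bounds = [(lo, min(lo + chunk, len(data)))
              for lo in range(0, len(data), chunk)]

    def partial(lo_hi):
        lo, hi = lo_hi
        return sum(data[lo:hi])

    # Each chunk is handed to a thread of the pool; the partial sums
    # are combined by the thread that called task_sum.
    with ThreadPoolExecutor(max_workers=thread_count) as pool:
        return sum(pool.map(partial, bounds))


total = task_sum(list(range(100)), thread_count=4)
```

In a hybrid arrangement, each task would run such a routine on its own portion of the problem and exchange results with other tasks by message passing. Whether the threads are time-multiplexed or run simultaneously depends on the runtime and the hardware.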
The individual tasks of large scale applications are often multithreaded, with a thread count that is uniform across all tasks. When the problem cannot easily be decomposed further among tasks, the scalability of the application can be restricted. If a time-to-solution is specified, real-time constraints may then not be met.
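The effect of a uniform thread count on time-to-solution can be illustrated with a deliberately idealized model. The sketch below assumes perfect linear speedup within a task, so that a task with w units of work and t threads finishes in w / t time units; this assumption, the work figures, and the name `time_to_solution` are hypothetical and chosen only for illustration.

```python
def time_to_solution(work, threads):
    """Idealized finish time: the slowest task determines when the run ends.

    Assumes (hypothetically) perfect linear speedup within each task.
    """
    return max(w / t for w, t in zip(work, threads))


work = [8, 8, 32]        # hypothetical, uneven work units per task
uniform = [4, 4, 4]      # the same thread count for every task (12 threads)
per_task = [2, 2, 8]     # the same 12 threads, sized to each task's work

uniform_time = time_to_solution(work, uniform)     # 32 / 4 = 8.0
per_task_time = time_to_solution(work, per_task)   # 32 / 8 = 4.0
```

Under these assumptions, the task with the most work dominates the run when every task has the same number of threads, whereas the same total number of threads distributed in proportion to the work halves the modeled time-to-solution. Real applications do not scale perfectly, so the figures indicate the direction of the effect rather than its magnitude.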