In prior art computing using separate, non-parallel processing, programs often share data and other services. An example of this is shown in FIG. 1, where separate process memories 19a, 19b, which may be physically separated in different memory storage, or logically separated in the same memory storage, contain global variable memory 20a for data items visible to the entire process, heap memory 21a for data structures, stack memory 23a for function arguments and local data items, and free memory space 22a, which may be utilized as needed for either heap or stack memory space. A portion of the free memory space may be designated as common memory 22c available to both program A, 24a, and program B, 24b, which operate in the separate process memories 19a, 19b, respectively. Each of programs A and B can access only what is designated in the common memory 22c, and cannot otherwise access the memory of the other program. A programmer utilizing the system of FIG. 1 has relatively little assistance from the system in restricting access to data structures in common memory.
Parallel processing offers improvements in that a single program can simultaneously run different threads, i.e., independent flows of control managed by the program. Multiple threads may execute in a parallel manner, and the threads may share information in either a loosely or tightly coupled manner. An example of a parallel processing arrangement is shown in FIG. 2, where a single process memory 119, having a common global memory 120 and a common heap space 121, contains a plurality of stack spaces 123a, 123b, with a single program 124 operating a plurality of threads, with one stack per program thread. The process memory structure shown can operate any number of threads 1-N, and contain any number of corresponding stacks 1-N, as shown.
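By way of illustration only, the arrangement of FIG. 2 may be sketched in Python as follows. The names shared_results, worker, and run_threads are hypothetical and do not appear in the figures. The sketch assumes that a module-level dictionary stands in for the common heap space 121 and that each thread's local variable stands in for data held in that thread's own stack space.

```python
import threading

# Hypothetical stand-in for the common heap space: visible to every thread.
shared_results = {}


def worker(thread_id: int) -> None:
    # local_total lives in this thread's own stack frame; no other
    # thread can see or modify it.
    local_total = 0
    for i in range(1000):
        local_total += i
    # Each thread writes to its own key, so the threads do not collide here.
    shared_results[thread_id] = local_total


def run_threads(n: int) -> dict:
    """Run n threads of a single program and return a copy of the shared data."""
    shared_results.clear()
    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return dict(shared_results)
```

In this sketch the threads avoid conflict only because each one writes to a distinct key. Once two threads update the same data structure, access must be coordinated.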
Coordinated data access between threads usually requires operating system assistance (with associated penalties), such as semaphores or locks. However, in typical parallel processing applications, serialization caused by the use of system services, such as storage management and coordination of access to memory, often significantly reduces the attainable performance advantages of a parallel algorithm. Serialization occurs when more than one thread accesses or requests a data object or other system resource. If such a conflict occurs, only one thread has access and all other threads are denied access until the first thread is finished with the system resource. For example, the structure shown in FIG. 2 is error-prone because heap space, which contains information that is being manipulated by the program, is subject to collision as different threads attempt to access the same data structure at the same time. When this occurs, one or more threads have to wait while the data structure is accessed by another program thread.
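The serialization described above may be illustrated by the following minimal Python sketch, offered under stated assumptions rather than as a description of any particular system. The class SharedCounter and the function run_contended are hypothetical names. A lock from the standard threading module stands in for the operating system assistance, and the field max_inside records the largest number of threads ever observed inside the protected region at once.

```python
import threading


class SharedCounter:
    """Hypothetical shared data object guarded by a single lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.value = 0
        self.inside = 0       # threads currently inside the protected region
        self.max_inside = 0   # highest value of `inside` ever observed

    def increment(self) -> None:
        # Only one thread may hold the lock; all others wait here.
        with self._lock:
            self.inside += 1
            self.max_inside = max(self.max_inside, self.inside)
            self.value += 1
            self.inside -= 1


def run_contended(n_threads: int, n_iters: int) -> SharedCounter:
    counter = SharedCounter()

    def work() -> None:
        for _ in range(n_iters):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

The final count is correct, but max_inside never exceeds one: every update to the shared object is performed by a single thread while the remaining threads wait, which is the loss of parallelism noted above.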
In current practice, memory management in parallel software is also an area where complexity and inefficiency are major drawbacks. The benefits of parallel execution can be degraded, or even nullified to the point where sequential execution is faster, when calls are made to allocate or free memory. This is due to current serialization techniques, which must be employed to prevent collisions when two or more flows of control, i.e., threads, attempt to obtain or free memory areas. The resulting loss of performance forces unnatural exercises in program design and implementation. These contortions compromise maintainability and extensibility, and are a source of errors. Worse yet, the costs associated with these problems can deter developers from even considering otherwise viable parallel solutions.
In parallel programming, as described above, each thread is assigned a specific unit of work to perform, generally in parallel, and when the work is finished, the threads cease to exist. There is a cost to create, terminate, and manage a thread. The cost has both machine-cycle components and programming complexity components. The programming complexity components are a source of errors in the design and implementation of the software. The prevailing paradigm in the use of threads treats threads and data differently. There is control flow (threads), and there is data. The resulting dichotomy creates an environment which tends to place fetters on the kinds of solutions envisioned, and creates complexity and resulting error-proneness during implementation.
Further, in a parallel programming environment, where units of work are appended to a regular unit of work queue of another context, and the target context is suspended on a mutex or is about to become suspended on a mutex, processing of units of work from the queue of the target context is conventionally suspended until the mutexes are released. This may not necessarily be an unwanted situation, but the behavior does waste computational time.
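The conventional behavior described above may be sketched as follows, purely as an illustration. The function demo_blocked_context and all names inside it are hypothetical. The sketch assumes that a thread stands in for the target context, that a queue.Queue stands in for its regular unit of work queue, and that a None entry marks the end of the queued work.

```python
import queue
import threading


def demo_blocked_context():
    """Append units of work to a context that is suspended on a mutex."""
    mutex = threading.Lock()
    work_queue = queue.Queue()  # the target context's unit of work queue
    processed = []

    def target_context() -> None:
        # The target context suspends here while another context holds the mutex.
        with mutex:
            pass
        # Only after the mutex is released does it drain its queue.
        while True:
            unit = work_queue.get()
            if unit is None:  # sentinel: no more work
                break
            processed.append(unit())

    mutex.acquire()  # another context holds the mutex
    t = threading.Thread(target=target_context)
    t.start()

    # Units of work are appended while the target context is suspended.
    for n in (1, 2, 3):
        work_queue.put(lambda n=n: n * n)
    work_queue.put(None)

    done_while_blocked = len(processed)  # nothing can have run yet
    mutex.release()
    t.join()
    return done_while_blocked, processed
```

In this sketch the call returns a count of zero units processed while the mutex was held, followed by the full list of results once it was released. The queued units of work sit idle for the entire period of contention.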
Thus, there remains a need in the art of computer processing for further enhancements to conventional unit of work processing techniques, for example, to enhance computational efficiency, notwithstanding that a context may be in resource contention.