This invention relates, in general, to concurrent data processing and, in particular, to managing a set of concurrently executable interdependent computer processes.
Concurrent processing is instrumental in significantly reducing the time it takes to perform a particular job or function, and it is an important aspect in today's computing environments. However, there has been some reluctance, especially in critical business situations, to accept and use concurrent processing. This reluctance is caused by the potential for corrupt data.
For example, in parallel processing environments, data is transferred between processes and modified along the way (e.g., along the pipeline, which is defined as a set of processes together with the data paths between the processes). Errors in one process may affect the integrity of the data that is produced (e.g., output) or consumed (e.g., accepted) by another process at a later point in time. Since the processes handle data in parallel, they may not know of an error that occurred in a particular stage of execution of another process. Thus, there is a potential for data corruption.
A specific example of when data corruption can arise is when processes within a pipeline start and finish at widely varying times. In this scenario, at the time of error, some processes in the pipeline may be just starting, some may be in the midst of processing, and others may have finished pipeline processing and be in termination phases. These processes may all be handling related data in parallel, so errors in processes in any phase (e.g., initiation, processing or termination) may affect other processes, regardless of their phase of execution. If a process is not aware of the error, data corruption can occur.
As another example, processes may continue with non-pipeline related processing after disconnecting from or closing a connection to the pipeline (i.e., closing a connection to a resource (e.g., a pipe) of the pipeline). However, these processes still may be affected by errors that occur during pipeline related processing of other processes. Thus, if these processes are not aware of the errors, their data may be corrupted.
Previously, processes that terminated would not know of errors that occurred during execution of other related processes. This created a potential for data corruption, thereby forestalling the widespread use of concurrent processing. Therefore, a need exists for a facility to ensure data integrity. In particular, a need exists for a facility that will notify each related process, whether terminated or not, of an error that occurred with another related process. A further need exists for a mechanism to build a group of related processes, and to be able to dynamically change the group of processes. A yet further need exists for providing synchronization at termination, such that all of the processes of the related group learn of any errors. A further need exists for a mechanism that allows synchronization at initiation, so that processes will not complete before others have begun, which would otherwise create a potential for corrupt data.
The shortcomings of the prior art are overcome and additional advantages are provided through the provision of a method for managing concurrently executable processes. A plurality of interdependent processes are registered as a topology of one or more logically dependent execution groups, and completion of the plurality of interdependent processes is synchronized. The plurality of interdependent processes is prevented from completing termination until the interdependent processes terminate successfully or until a condition relating to one of the interdependent processes is detected. As one example, the condition is an abnormal termination.
In another aspect of the present invention, a system for managing concurrently executable processes is provided. The system includes means for registering a plurality of interdependent processes as a topology of one or more logically dependent execution groups, and means for synchronizing completion of the interdependent processes. Completion is synchronized such that complete termination of the plurality of interdependent processes is prevented until all of the processes successfully terminate or until a condition relating to one of the processes is detected.
The management facility of the present invention advantageously enables the grouping of interdependent processes and the management of that group of processes. In accordance with the principles of the present invention, no process within the group of processes completely terminates successfully until all of the other processes within the group terminate normally. If an abnormal termination occurs within the group of processes, then, since none of the processes has completely terminated, the processes of the group are notified of the abnormal condition and, in one example, are not permitted to terminate successfully. In one embodiment of the invention, either all of the processes within a group terminate successfully or they all fail. This ensures the integrity of the data processed by the group of processes.
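By way of illustration only, the all-or-nothing termination described above can be sketched with threads standing in for processes. The class name ExecutionGroup, its methods register and finish, and the helper run_group are hypothetical names introduced for this sketch and are not taken from the invention; the sketch also assumes that every member is registered before any member reports its outcome.

```python
import threading


class ExecutionGroup:
    """Hypothetical termination barrier: all members succeed or all fail."""

    def __init__(self):
        self._cond = threading.Condition()
        self._members = set()   # names of registered members
        self._done = set()      # members that reported normal termination
        self._failed = False    # set once any member terminates abnormally

    def register(self, name):
        """Add a member to the group (assumed to happen before it finishes)."""
        with self._cond:
            self._members.add(name)

    def finish(self, name, ok):
        """Report this member's outcome, then block until the group verdict.

        Returns True only if every registered member terminated normally.
        """
        with self._cond:
            if ok:
                self._done.add(name)
            else:
                self._failed = True
            self._cond.notify_all()
            # Hold the caller until all members are done or one has failed.
            self._cond.wait_for(
                lambda: self._failed or self._done == self._members
            )
            return not self._failed


def run_group(outcomes):
    """Run one thread per member; outcomes maps member name -> own success."""
    group = ExecutionGroup()
    for name in outcomes:
        group.register(name)

    verdicts = {}

    def worker(name, ok):
        verdicts[name] = group.finish(name, ok)

    threads = [
        threading.Thread(target=worker, args=(name, ok))
        for name, ok in outcomes.items()
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return verdicts
```

In this sketch, a member that finished its own work normally still receives a failing verdict if any other member of the group reports an abnormal termination, which mirrors the behavior described for the one embodiment above.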
In addition to the above, the management facility of the present invention advantageously allows the group of processes to be dynamically changed, so that all of the interdependent processes can be assured of data integrity.
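As a further illustrative sketch, a group whose membership changes while its members are running can be modeled with a count of outstanding members. The names DynamicGroup, join, finish and parent_with_late_child are hypothetical. The sketch assumes that a new member is always added by an existing member that has not yet reported its own outcome; otherwise the group could complete before the newcomer is counted.

```python
import threading


class DynamicGroup:
    """Hypothetical group whose membership may grow during execution."""

    def __init__(self):
        self._cond = threading.Condition()
        self._pending = 0       # members that have not yet reported success
        self._failed = False    # set once any member terminates abnormally

    def join(self):
        """Add one member; assumed to be called before the adder finishes."""
        with self._cond:
            self._pending += 1

    def finish(self, ok):
        """Report an outcome and wait for the verdict of the whole group."""
        with self._cond:
            if ok:
                self._pending -= 1
            else:
                self._failed = True
            self._cond.notify_all()
            self._cond.wait_for(lambda: self._failed or self._pending == 0)
            return not self._failed


def parent_with_late_child(child_ok):
    """A running parent adds a child to the group, then both terminate."""
    group = DynamicGroup()
    group.join()  # the parent itself
    verdicts = {}

    def child():
        verdicts["child"] = group.finish(child_ok)

    def parent():
        # The child is added while the parent is still an active member,
        # so the group cannot complete without the child's outcome.
        group.join()
        t = threading.Thread(target=child)
        t.start()
        verdicts["parent"] = group.finish(True)
        t.join()

    p = threading.Thread(target=parent)
    p.start()
    p.join()
    return verdicts
```

Because the child is counted before the parent reports, the parent's successful termination is withheld until the late-joining child has also terminated, and a failure of the child is reflected in the parent's verdict.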
Additional features and advantages are realized through the techniques of the present invention. Other embodiments and aspects of the invention are described in detail herein and are considered a part of the claimed invention.