The present invention relates to the processing of processes executed in parallel.
Some software packages or computer programs take a long time to execute or to accomplish a given task. To be more efficient and reduce computation times, these programs can take advantage of the parallel nature of the computer on which they are executed. “Parallel nature of a computer” is understood to mean a computer fitted with several processors, or with at least one processor having several cores, or with at least one processor supporting several threads.
To make the most of the parallel nature, a computer program divides its task (or main task) into several sub-tasks, whose calculations can be made in parallel by various processes. The purpose of each process is therefore to execute and accomplish one of these sub-tasks. Once a process has completed its current sub-task, a second sub-task can be assigned to it, then possibly a following one, and so forth.
The use of a multitude of processes (multi-process processing) entails a need to synchronize them. In particular, the purpose of this synchronization is to enable a well-ordered re-organization of the main task once the sub-tasks have been accomplished.
Such synchronization is generally ensured by a mechanism called an “inter-process synchronization mechanism”. This mechanism must be fast so as not to cancel out the time saved by executing processes in parallel.
To perform the aforementioned synchronization, a software mechanism called a “barrier mechanism” is known. This mechanism can be based on various algorithms, which follow the same main scheme described hereafter.
Firstly, a computer program intended to accomplish a task is executed via n processes, themselves being capable of executing a set of sub-tasks. Each sub-task is divided into successive blocks intended to accomplish work steps, such as an intermediate computation, for example. So, the blocks or intermediate computations of the various processes are executed in parallel. Each process having completed a block waits at the level of a barrier (synchronization barrier) until all the other parallel blocks of the other processes are completed and have joined the barrier in turn. Only once all the processes have reached the barrier are the following blocks executed during the next work step. This principle is described hereafter using a temporal diagram.
FIG. 1 shows a barrier mechanism and thus the general operation of a synchronization barrier. Starting from a main task T, a process manager PM first breaks down the task T into n sub-tasks ST. These n sub-tasks ST are executed by n processes P. In other words, the complex main task T is broken down into several simple sub-tasks ST, each of these sub-tasks being accomplished by a separate process.
The results obtained from the various sub-tasks ST executed by the processes P are collected at the end to accomplish the main task T.
It should be noted that the concept of process manager PM is to be understood in the general sense. Hence, the manager PM is not necessarily a separate element. Indeed, the process manager can generally be seen as the capacity of a computer program to implement a passive or active breakdown method that enables the processes to share the sub-tasks out between themselves. This capacity can be implicit, determined by one of the processes, or correspond to a breakdown predefined by a user.
As mentioned above, when a task is broken down into a multitude of processes Pj, the parallel execution of these various processes needs to be synchronized. For this purpose, the n processes are themselves divided into blocks B, which must be executed successively in time. The sub-set of blocks B which are being executed at the same time (and originate from various processes P) constitutes a work step W. Consequently, each set of blocks B of the same rank i constitutes a distinct work step W.
The blocks Bi of the work step of rank i, denoted Wi, are executed in parallel. The execution time t of the blocks B originating from the various processes Pj is variable. To ensure the synchronization mentioned above, the blocks B are subjected to a synchronization barrier BS (100). This barrier BS (100) is called by each process P when it has finished executing its block Bi in progress. The synchronization barrier BS (100) authorizes the transition to the block Bi+1 of the next rank only when all the blocks Bi in progress have “joined” the barrier, i.e. informed it that their execution is completed.
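By way of illustration only, the work-step scheme described above can be sketched with the barrier object of the Python standard library. The sketch assumes threads in place of processes, and the names `run_work_steps`, `worker` and `log` are hypothetical; it is not the barrier BS (100) itself, only a minimal model of its observable behaviour.

```python
import threading

def run_work_steps(n_workers, n_steps):
    """Run n_workers threads through n_steps blocks, with a barrier after each block."""
    barrier = threading.Barrier(n_workers)
    log = []                      # (rank i, worker j) pairs, in completion order
    log_lock = threading.Lock()

    def worker(j):
        for i in range(n_steps):
            # Block Bi of worker j: stand-in for an intermediate computation.
            with log_lock:
                log.append((i, j))
            # Join the barrier: wait until every worker has finished its block of rank i.
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(j,)) for j in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Because no worker can start its block of rank i+1 before all workers have joined the barrier of rank i, the ranks recorded in `log` never decrease, whatever the relative speed of the workers.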
The first completed block B, i.e. that with the shortest execution time t, sends a request to the synchronization barrier BS (100), informing it, on the one hand, that it has finished its work and, on the other hand, of the number of blocks still in progress during the same work step. Generally, the number of blocks in a work step is equal to the number n of processes P.
The synchronization barriers are usually fitted with a counter. The counter is initialized when the first block B has joined the barrier. Subsequently, the counter is decremented whenever another block B joins the barrier BS (100). The barrier BS (100) can thus follow the progress (or advance) of a work step, and more precisely the termination of each block B in progress. When the last block B, namely that with the longest execution time t, has joined the barrier BS (100), the latter informs each process P and authorizes it to proceed to the next work step W. Again, this following work step W consists of blocks B executed in parallel and originating from the various processes P. During this following work step, the mechanism of the barrier BS (100) is analogous to the preceding one. This is repeated for each work step, and continues until the processes P terminate. The task T will then be accomplished by collecting the results of the processes P.
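The counter logic described above can be sketched as follows. This is a minimal model under stated assumptions, not a known implementation: the class name `CounterBarrier`, the method `arrive` and the `generation` field are hypothetical, threads stand in for processes, and the `generation` field is an addition that is not taken from the description; it is only there so that the same barrier can be reused from one work step to the next.

```python
import threading

class CounterBarrier:
    """Software barrier driven by a counter of blocks still in progress."""

    def __init__(self, n):
        self.n = n              # number of processes taking part in each work step
        self.remaining = 0      # 0 means no block has joined the current work step yet
        self.generation = 0     # assumption: incremented each time a work step completes
        self.cond = threading.Condition()

    def arrive(self):
        with self.cond:
            gen = self.generation
            if self.remaining == 0:
                # First block to join: initialize the counter with the
                # number of blocks still in progress.
                self.remaining = self.n - 1
            else:
                # Any later block: decrement the counter.
                self.remaining -= 1
            if self.remaining == 0:
                # Last block has joined: release every waiting process.
                self.generation += 1
                self.cond.notify_all()
                return
            # Otherwise wait until the current work step is complete.
            while gen == self.generation:
                self.cond.wait()
```

Each call to `arrive` reads and writes shared state under a lock, which gives an idea of the memory traffic that a purely software barrier generates for every block of every work step.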
Such algorithms require a number of interactions between the processes, the blocks and the barrier. These interactions will be described later in the detailed description and comprise, in particular, the initialization of the barrier, the information given to the barrier when a block has finished its work, and the verification that all the processes have terminated their current block. When they are managed by software barriers, these interactions are relatively slow and consume considerable bandwidth.
FIG. 2, relating to the prior art, represents an implementation of a known synchronization barrier BS (100). The known mechanisms are implemented in software. The data defining the synchronization barrier BS (100) are thus stored in the RAM (202) (Random Access Memory) of a computer (or other computing device), and the various processes P have read/write access (R/W) to this RAM (202) to interact with said barrier BS (100). This access is made by means of an address space and an address ADR (detailed further on). As described above, the access comprises initializing the barrier BS (100) (including its counter), informing the barrier BS (100) whenever a block B has finished its work during the same work step W, checking whether all the processes P have finished their block B of the work step W in progress, etc. The program intended to carry out these functions is also active in random access memory, in particular through calls to a function library.
An address space can be segmented into independent segments. “Segment” is understood to mean in general a memory segment defined by two values: the address where this segment begins (base address), and the segment size.
Therefore, a segment constitutes a continuous address range in a main memory (physical or virtual).
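As a purely illustrative sketch, a segment in the sense defined above can be modelled by its two values. The class name `Segment` and the method `contains` are hypothetical, and addresses are represented as plain integers.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Segment:
    base: int   # address where the segment begins
    size: int   # segment size, in addressable units

    @property
    def end(self):
        # First address past the segment (exclusive upper bound).
        return self.base + self.size

    def contains(self, address):
        # An address belongs to the segment if it lies in [base, base + size).
        return self.base <= address < self.end
```

For example, a segment with base address 0x1000 and size 0x100 covers the addresses 0x1000 to 0x10FF inclusive.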
FIG. 2 shows a computer device comprising several processors PZ1 to PZy (200), a memory access manager CACHE COHER MGR (206), and a RAM (202) containing a program area in which the software synchronization barrier BS (100) resides. The device according to FIG. 2 therefore comprises a processing unit capable of multi-process processing operations. The processes are then executed on various processors, on various processor cores, and/or on various threads. The processing unit provides these processors with what is called an “address space”, in particular onto the random access memory, where the code and the data which define the software synchronization barrier BS (100) are located, in an area associated with a precise address ADR, which can be the starting address of the area. The device of FIG. 2 further comprises a process manager (208) of the type defined above, to break down a task T into n processes P, themselves divided into successive blocks B.
The barriers of the prior art (FIG. 2) enable synchronization between various processes P. However, as already mentioned, the software nature of a barrier makes it slow with respect to some needs. Indeed, whenever a process P interacts with it, a function library of the barrier BS (100) is used. In addition, within the library, many interactions with the memory are necessary to read and write the barrier update data until it is detected that all the processes have reached the meeting place (the “synchronization barrier”). Then, once the process P has informed the barrier BS (100), the process P must regularly interrogate the barrier BS (100) to see whether the other current blocks B have finished their work.
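The cost of this regular interrogation can be made visible with a small sketch that counts how many times each process has to poll the barrier during one work step. All names (`polled_work_step`, `block_durations`, `poll_interval`) are hypothetical, threads stand in for processes, and `time.sleep` stands in for the work of a block; the figures obtained depend on the machine and carry no general meaning.

```python
import threading
import time

def polled_work_step(block_durations, poll_interval=0.001):
    """Simulate one work step; return the number of unsuccessful polls per process."""
    n = len(block_durations)
    arrived = 0                  # shared barrier state: number of blocks that have joined
    lock = threading.Lock()
    polls = [0] * n

    def worker(j):
        nonlocal arrived
        time.sleep(block_durations[j])      # block B of process j
        with lock:
            arrived += 1                    # inform the barrier
        while True:
            with lock:
                done = (arrived == n)       # interrogate the barrier
            if done:
                break
            polls[j] += 1                   # one more unsuccessful access to shared memory
            time.sleep(poll_interval)

    threads = [threading.Thread(target=worker, args=(j,)) for j in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return polls
```

In this model, the process whose block finishes first polls the most, since it must keep reading the shared state until the slowest block has joined.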
All this, and particularly the numerous interactions mentioned above, means that software synchronization barriers BS (100) are slow and consume bandwidth. The result is a loss of clock cycles, which is all the more regrettable since the multi-process mode is used precisely to go faster.
In addition, various blocks belonging to distinct respective processes may inform the barrier at the same time, giving rise to memory access conflicts that generate additional latency and bandwidth problems (conflict management by CACHE COHER MGR).
The present invention improves the situation.