1. Field of the Invention
The present invention relates to a parallel processing apparatus, and a method of the same, capable of flexibly resolving at high speed the synchronization wait that arises when a plurality of tasks are generated.
2. Description of the Related Art
For example, known in the art is a multiple instruction multiple datastream (MIMD) type multi-processor system in which a plurality of processor elements (PE) have independent program counters and execute the processing while mutually communicating via a common bus.
Such a multi-processor system is predicated on performing concurrent (parallel) multi-tasking, with communication taking place between a processor element executing a main program that generates a task and a processor element at which the new task is generated. In some cases, the program which called up (generated) the task waits for synchronization until the called (generated) task has ended.
FIG. 15 is an overall view of the configuration of a general multi-processor system 1.
As shown in FIG. 15, the multi-processor system 1 comprises four processor elements PE12, PE13, PE14, and PE15 and an arbiter 16 for managing the synchronization of tasks, all connected via a common bus 11.
The common bus 11 acts as a control line for transferring commands and other control signals among the processor elements PE12 to PE15.
Further, in the multi-processor system 1, the processor elements PE12, PE13, PE14, and PE15 and a common or shared memory 17 are connected via a main bus 19.
The common memory 17 is connected to an external memory (main memory) via an external terminal 18.
Note that, as the configuration of the multi-processor system for realizing synchronization of multi-tasking, there are various types other than the configuration shown in FIG. 15.
For example, in the example shown in FIG. 15, a case where the synchronization of tasks is centrally managed by the arbiter 16 was shown, but it is also possible to not provide the arbiter 16 and impart a function for managing the synchronization of tasks to individual processor elements PE12 to PE15.
FIG. 16 is a view for explaining a procedure for a program generating a task (i.e. program 25) to wait for synchronization.
In the example shown in FIG. 16, the main program 25 operating on the processor element PE12 generates a task 26 on the processor element PE13.
The processor elements PE12 and PE13 operate by executing commands described by the machine language inherent to the individual processors.
It may be noted that it is also possible to generate the task and resolve the synchronization by using hardware sequential circuits.
It may also be noted that, in the present specification, a case where the synchronization function is realized by commands will mainly be explained.
Turning to the problem to be solved by the invention, in the multi-processor system of the related art, it has been difficult to generate a plurality of tasks from the main program 25 shown in FIG. 16 in the exact number desired, for the following reasons.
Namely, the multi-processor system executes the concurrent multi-tasking, but in this multi-task method, it is necessary to allocate a plurality of programs (tasks) to a plurality of processor elements PE.
Here, with multi-tasking assuming a single processor, the most general practice is to allocate a plurality of tasks to one processor element PE by time division such as by a time sharing system (TSS). Accordingly, it is sufficient to prepare only one task management table for the one processor element PE.
In many cases where this TSS method is adopted, an operating system having a task switching mechanism, such as UNIX (a registered trademark of The Open Group), is used.
In many cases, the processor element PE is not provided with a synchronization command specifically designed for multi-tasking. Instead of a synchronization command, therefore, a method is often adopted in which exception handling is triggered by a timer or other external interruption event and the tasks are switched as a result. Further, in order to switch tasks at a higher speed, hardware support is frequently provided inside the processor element PE, but the task switching function is basically realized by software.
In contrast, when the TSS method is adopted in the multi-processor system, it becomes necessary to provide a plurality of task management tables. Further, it is necessary to prepare a program for comprehensively managing this plurality of task management tables at a higher level than the programs for managing the individual processor elements PE, so the operating system becomes considerably complex. For this reason, in the multi-processor system of the related art, it has been difficult to generate exactly the desired number of tasks from the main program 25 shown in FIG. 16.
It is noted that the operating system loaded in a multi-processor system is usually determined by the user of that multi-processor system.
There are also methods for realizing multi-tasking other than the TSS method. Applications to somewhat special purposes, for example, use of specific processor elements PE as co-processors, can be considered. Other than this, a method of permanently providing programs to be executed by co-processors, even if not fixing specific processor elements PE as co-processors, is very effective in certain fields. In any case, a mechanism for synchronization of tasks is necessary for a multi-processor system.
In the multi-processor systems in the research and prototype stage, in general, operating systems the same as that of a single processor are loaded in every processor element PE. By communicating among these processor elements PE, multi-tasking is achieved as a whole in many cases. In this case, the synchronization mechanism is used in part of the function of communication among the processor elements PE. Alternatively, a synchronization mechanism using a semaphore or other memory can be adopted.
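As a rough illustration of the semaphore-based alternative mentioned above, the following Python sketch has each slave task signal a shared semaphore when it ends, while the main program acquires the semaphore once per generated task. Python threads stand in for processor elements; the function name run_with_semaphore and the squaring workload are assumptions made only for this illustration and do not come from the related art described here.

```python
import threading


def run_with_semaphore(num_tasks):
    """Software-only synchronization: wait for every slave task via a semaphore."""
    done = threading.Semaphore(0)          # counts "task ended" notifications
    results = [None] * num_tasks

    def slave(index):
        results[index] = index * index     # stand-in for the slave task's work
        done.release()                     # notify the main program that this task ended

    for index in range(num_tasks):
        threading.Thread(target=slave, args=(index,)).start()

    # The main program waits in software, one acquire per generated task.
    for _ in range(num_tasks):
        done.acquire()
    return results
```

Every step of the wait here is executed by software, which is the kind of overhead the following paragraphs discuss.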
In actuality, however, the generation of tasks and the synchronization wait of the tasks are in the end processed by software in all of these cases, so the response is slow, and these methods are applied at most to the execution of coarse-grained parallel programs. Further, even in a system that can generate a sufficient number of tasks, these methods offer no decisive solution for waiting for tasks to end (i.e., a synchronization wait).
If the conditions are programmed by software, any combination of conditions can be set, such as which task among the plurality of tasks generated from a main program is to be waited for, but the time overhead spent judging these conditions becomes considerably large, so high speed synchronization is not possible.
On the other hand, the conditions set are sometimes determined by hardware.
For example, a handshake synchronization wait system has been established between the microprocessor 8086 developed by Intel Corporation of the U.S. and the coprocessor 8087 designed exclusively for that processor. When a command for an arithmetic operation of the main program is executed on the processor 8086, the coprocessor 8087 automatically interprets that command and starts processing. Usually, a plurality of clock cycles is necessary for the execution of an arithmetic operation. Accordingly, during this time, the processor 8086 sequentially executes the commands following that command.
The main program contains a synchronization command placed an appropriate number of commands after the task generation command. If the related arithmetic operation has ended before the synchronization command is executed, the processor 8086 regards the arithmetic operation as synchronized and proceeds with the execution of commands as it is. If, on the other hand, the related arithmetic operation has not ended before the synchronization command is executed, the processor 8086 waits for synchronization until the operation of the coprocessor 8087 has ended. This synchronization wait system uses a handshake signal based on a simple protocol and can achieve synchronization at a high speed with an extremely simple configuration.
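The handshake described above may be modeled, purely for illustration, as follows. Python threads stand in for the processor 8086 and the coprocessor 8087, and a single threading.Event stands in for the handshake signal. The class name Coprocessor and its methods start and wait are hypothetical and are not the actual 8086/8087 interface.

```python
import threading


class Coprocessor:
    """Toy model of a coprocessor that signals completion through one handshake flag."""

    def __init__(self):
        self._done = threading.Event()
        self._done.set()                   # idle: nothing to wait for
        self.result = None

    def start(self, func, *args):
        """Model of the main processor issuing an arithmetic command."""
        self._done.clear()                 # coprocessor is now busy

        def work():
            self.result = func(*args)
            self._done.set()               # handshake: operation has ended

        threading.Thread(target=work).start()

    def wait(self, timeout=None):
        """Model of the synchronization command; returns at once if already done."""
        return self._done.wait(timeout)


def main_program():
    coprocessor = Coprocessor()
    coprocessor.start(pow, 2, 10)          # arithmetic operation handed to the coprocessor
    other = sum(range(100))                # main processor keeps executing commands
    coprocessor.wait()                     # synchronization command
    return other, coprocessor.result
```

Because there is only one completion flag, this model can track only one outstanding operation at a time, which mirrors the limitation noted in the next paragraph.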
However, it suffers from the disadvantage that a plurality of coprocessors 8087 cannot be connected to one processor 8086, so the problem of the synchronization wait when a plurality of tasks are generated cannot be solved.
An object of the present invention is to provide a parallel processing apparatus, and a method of the same, capable of flexibly resolving at high speed the synchronization wait that arises when a plurality of tasks are generated.
The present invention improves the overall performance of a multi-task system by improving the synchronization wait mechanism of the related art mentioned above.
The previously described synchronization mechanism in a multi-processor system suffers from the disadvantage that a long time is required for recognizing the end of a task generated from a main program (hereinafter also described as a slave task). This is because the multi-processor system of the related art attaches too much importance to general purpose usage and to software solutions.
The present invention limits the general purpose use accompanying the generation of tasks and the synchronization of the same to a certain extent.
It is assumed that substantially any number of tasks may be freely generated. The operating states/ends of the slave tasks are automatically converted to numerical values (i.e., count values). In the main program, these numerical values are included in the execution conditions or establishment conditions of the synchronization command. The numerical operating states of the slave tasks and the synchronization command are then used in combination. This makes it possible to generate a plurality of tasks. Further, for the synchronization command, the ends of slave tasks are recognized by hardware so as to increase the speed of the response.
Here, as a means for realizing the present invention, it is proposed that the processor element executing the main program store the number of generated slave tasks for every generation of a slave task. For this, it is also possible to simply combine a register and an adder/subtracter or merely provide a counter. Taking as an example a counter, 0 is set as an initial value and is incremented by 1 whenever a slave task is generated. Then, when a slave task is ended, after going through the proper procedure, this is notified to the processor element PE executing the main program and the previous count value is decremented by 1.
At the time of execution of the synchronization command in the main program, the processor element executing that command compares the count value with the value of an argument attached to the synchronization command. If the count value is smaller than or equal to the argument, the synchronization condition is regarded as established and the main program proceeds to the commands following that synchronization command. Otherwise (if the count value is larger), the main program waits for synchronization until the synchronization condition is satisfied.
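The counter scheme described above may be modeled in software as follows. This is a minimal sketch for illustration only and not the hardware implementation contemplated here: Python threads stand in for processor elements, and the names TaskCounter, task_generated, task_ended, sync, and run_demo are hypothetical. The counter starts at 0, is incremented when a slave task is generated, is decremented when a slave task ends, and the synchronization command is released when the count is smaller than or equal to its argument.

```python
import threading


class TaskCounter:
    """Software model of the counter: +1 on task generation, -1 on task end."""

    def __init__(self):
        self._count = 0                    # initial value 0, as in the text
        self._cond = threading.Condition()

    def task_generated(self):
        with self._cond:
            self._count += 1

    def task_ended(self):
        with self._cond:
            self._count -= 1
            self._cond.notify_all()        # let a waiting main program re-check

    def sync(self, arg, timeout=None):
        """Wait until count <= arg; return True if the condition was met."""
        with self._cond:
            return self._cond.wait_for(lambda: self._count <= arg, timeout)

    @property
    def count(self):
        with self._cond:
            return self._count


def run_demo(num_tasks=4, arg=1):
    counter = TaskCounter()
    gates = [threading.Event() for _ in range(num_tasks)]

    def slave(gate):
        gate.wait()                        # stand-in for the slave task's work
        counter.task_ended()

    threads = []
    for gate in gates:
        counter.task_generated()           # count the task before it starts
        thread = threading.Thread(target=slave, args=(gate,))
        thread.start()
        threads.append(thread)

    blocked = not counter.sync(arg, timeout=0.05)   # count 4 > arg 1: still waiting
    for gate in gates[:num_tasks - arg]:
        gate.set()                         # allow all but `arg` tasks to end
    released = counter.sync(arg, timeout=5)
    remaining = counter.count              # the unfinished task is still counted

    for gate in gates:
        gate.set()                         # let the last task end as well
    for thread in threads:
        thread.join()
    return blocked, released, remaining, counter.count
```

With four slave tasks and an argument of 1, the main program in this sketch is released as soon as three tasks have ended, without identifying which ones.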
After the execution of the synchronization command, it is also possible to initialize the value to 0 to prepare for the generation of a slave task. Alternatively, this does not have to be done. This is a matter for the user using the system.
Further, the position at which the counter is provided is not particularly limited. For example, it may be placed inside the processor element executing the main program or inside the arbiter module.
In the synchronization mechanism of the related art, when a plurality of slave tasks were generated, the method was adopted of describing all detailed synchronization conditions by software or waiting for the end of all slave tasks by hardware.
As opposed to this, in the present invention, the end of a slave task is recognized by hardware and a restriction is added to the synchronization conditions accompanying the synchronization wait command to simply set the number of the slave tasks to be synchronized. This enables a reduction of the size of the logical circuit while imparting a certain degree of flexibility to the synchronization mechanism.
That is, according to a first aspect of the present invention, there is provided a parallel processing apparatus provided with a plurality of counting means; a first processing means for executing at least one task call command including counting means designating data for designating one of the counting means, then waiting for synchronization in accordance with need by a synchronization wait command including counting means designating data and a count value satisfying the synchronization wait release conditions; and at least one second processing means for executing a task called up from the first processing means and executing a task end command when the called up task has ended, wherein the count value of the counting means indicated by the counting means designating data included in the task call command is increased in accordance with execution of the task call command by the first processing means, and the count value of the counting means indicated by the counting means designating data of the task call command which called up a finished task is decreased in accordance with execution of the task end command by the second processing means; and the first processing means compares the count value included in the synchronization wait command and the count value of the counting means indicated by the counting means designating data included in the synchronization wait command and determines whether to release the synchronization wait in accordance with the result of the comparison.
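The bookkeeping of the first aspect, with several counters selected by designating data, may be sketched as a simple state model. The sketch below is an assumption-laden illustration, not the claimed apparatus: the class name CounterBank, its method names, and the use of an integer index as the counting means designating data are all hypothetical. The release test uses the "smaller than or equal" comparison described earlier; a release on exact coincidence could be modeled by replacing `<=` with `==`.

```python
class CounterBank:
    """State model of a plurality of counters selected by designating data."""

    def __init__(self, num_counters):
        self.counts = [0] * num_counters

    def task_call(self, counter_id):
        """Task call command: increase the designated counter."""
        self.counts[counter_id] += 1
        return counter_id                  # carried by the task for its end command

    def task_end(self, counter_id):
        """Task end command: decrease the counter named by the calling command."""
        self.counts[counter_id] -= 1

    def sync_released(self, counter_id, value):
        """Synchronization wait command: compare the argument with the counter."""
        return self.counts[counter_id] <= value


bank = CounterBank(num_counters=2)
group_a = [bank.task_call(0) for _ in range(3)]   # three tasks on counter 0
group_b = bank.task_call(1)                       # one task on counter 1

waiting_on_b = not bank.sync_released(1, 0)       # counter 1 holds 1, so wait
bank.task_end(group_b)
b_released = bank.sync_released(1, 0)             # counter 1 back to 0

a_still_waiting = not bank.sync_released(0, 0)    # counter 0 is unaffected
bank.task_end(group_a[0])
a_partial = bank.sync_released(0, 2)              # two tasks may remain running
```

Keeping separate counters lets the main program wait on one group of slave tasks while another group is still running.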
Preferably, the first processing means releases the synchronization wait when the count value included in the synchronization wait command and the count value of the counting means indicated by the counting means designating data included in the synchronization wait command coincide.
More preferably, the count value included in the synchronization wait command is smaller than the number of tasks called up by a task call command including the same counting means designating data as the synchronization wait command.
Preferably, the processing in the first processing means and the processing in the at least one second processing means are performed independently from each other.
Preferably, the synchronization wait command has as arguments the counting means designating data and the count value satisfying a synchronization wait release condition.
Preferably, the first processing means and the at least one second processing means are connected through a common bus.
According to a second aspect of the present invention, there is provided a parallel processing method comprising executing, in a first processing, at least one task call command including counter designating data for designating one counter among a plurality of counters and waiting for synchronization by a synchronization wait command including counter designating data and a count value satisfying the synchronization wait release conditions; executing, in at least one second processing, a task called up from the first processing and executing a task end command when the called up task has ended; increasing the count value of a counter indicated by the counter designating data included in the task call command in accordance with execution of the task call command by the first processing and decreasing the count value of the counter indicated by the counter designating data of a task call command in accordance with execution of the task end command by the second processing; and comparing, in the first processing, the count value included in the synchronization wait command and the count value of the counter indicated by the counter designating data included in the synchronization wait command and determining whether to release the synchronization wait in accordance with the result of the comparison.
According to a third aspect of the present invention, there is provided a parallel processing apparatus comprising a first processing means for executing at least one task call command, then performing synchronization wait by a synchronization wait command in accordance with need; at least one second processing means for executing a task called up from the first processing means and executing a task end command at the time when the called up task has ended; and a counting means for increasing a count value in accordance with execution of the task call command by the first processing means and decreasing the count value in accordance with execution of the task end command by the second processing means, wherein the first processing means compares a count value included in the synchronization wait command and the count value of the counting means to determine whether to release the synchronization wait in accordance with the result of the comparison.
According to a fourth aspect of the present invention, there is provided a parallel processing method comprising executing, in a first processing, at least one task call command, then performing synchronization wait by a synchronization wait command in accordance with need; executing, in at least one second processing, a task called up from the first processing, and a task end command at the time when the called up task has been ended; increasing a first count value in accordance with execution of the task call command by the first processing and decreasing the first count value in accordance with execution of the task end command by the second processing; and comparing a second count value included in the synchronization wait command and the first count value and determining whether to release the synchronization wait in accordance with the result of the comparison.